Basil Faruqui, BMC Software | BigData NYC 2017
>> Live from Midtown Manhattan, it's theCUBE. Covering BigData New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. (calm electronic music)

>> Basil Faruqui, who's the Solutions Marketing Manager at BMC, welcome to theCUBE.

>> Thank you, good to be back on theCUBE.

>> So first of all, heard you guys had a tough time in Houston, so hope everything's gettin' better, and best wishes to everyone down in--

>> We're definitely in recovery mode now.

>> Yeah, and so hopefully that can get straightened out quick. What's going on with BMC? Give us a quick update in context to BigData NYC. What's happening, what is BMC doing in the big data space now, the AI space now, the IOT space now, the cloud space?

>> So like you said, you know, the data lake space, the IOT space, the AI space, there are four components of this entire picture that literally haven't changed since the beginning of computing. If you look at those four components of a data pipeline, it's ingestion, storage, processing, and analytics. What keeps changing around it is the infrastructure, the types of data, the volume of data, and the applications that surround it. And the rate of change has picked up immensely over the last few years, with Hadoop coming into the picture and public cloud providers pushing it. It's obviously creating a number of challenges, but one of the biggest challenges that we are seeing in the market, and we're helping customers address, is the challenge of automating this, and obviously the benefit of automation is in scalability as well as reliability. So when you look at this rather simple data pipeline, which is now becoming more and more complex, how do you automate all of this from a single point of control? How do you continue to absorb new technologies and not re-architect your automation strategy every time, whether it's Hadoop, whether it's bringing in machine learning from a cloud provider? And that is the issue we've been solving for customers--

>> Alright, let me jump into it. So, first of all, you mentioned some things that never change: ingestion, storage, and what's the third one?

>> Ingestion, storage, processing, and eventually analytics.

>> And analytics. Okay, so that's cool, totally buy that. Now you move and say, hey okay, I buy that standard, but now in the modern era that we live in, which is complex, you want breadth of data, but you also want the specialization when you get down to machine learning, which is highly bounded. That's where the automation is right now. We see the trend essentially making that automation broader as it goes into the customer environments.

>> Correct.

>> How do you architect that? If I'm a CXO, or I'm a CDO, what's in it for me? How do I architect this? 'Cause that's really the number one thing. I know what the building blocks are, but they've changed in their dynamics to the marketplace.

>> So the way I look at it is that what defines success and failure, particularly in big data projects, is your ability to scale. If you start a pilot, and you spend three months on it, and you deliver some results, but you cannot roll it out worldwide, nationwide, whatever it is, essentially the project has failed. The analogy I often give is Walmart has been testing the pick-up tower, I don't know if you've seen it. This is basically a giant ATM for you to go pick up an order that you placed online. They're testing this at about a hundred stores today. Now if that's a success, and Walmart wants to roll this out nationwide, how much time do you think their IT department's going to have? Is this a five-year project, a ten-year project? No, the management's going to want this done in six months, ten months. So essentially, this is where automation becomes extremely crucial, because it is now allowing you to deliver speed to market, and without automation you are not going to be able to get to an operational stage in a repeatable and reliable manner.

>> But you're describing a very complex automation scenario. How can you automate in a hurry without sacrificing the details of what needs to be done? In other words, that would seem to call for repurposing or reusing prior automation scripts and rules, and so forth. How can the Walmarts of the world do that fast, but also do it well?

>> Yeah, so we go about it in two ways. One is that out of the box we provide a lot of pre-built integrations to some of the most commonly used systems in an enterprise, all the way from the Mainframes, Oracles, SAPs, Hadoops, Tableaus of the world. They're all available out of the box for you to quickly reuse these objects and build an automated data pipeline. The other challenge we saw, particularly when we entered the big data space four years ago, was that automation was considered close to the point of the project becoming operational. And that's where a lot of rework happened, because developers had been writing their own scripts using point solutions. So we said, alright, it's time to shift automation left, and allow companies to build automation artifacts very early in the development life cycle. About a month ago, we released what we call Control-M Workbench. It's essentially a community edition of Control-M, targeted towards developers, so that instead of writing their own scripts, they can use Control-M in a completely offline manner, without having to connect to an enterprise system. As they build, and test, and iterate, they're using Control-M to do that. So as the application progresses through the development life cycle, all of that work can then translate easily into an enterprise edition of Control-M.

>> Just want to quickly define what shift left means for the folks that might not know software methodologies, so they don't think of left as political, left or right.

>> Yeah, so we're not shifting Control-M--

>> Alt-left, alt-right, I mean, this is software development, so quickly take a minute and explain what shift left means, and the importance of it.

>> Correct. So if you think of software development as a straight-line continuum, you will start with building some code, you will do some testing, then unit testing, then user acceptance testing. As it moves along this chain, there was a point right before production where all of the automation used to happen. Developers would come in and deliver the application to Ops, and Ops would say, well, hang on a second, all this Crontab and these other point solutions you've been using for automation, that's not what we use in production, and we need you to now go right in--

>> So test early and often.

>> Test early and often. So the challenge was that the tools the developers used were not the tools being used on the production end of the site. And there was good reason for it, because developers don't need something really heavy and with all the bells and whistles early in the development life cycle. Now Control-M Workbench is a very light version, which is targeted at developers and focuses on the needs that they have when they're building and developing it. So as the application progresses--

>> How much are you seeing waterfall--

>> But how much can they, go ahead.

>> How much are you seeing waterfall, and then people shifting left becoming more prominent now? What percentage of your customers have moved to Agile, and are shifting left, percentage-wise?

>> So we survey our customers on a regular basis, and the last survey showed that eighty percent of the customers have either implemented a more continuous integration and delivery type of framework, or are in the process of doing it. And that's the other--

>> And getting as close to a hundred as possible, pretty much.

>> Yeah, exactly. The tipping point is reached.

>> And what is driving--

>> What is driving all of this is the need from the business. The days of the five-year implementation timelines are gone. This is something that you need to deliver every week, two weeks, in iterations.

>> Iteration, yeah, yeah.

>> And we have also innovated in that space with an approach we call jobs as code, where you can build entire complex data pipelines in code format, so that you can enable the automation in a continuous integration and delivery framework.

>> I have one quick question, Jim, and I'll let you take the floor and get a word in soon, but I have one final question on this BMC methodology thing. You guys have a history, obviously, BMC goes way back. Remember Max Watson, CEO, and Bob Beach, back in '97 we used to chat with him, dominated that landscape. But we're kind of going back to a systems mindset. The question for you is, how do you view the issue of this holy grail, the promised land of AI and machine learning, where end-to-end visibility is really the goal, right? At the same time, you want bounded experiences at the root level so automation can kick in to enable more activity. So there's a trade-off between going for the end-to-end visibility out of the gate, and having bounded visibility and data to automate. How do you guys look at that market? Because customers want the end-to-end promise, but they don't want to try to get there too fast. There are diseconomies of scale, potentially. How do you talk about that?

>> Correct. And that's exactly the approach we've taken with Control-M Workbench, the community edition, because earlier on you don't need capabilities like SLA management and forecasting and automated promotion between environments. Developers want to be able to quickly build and test and show value, and they don't need something with all the bells and whistles. We're allowing you to handle that piece, in that manner, through Control-M Workbench. As things progress and the application progresses, the needs change as well. Now I'm closer to delivering this to the business, I need to be able to manage this within an SLA, I need to be able to manage this end-to-end and connect this to other systems of record, and streaming data, and clickstream data, all of that. So we believe that it doesn't have to be a trade-off, that you don't have to compromise speed and quality for end-to-end visibility and enterprise-grade automation.

>> You mentioned trade-offs, so with Control-M Workbench the developer can use it offline, so what amount of testing can they possibly do on a complex data pipeline automation when the tool's offline? I mean, it seems like the more development they do offline, the greater the risk that it simply won't work when they go into production. Give us a sense for how they mitigate that risk in using Control-M Workbench.

>> Sure. So we spend a lot of time observing how developers work, right? And very early in the development stage, all they're doing is working off of their Mac or their laptop, and they're not really connected to anything. And that is where they end up writing a lot of scripts, because whatever code or business logic they've written, the way they're going to make it run is by writing scripts. And that, essentially, becomes the problem, because then you have scripts managing more scripts, and as the application progresses, you have this complex web of scripts and Crontabs and maybe some open source solutions, trying to simply make all of this run. And by doing this in an offline manner, that doesn't mean they're losing all of the other Control-M capabilities. Simply, as the application progresses, whatever automation they built in Control-M can seamlessly now flow into the next stage. So when you are ready to take an application into production, there's essentially no rework required from an automation perspective. All of that that was built can now be translated into the enterprise-grade Control-M, and that's where operations can then go in and add the other artifacts, such as SLA management and forecasting and other things that are important from an operational perspective.

>> I'd like to get both your perspectives, 'cause you're like an analyst here, so Jim, I want you guys to comment. My question to both of you would be, looking at this time in history, obviously on the BMC side we mentioned some of the history, you guys are transforming on a new journey in extending that capability of this world. Jim, you're covering state-of-the-art AI and machine learning. What's your take on this space now? Strata Data, which was Hadoop World, Cloudera went public, Hortonworks is now public, kind of the big Hadoop guys kind of grew up, but the world has changed around them, it's not just about Hadoop anymore. So I'd like to get your thoughts on this kind of perspective, that we're seeing a much broader picture in big data in NYC, versus the Strata Hadoop show, which seems to be losing steam, but I mean in terms of the focus. The bigger focus is much broader, horizontally scalable. And your thoughts on the ecosystem right now?

>> Let Basil answer first, unless Basil wants me to go first.

>> I think the reason the focus is changing is because of where the projects are in their life cycle. Now what we're seeing is most companies are grappling with, how do I take this to the next level? How do I scale? How do I go from just proving out one or two use cases to making the entire organization data-driven, and really inject data-driven decision making in all facets of decision making? So that is, I believe, what's driving the change that we're seeing, that now you've gone from Strata Hadoop to Strata Data, and the focus on that element. And, like I said earlier, the difference between success and failure is your ability to scale and operationalize. Take machine learning, for example.

>> Good, that's where it's not a hype market, it's show me the meat on the bone, show me scale, I got operational concerns of security and whatnot.

>> And machine learning, that's one of the hottest topics. A recent survey I read, which polled a number of data scientists, revealed that they spent less than 3% of their time in training the data models, and about 80% of their time in data manipulation, data transformation and enrichment. That is obviously not the best use of a data scientist's time, and that is exactly one of the problems we're solving for our customers around the world.

>> That needs to be automated to the hilt. To help them--

>> Correct.

>> --to be more productive, to deliver faster results.

>> Ecosystem perspective, Jim, what's your thoughts?

>> Yeah, everything that Basil said, and I'll just point out that many of the core use cases for AI are automation of the data pipeline. It's driving machine learning driven predictions, classifications, abstractions and so forth, into the data pipeline, into the application pipeline, to drive results in a way that is contextually and environmentally aware of what's goin' on. The history, historical data, what's goin' on in terms of current streaming data, to drive optimal outcomes, using predictive models and so forth, in line to applications. So really, fundamentally then, what's goin' on is that automation is an artifact that needs to be driven into your application architecture as a repurposable resource for a variety of--

>> Do customers even know what to automate? I mean, that's the question, what do I--

>> You're automating human judgment. You're automating effort, like the judgments that a working data engineer makes to prepare data for modeling and whatever. More and more, that can be automated, 'cause those are pattern-structured activities that have been mastered by smart people over many years.

>> I mean, we just had a customer on with GlaxoSmithKline, GSK, with that scale, and his attitude is, we see the results from the users, then we double down and pay for it and automate it. So the automation question, it's an option question, it's a rhetorical question, but it just begs the question, which is, who's writing the algorithms as machines get smarter and start throwing off their own real-time data? What are you looking at? How do you determine? Are you going to need machine learning for machine learning? Are you going to need AI for AI? Who writes the algorithms--

>> It's actually, that's--

>> --for the algorithm?

>> Automated machine learning is a hot, hot, not only research focus, but we're seeing more and more solution providers, like Microsoft and Google and others, goin' deep down, doubling down in investments in exactly that area. That's a productivity play for data scientists.

>> I think the data market's going to change radically, in my opinion. You're startin' to see some things with blockchain and some other things that are interesting. Data sovereignty, data governance are huge issues. Basil, just give your final thoughts for this segment as we wrap this up. Final thoughts on data and BMC, what should people know about BMC right now? Because people might have a historical view of BMC. What's the latest, what should they know? What's the new Instagram picture of BMC? What should they know about you guys?

>> So I think what people should know about BMC is that all the work that we've done over the last 25 years, in virtually every platform that came before Hadoop, we have now innovated to take this into things like big data and cloud platforms. So when you are choosing Control-M as a platform for automation, you are choosing a very, very mature solution, an example of which is Navistar. Their CIO's actually speaking at the keynote tomorrow. They've had Control-M for 15, 20 years, and they've automated virtually every business function through Control-M. And when they started their predictive maintenance project, where they're ingesting data from about 300,000 vehicles today to figure out when a vehicle might break and to predict maintenance on it, they said they always knew they were going to use Control-M for it, because that was the enterprise standard, and they knew they could simply extend that capability into this area. And when they started about three, four years ago, they were ingesting data from about 100,000 vehicles. That has now scaled to over 325,000 vehicles, and they have not had to re-architect their strategy as they grow and scale. So I would say that is one of the key messages we are taking to market, that we are bringing innovation that spans over 25 years, and evolving it--

>> Modernizing it, basically.

>> Modernizing it, and bringing it to newer platforms.

>> Well, congratulations. I wouldn't call that a pivot, I'd call it an extensibility issue, kind of modernizing the core things.

>> Absolutely.

>> Thanks for coming and sharing the BMC perspective inside theCUBE here, at BigData NYC. This is theCUBE, I'm John Furrier. Jim Kobielus here in New York City. More live coverage, for three days we'll be here, today, tomorrow and Thursday, at BigData NYC, more coverage after this short break. (calm electronic music) (vibrant electronic music)
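For readers who want to picture the jobs-as-code approach Faruqui describes, here is a minimal sketch of a pipeline definition expressed as data. The structure is loosely modeled on the idea behind Control-M's jobs-as-code format, but every field name below is an illustrative assumption, not BMC's documented schema.

```python
import json

# A hypothetical "jobs as code" definition: an ingest -> process -> analytics
# pipeline expressed as data, so it can live in version control and move
# through a CI/CD pipeline like any other application artifact.
# NOTE: the job types and field names here are illustrative assumptions,
# not BMC's actual Control-M schema.
pipeline = {
    "DataPipelineFolder": {
        "Type": "Folder",
        "IngestOrders": {
            "Type": "Job:FileTransfer",
            "Description": "Land raw order files in the data lake",
        },
        "ProcessOrders": {
            "Type": "Job:Spark",
            "Description": "Cleanse and enrich the raw orders",
        },
        "PublishAnalytics": {
            "Type": "Job:Command",
            "Description": "Refresh the downstream analytics tables",
        },
        # Run the three jobs strictly in sequence.
        "Flow": {
            "Type": "Flow",
            "Sequence": ["IngestOrders", "ProcessOrders", "PublishAnalytics"],
        },
    }
}

# In a shift-left workflow, this file would be committed alongside the
# application code and validated on every build (for example, against a
# local Control-M Workbench instance) before being promoted to production.
with open("pipeline.json", "w") as f:
    json.dump(pipeline, f, indent=2)
```

The point of the sketch is the design choice Faruqui describes: because the definition is plain data in version control, the same artifact a developer tests offline can later be promoted to the enterprise environment without rewriting scripts.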
Nenshad Bardoliwalla & Stephanie McReynolds | BigData NYC 2017
>> Live from Midtown Manhattan, it's theCUBE, covering Big Data New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. (upbeat techno music)

>> Welcome back, everyone. Live here in New York, Day Three coverage, winding down for three days of wall-to-wall coverage, theCUBE covering BigData NYC in conjunction with Strata Data, formerly Strata Hadoop and Hadoop World, all part of the Big Data ecosystem. Our next guest is Nenshad Bardoliwalla, Co-Founder and Chief Product Officer of Paxata, hot startup in the space. A lot of kudos. Of course, they launched on theCUBE in 2013, when we started theCUBE as a separate event from O'Reilly. So, great to see the success. And Stephanie McReynolds, you've been on multiple times, VP of Marketing at Alation. Welcome back, good to see you guys.

>> Thank you.

>> Happy to be here.

>> So, winding down, a great kind of wrap-up segment here, in addition to the partnership that you guys have. So let's first talk about, before we get to the wrap-up of the show and bring together the week and summarize everything, tell us about the partnership you guys have. Paxata, you guys have been doing extremely well. Congratulations. Prakash was talking on theCUBE. Great success. You guys worked hard for it. I'm happy for you. But partnering is everything. Ecosystem is everything. Alation, their collaboration with data, that's their ethos. They're very user-centric.

>> Nenshad: Yes.

>> From the founders. Seemed like a good fit. What's the deal?

>> It's a very natural fit between the two companies. When we started down the path of building new information management capabilities, it became very clear that the market had a strong need for both finding data, right? What do I actually have? I need an inventory, especially if my data's in Amazon S3, my data is in Azure Blob storage, my data is on-premise in HDFS, my data is in databases, it's all over the place. And I need to be able to find it. And then once I find it, I want to be able to prepare it. And so one of the things that really drove this partnership was the very common interests that both companies have. Number one, pushing user experience. I love the Alation product. It's very easy to use, it's very intuitive, really it's a delightful thing to work with. And at the same time, they also share our interest in working in these hybrid multi-cloud environments. So what we've done, and what we announced here at Strata, is actually a bi-directional integration between the products. You can start in Alation and find a data set that you want to work with, see what collaboration or notes or business metadata people have created, and then say, I want to go see this in Paxata. And in a single click you can then actually open it up in Paxata and profile that data. Vice versa, you can also be in Paxata and prepare data, and then with a single click push it back, and then everybody who works with Alation actually now has knowledge of where that data is. So it's a really nice synergy.

>> So you push the user data back to Alation, 'cause that's what they care a lot about, the cataloging and making the user-centric view work. So you provide, it's almost a flow back and forth. It's a handshake, if you will, to data. Am I getting that right?

>> Yeah, I mean, the idea is to keep the analyst or the user of that data, the data scientist, even in some cases a business user, in the flow of their work as much as possible. But give them the advantage of understanding what others in the organization have done with that data before, and allow them to transform it, and then share that knowledge back with the rest of the community that might be working with that data.

>> John: So, give me an example. I like your Excel spreadsheet concept, 'cause that's obvious. People know what an Excel spreadsheet is, so. So, it's Excel-like. That's an easy TAM to go after. All Microsoft users might not get that Azure thing. But this one, just take me through a use case.

>> So, I've got a good example.

>> Okay, take me through it.

>> It's very common in a data lake for your data to be compressed. And when data's compressed, to a user it looks like a black box. So if the data is compressed in Avro or Parquet, or it's even in something like JSON format, a business user has no idea what's in that file.

>> John: Yeah.

>> So what we do is we find the file for them. It may have some comments on that file, about how that data's been used in past projects, that we infer from looking at how others have used that data in Alation.

>> John: So, you put metadata around it.

>> We put a whole bunch of metadata around it. It might be comments that people have made. It might be--

>> Annotations, yeah.

>> Actual observations, annotations. And the great thing that we can do with Paxata is open that Avro file or Parquet file, open it up so that you can actually see the data elements themselves. So all of a sudden, the business user has access without having to use a command-line utility or understand anything about compression, and how you open that file up--

>> John: So, as Paxata's spitting out their nuggets of value back to you, you're kind of understanding it, translating it to the user. And they get to do their thing, you get to do your thing, right?

>> It's making an Avro or a Parquet file as easy to use as Excel, basically. Which is great, right?

>> It's awesome. Now you've enabled a whole new class of people who can use that.

>> Well, and people just get turned off when it's anything like jargon, or like, "What is that? I'm afraid it's phishing. Click on that and oh!"

>> Well, the scary thing is that in a data lake environment, in a lot of cases people don't even label the files with extensions. They're just files. (Stephanie laughs) So, what started--

>> It's like getting your pictures labeled like DS, JPEG. It's like, what?

>> Exactly.

>> Right.

>> So, you're talking about unlabeled--

>> If you looked on your laptop, and you didn't have JPEG or DOC or PPT, okay, I don't know what this file is. Well, what you have in the data lake environment is thousands of these files that people don't really know what they are. And so with Alation, we have the ability to get all the value around the curation of the metadata, and how people are using that data. But then somebody says, "Okay, but I understand that this file exists. What's in it?" And then with Click to Profile from Alation, you're immediately taken into Paxata. And now you're actually looking at what's in that file. So you can very quickly go from, this looks interesting, to, let me understand what's inside of it. And that's very powerful.

>> Talk about Alation. 'Cause I had the CEO on, also their lead investor, Greg Sands from Costanoa Ventures. They're a pretty amazing team, but it's kind of out there. No offense, it's kind of a compliment actually. (Stephanie laughs)

>> Stephanie: Keep going.

>> They got a symbolic systems Stanford guy, who's like super-smart.

>> Nenshad: Yeah.

>> They're on something that's really unique, but it's almost too simple to be true. Like, wait a minute! Google for the data, it's an awesome opportunity. How do you describe Alation to people who say, "Hey, what's this Alation thing?"

>> Yeah, so I think the best way to describe it is that it's the browser for all of the distributed data in the enterprise. Sorry, so it's both the catalog, and the browser that sits on top of it. It sounds very simple. Conceptually it's very simple, but they have a lot of richness in what they're able to do behind the scenes, in terms of introspecting what type of work people are doing with data, and then taking that knowledge and actually surfacing it to the end user. So, for example, they have very powerful scenarios where they can watch what people are doing in different data sources, and then based on that information actually bubble up how queries are being used, or the different patterns that people are using to consume data. So what we find really exciting is that this is something that is very complex under the covers, which Paxata is as well, being built upon Spark. But they have put in the hard engineering work so that it looks simple to the end user. And that's the exact same thing that we've tried to do.

>> And that's the hard problem. Okay, Stephanie, back... That was a great example, by the way. Can't wait to have our little analyst breakdown of the event. But back to Alation for you. So, you've been VP of Marketing at Alation, but you've been around the block. You know B2B, tech, big data. So you've seen a bunch of different, you worked at Trifacta, you worked at other companies, and you've seen a lot of waves of innovation come. What's different about Alation that people might not know about? How do you describe the difference? Because it sounds easy, "Oh, it's a browser! It's a catalog!" But it's really hard. Is it the tech that's the secret? Is it the approach? How do you describe the value of Alation?

>> I think what's interesting about Alation is that we're solving a problem that since the dawn of the data warehouse has not been solved. And that is how to help end users really find and understand the data that they need to do their jobs. A lot of our customers talk about this--

>> John: Hold on. Repeat that. 'Cause that's like a key thing. What problem hasn't been solved since the data warehouse?

>> To be able to actually find and fully understand, understand to the point of trust, the data that you want to use for your analysis. And so, in the world of--

>> John: That sounds so simple.

>> Stephanie: In the world of data warehousing--

>> John: Why is it so hard?

>> Well, because in the world of data warehousing, business people were told what data they should use. Someone in IT decided how to model the data, came up with a KPI calculation, and told you as a business person, you as a CEO, this is how you're going to monitor your business.

>> John: Yeah.

>> What business person wants to be told that by an IT guy, right?

>> Well, it was bounded by IT.

>> Right.

>> Expression and discovery should be unbounded. Machine learning can take care of a lot of bounded stuff. I get that. But when you start to get into the discovery side of it, it should be free.

>> Well, no offense to the IT team, but they were doing their best to try to figure out how to make this technology work.

>> Well, just look at the cost of goods sold for storage. I mean, how many EMC drives? Expensive! IT was not cheap. Not even 10, 15, 20 years ago.

>> So now, when we have more self-service access to data, we can have more exploratory analysis. What data science really introduced, and Hadoop introduced, was this ability on demand to create these structures. You have this more iterative world of how you can discover and explore datasets to come to an insight. The only challenge is, without simplifying that process, a business person is still lost, right?

>> John: Yeah.

>> Still lost in the data.

>> So, we simply call that a catalog. But a catalog is much more--

>> Index, catalog, anthology, there are other words for it, right?

>> Yeah, but I think it's interesting, because the concept of a catalog as an inventory has been around forever in this space. But the concept of a catalog that learns from others' behavior with that data, this concept of Behavior I/O that Aaron talked about earlier today, the fact that the behavior of how people query data is an input, and that input then informs a recommendation as an output, is very powerful. And that's where all the machine learning and A.I. comes to work. It's hidden underneath that concept of Behavior I/O, but that's the real innovation that drives this rich catalog: how can we make active recommendations to a business person who doesn't have to understand the technology, but knows how to apply that data to making a decision.

>> Yeah, that's key. Behavior and textual information have always been the two flywheels in analysis, whether you're talking search engine or data in general. And I think what I like about the trends here at Big Data NYC this week, and we've certainly been seeing it at the hundreds of CUBE events we've gone to over the past 12 months and more, is that people are using data differently. Not only, say, differently; there's baselining, foundational things you got to do. But the real innovators have a twist on it that gives them an advantage. They see how they can use data. And the trend is, collective intelligence of the customer seems to be big. You guys are doing it. You're seeing patterns. You're automating the data. So it seems to be this flywheel of some data, get some collective data. What's your thoughts and reactions? Are people getting it? Is this people doing it by accident or on purpose, kind of thing? Did people just fall on their heads? Or do you see, "Oh, I just backed into this?"

>> I think that the companies that have emerged as the leaders in the last 15 or 20 years, Google being a great example, Amazon being a great example, these are companies whose entire business models were based on data. They've generated outsized returns. They are the leaders on the stock market. And I think that many companies have awoken to the fact that data is a monetizable asset, to be turned into information either for analysis, or to be turned into information for generating new products that can then be resold on the market. The leading-edge companies have figured that out, and are adopting technologies like Alation, like Paxata, to get a competitive advantage in the business processes where they know they can make a difference inside the enterprise. So I don't think it's a fluke at all. I think most of these companies are being forced to go down that path, because they have been shown the way by the digital giants that are currently ruling the enterprise tech world.

>> All right, what's your thoughts on the week this week so far, on the big trends? What are the obvious ones, obviously A.I., we don't need to talk about A.I., but what were the big things that came out of it? And what surprised you that didn't come out, from a trends standpoint, buzz here at Strata Data and Big Data NYC? What were the big themes that you saw emerge, and what was the surprise? Any surprises?

>> Basically, we're seeing in general the maturation of the market, finally. People are finally realizing that, hey, it's not just about cool technology. It's not about what distribution or package. It's about, can you actually drive return on investment? Can you actually drive insights and results from the stack? And so even the technologists that we were talking with today, throughout the course of the show, are starting to talk about that last mile of making the humans more intelligent about navigating this data; that's where all the breakthroughs are going to happen. Even in places like IOT, where you think about a lot of automation, and you think about a lot of capability to use deep learning to maybe make some decisions, there's still a lot of human training that goes into that decision-making process, and having agency at the edge. And so I think this acknowledgement that there should be balance between human input and what the technology can do is a nice breakthrough that's going to help us get to the next level.

>> What's missing? What do you see that people missed that is super-important, that wasn't talked about much? Is there anything that jumps out at you? I'll let you think about it. Nenshad, you have something now.

>> Yeah, I would say I completely agree with what Stephanie said, which is that we are seeing the market mature.

>> John: Yeah.

>> And there is a compelling force to now justify business value for all the investments people have made. The science experiment phase of the big data world is over. People now have to show a return on that investment. That being said, and this is my sort of way of being a little more provocative, I still think there's way too much emphasis on data science, and not enough emphasis on the average business analyst who's doing work in the Fortune 500.

>> It should be kind of the same thing. I mean, with data science you're just more of an advanced analyst, maybe.

>> Right. But the idea that every person who works with data is suddenly going to understand different types of machine learning models, and what's the right way to do hyperparameter tuning, and other words that I could throw at you to show that I'm smart... (laughter)

>> You guys have a vision with the Excel thing. I could see how you see that perspective, because you see a future. I just think we're not there yet, because I think the data scientists are still handcuffed and hamstrung by the fact that they're doing too much provisioning work, right?

>> Yeah.

>> To your point about surfacing the insights, it's like the data scientists, "Oh, you own it now!" They become the sysadmin, if you will, for their department. And it's like, it's not their job.

>> Well, we need to get them out of data preparation, right?

>> Yeah, get out of that.

>> You shouldn't be a data scientist--

>> Right now, you have two values. You've got the user interface value, which I love, but you guys do the automation. So I think we're getting there. I see where you're coming from, but still, those data scientists have to set the tone for the generation, right? So it's kind of like, you got to get those guys productive.

>> And it's not a... Please, go ahead.

>> I mean, it's somewhat interesting if you look at, can the data scientist start to collaborate a little bit more with the common business person? You start to think about it as a little bit of a scientific inquiry process. If you can have more innovators around the table in a common place to discuss what the insights in this data are, and people are bringing business perspective together with machine learning perspective, or the knowledge of the deeper algorithms, then maybe you can make those next leaps forward.

>> Great insight. If you want my observations, I'll use the crazy analogy. Here's my crazy analogy. For years it's been about the engine, the Model T, the car, the horse and buggy, you know? Now, "We got an engine in the car!" And it's got wheels, it's got a chassis. And so it's about the apparatus of the car. And then it evolved to, "Hey, this thing actually drives. It's transportation." You can actually go from A to B faster than the other guys, and people still think there's a horse and buggy market out there. So they got to go to that. But now people are crashing. Now there's an art to driving the car. Whether it's a sports car or whatever, this is where the value piece I think hits home: people are driving the data now. They're driving the value proposition. So I think that, to me, the big surprise here is how people aren't getting into the hype cycle. They like the hype in terms of lead gen, and A.I., but they're too busy for the hype. It's like, drive the value. This is not just B.S. either; it's outcomes. It's like, "I'm busy. I got security. I got app development."

>> And I think they're getting smarter about how they're valuing data. We're starting to see some economic models, and some ways of putting actual numbers on what impact this data is having today. We do a lot of usage analysis with our customers, who have a goal to distribute data across more of the organization, and really get people using it in a self-service manner. And from that, you're able to calculate what the impact actually is. We're not just storing this for insurance policy reasons.

>> Yeah, yeah.

>> And this cheap--

>> John: It's not some POC. Don't do a POC. All right, so we're going to end the day and the segment with you guys having the last word. I want to phrase it this way: share an anecdotal story you've heard from a customer, or a prospective customer, that looked at your product, not the joint product but your products each, that blew you away, and that would be a good thing to leave people with. What was the coolest or nicest thing you've heard someone say about Alation and Paxata?

>> For me, the coolest thing they said: "This is a social network for nerds. I finally feel like I've found my home." (laughter)

>> Data nerds, okay.

>> Data nerds. So, if you're a data nerd and you want to network, Alation is the place you want to be.

>> So, there are like profiles? Like, you guys have a profile for everybody who comes in?

>> Yeah, so the interesting thing is, as part of our automation, when we go and index the data sources, we also index the people that are accessing those sources. So you kind of have a leaderboard now of data users, who can track one another in the system.

>> John: Ooh.

>> And at eBay, the leader was this guy Caleb, who was their data scientist. And Caleb was famous because everyone in the organization would ask Caleb to prepare data for them. And Caleb was well known if you were around eBay for a while.

>> John: Yeah, he was the master of the domain.

>> And then when we turned it on, we were indexing tables on Teradata as well as their Hadoop implementation. And all of a sudden, there are table structures that are Caleb underscore cust, Caleb underscore revenue, Caleb underscore... We're like, "Wow!" Caleb drove a lot of Teradata revenue. (laughs)

>> Awesome. Paxata, what was the coolest thing someone said about you, in terms of being the nicest or coolest most relevant thing?

>> So, something that a prospect said earlier this week is, "I've been hearing in our personal lives about self-driving cars. But seeing your product and where you're going with it, I see the path towards self-driving data." And that's really what we need to aspire towards. It's not about spending hours doing prep. It's not about spending hours doing manual inventories. It's about getting to the point that you can automate the usage to get to the outcomes that people are looking for. So I'm looking forward to self-driving information.

>> Nenshad, thanks so much. Stephanie from Alation, thanks so much. Congratulations both on your success. And great to see you guys partnering. Big, big community here. And just the beginning. We see the big waves coming, so thanks for sharing perspective.

>> Thank you very much.

>> And your color commentary on our wrap-up segment here for Big Data NYC. This is theCUBE, live from New York, wrapping up three great days of coverage here in Manhattan. I'm John Furrier. Thanks for watching. See you next time. (upbeat techno music)
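To make McReynolds' Behavior I/O point concrete: behavior (the queries people actually run) is the input, and a recommendation is the output. The toy sketch below is an illustration of that idea only, not Alation's implementation; it mines a small query log for table usage and surfaces the most-used tables as candidates to recommend.

```python
import re
from collections import Counter

# Toy query log; in a real catalog this behavioral input would come from
# warehouse audit logs rather than a hard-coded list.
query_log = [
    "SELECT * FROM sales.orders WHERE region = 'NE'",
    "SELECT customer_id, total FROM sales.orders",
    "SELECT * FROM marketing.leads",
    "SELECT o.total FROM sales.orders o JOIN sales.customers c ON o.cid = c.id",
]

def tables_in(query: str) -> list[str]:
    """Very rough extraction of table names that follow FROM or JOIN."""
    return re.findall(r"(?:FROM|JOIN)\s+([\w.]+)", query, flags=re.IGNORECASE)

# Behavior in: count how often each table is actually queried.
usage = Counter(t for q in query_log for t in tables_in(q))

# Recommendation out: the most-used tables bubble up first.
for table, hits in usage.most_common(3):
    print(f"{table}: used in {hits} queries")
```

The real systems layer much more on top (who ran the query, lineage, annotations), but the shape is the same: observed behavior feeds a ranking that a business user sees as a recommendation.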
Carey James, Jason Schroedl, & Matt Maccaux | Big Data NYC 2017
>> Narrator: Live from Midtown Manhattan, it's theCUBE, covering BigData New York City 2017 Brought to you by SiliconANGLE Media and its ecosystem sponsors. >> Hey, welcome back everyone, live in New York, it's theCUBE coverage, day three of three days of wall-to-wall coverage of BigData at NYC, in conjunction with Strata Data right around the corner, separate event than ours, we've been covering. It's our eighth year. We're here expanding on our segment we just had with Matt from Deli EMC on, really on the front lines consultant, we've got Jason from BlueData, and Casey from BlueTalon, two separate companies but the blue in the name, team blue. And of course, Matt from Dell EMC, guys, welcome back to theCUBE and let's talk about the partnerships. I know you guys have a partnership, Dell EMC leads the front lines mostly with the customer base you guys come in with the secret sauce to help that solution which I want to get to in a minute, but the big theme here this week is partnerships. And before we get into the relationship that you guys have, I want you to talk about the changes in the ecosystem, because we're seeing a couple key things. Open source, one, and it's winning, continues to grow, but the Linux Foundation pointed out the open source that we cover that exponential growth is going to be in open-source software. You can see from 4 lines of code to billions in the next 10 years. So more onboarding, so clear development path. Ecosystems have work. Now they're coming into the enterprise with suppliers, whether it's consulting, it's front-end, or full stack developers coming together. How do you see ecosystems playing in both the supplier side and also the customer side? >> So we see from the supplier side, right, and from the customer side as well, and it kind of drives both of those conversations together is that you had the early days of I don't want vendor lock-in, right, I want to have a disparate virtual cornucopia of tools in the marketplace, and then they were, each individual shop was trying to develop those and implement those on their own. And what you're now seeing is that companies still want that diversity in the tools that they utilize, and that they work with, but they don't want that, the complication of having to deliver all those tools themselves, and so they're looking more for partners that can actually bring an ecosystem to the table where it's a loose coupling of events, but that one person actually has the forefront, has the customer's best interest in mind, and actually being able to drive through those pieces. And that's what we see from a partnership, why we're driving towards partnerships, 'cause we can be a point solution, we can solve a lot of pieces, but by bringing us as a part of an ecosystem and with a partner that can actually help deliver the customer and business value to the customer, that's where we're starting to see the traction and the movement and the wins for us as an organization. >> BlueData, you guys have had very big successes, big data as a service, docker containers, this is the programmer's nirvana. Infrastructure plus code, that's the DevOps ethos going mainstream. Your thoughts on partnering, 'cause you can't do it alone. >> Yeah, I mean, for us, speaking of DevOps, and we see our software platform provides a solution for bringing a DevOps approach to data science and big data analytics. 
And it's much more streamlined approached, an elastic and agile approach to big data analytics and data science, but to your point, we're partnered with Dell EMC because they bring together an entire solution that delivers an elastic platform for secure multi-tenant environments for data science teams and analytics teams for a variety of different open source tool sets. So there is a large ecosystem of open source tools out there from Hadoop to Spark to Kafka to a variety of different data science, machine learning and deep learning tool sets out there, and we provide through our platform the ability to dockerize all of those environments, make them available through self-service to the data science community so they can get up and running quickly and start building their models and running their algorithms. And for us, it's on any infrastructure. So, we work closely with Dell EMC to run it on Isilon and their infrastructure, Dell-powered servers, but also you can run it in a hybrid cloud architecture. So you could run it on Azure and now GCP, and AWS. >> So this is the agility piece for the developer. They get a lot of agility, they get their security. Dell EMC has all the infrastructure side, so you got to partner together. Matt, pull this together. The customer doesn't want, they want a single pane of glass, or however you want to look at it, they don't want to deal with the nuances. You guys got to bring it all together. They want it to work. Now the theme I hear at BigData New York is integration is everything, right, so, if it doesn't integrate, the plumbings not working. How important is it for the customer to have this smooth, seamless experience? >> It's critical for them to, they have to be able to believe that it's going to be a seamless experience, and these are just two partners in the ecosystem. When we talk to enterprise customers, they have other vendors. They have half a dozen or a dozen other vendors solving big data problems, right? The Hadoop analytic tools, on and on and on. And when they choose a partner like us, they want to see that we are bringing other partners to the table that are going to complement or enhance capabilities that they have, but they want to see two key things. And we need to see the same things as well when we look at our partnerships. We want to see APIs, we want to see open APIs that are well-documented so that we know these tools can play with each other, and two, these have to be organizations we can work with. At the end of the day, a customer does business with Dell EMC because they know we're going to stand behind whatever we put in front of them. >> John: They get a track record too, you're pretty solid. >> Yep, it is-- >> But I want to push on the ecosystem, not you guys, it's critical, but I mean one thing that I've seen over my 30 years in the enterprise is ecosystems, you see bullshit and you see real deal, right, so. A lot of customers are scared, now with all this FUD and new technology, it's hard to squint through what the BS is in an ecosystem. So how do you do ecosystems right in this new market? 'Cause like you said, it's not API, that's kind of technical, but philosophy-wise you can't do the barney deals, you got Pat Gelsinger standing up on stage at VMworld, basically flew down to stand in front of all the customers of VMworld's customers and said, we're not doing a barney deal. Now, he didn't say barney deals, that's our old term. He said, it's not an optical deal we're doing with VMware. We got your back. 
He didn't say that, but that's my interpretation, that's what he basically said. The CEO of AWS said that. That's a partner, you know what I'm saying? So, some deals are okay we got a deal on paper, what's the difference, how do you run an ecosystem, in your opinion? >> Yeah, it's not trivial. It's not an easy thing. It takes an executive, at that level, it takes a couple of executives coming together-- >> John: From the top, obviously. >> Committing, it's not just money, it's reputation, right? If you're at that level, it's about reputation which then trickles down to the company's reputation, and so within the ecosystem, we want to sort of crawl, walk, run. Let's do some projects-- >> So you're saying reputation in communities is the number one thing. >> I think so, people are not going to go, so you will always have the bleeding edge. Someone's going to go play with a tool, they're going to see if it works-- >> Wow, reputation's everything. >> Yeah. If it fails, they're going to tell, what is the saying, if something fails, if something bad happens you tell twelve people-- >> All right, so give them a compliment. What's BlueTalon do great for you guys? Explain their talent in the ecosystem. >> So BlueTalon's talent in the ecosystem, other than being just great people, we love Carey, is that they-- >> I'll get you to say something bad about him soon, but give him the compliment first. >> They have simplified the complexity of doing security, policy and role-based security for big data. So regardless of where your data lives, regardless of if it's Hadoop, Spark, Flink, Mongo, AWS, you define a policy once. And so if I am in front of the chief governance officer, my infrastructure doesn't have a value problem to them, but theirs does, right? The legal team, when we have to do proposals, this is what gets us through the legal and compliance for GDPR in this, it's that centralized control that is so critical to the capability we provide for big data. If you sprawl your data everywhere, and we know data sprawls everywhere-- >> So you can rely on them, these guys. >> Absolutely. >> All right, BlueData, give them a compliment, where do they fit? >> So they have solved the problem of deploying containers, big data environments, in any cloud. And the notion of ephemeral clusters for big data workloads is actually really, really hard to solve. We've seen a lot of organizations attempt to do this, we see frameworks out there, like Kupernetes, that people are trying to build on. These guys have fixed it. We have gone through the most rigorous security audits at the biggest banks in the world, and they have signed off because of the network segmentation and the data segmentation, it just works. >> I think I'm running a presidential debate, now you got to say something nice about him. No, I mean, Dell EMC we know what these guys do. But for you guys, I mean, how big is BlueTalon, company-wise? I mean, you guys are not small but you're not massive either. >> We're not small, but we're not massive, right. So, we're probably around 40 resources global, and so from our perspective, we're-- >> John: That's a great deal, working with a big gorilla in Dell EMC, they got a lot of market share, big muscle? >> Exactly, and so for us, like we talked about earlier, right, the big thing for us is ecosystem functions. 
We do what we do really well, right: we build software that does unified access control across multiple platforms as well as multiple distributions, whether it be private cloud, on-prem, or public cloud. And for us, again, it's great that we have the software, it's great that we can do those things, but if we can't actually help customers use that software to deliver value, it's useless. >> Do you guys go to market together, do you just hold hands in front of the customer, bundle products? >> No, we go to market together. So a lot of our team in enablement is not enabling our customers, it is enabling Dell EMC on the use of our software and how to do that. So we actually work with Dell EMC to train and work-- >> So you're a tight partner. There's certification involved, close relationships, you're not mailing it in. >> And then we're also involved with the customer side as well, so it's not like we go, okay great, now it's sold, we throw up our hands and walk away. >> John: Well, they're counting on you for that. >> They're counting on us for the specific pieces, but we're also working with Dell EMC so that we can get that breadth in their reach, so that they can actually go confidently to their customers and actually understand where we fit and where we don't fit. Because we're not everything to everybody, right, and so they have to understand those pieces to be able to know when it works and what the best practices are. And so again, we're 40 people; they're, I forget, there were 80,000 at one point? Maybe even more than that? But even in the services arm, there are several thousands of people in the-- >> What's the whole point of ecosystems you're getting at here? Point at the critical thing. You've got a big piece of the puzzle, it's not just that they're bundling you in. You're an active part of that, and it's an integration world, right, so he needs to rely on you to integrate with his systems. >> Yeah, we have to integrate with the other parts of the ecosystem too, so it really is a three-way integration from this perspective, where they do what they do really well, we do what we do, and they're complementary to each other, but without the services and the glue from Dell EMC-- >> So you bring Dell EMC into the deals too? >> We do, so we bring Dell EMC into deals, and Dell EMC sells us through a reseller agreement with them, so we actually help jointly. Either we bring them to a deal we've already found and they bring services to it, or we'll actually go out and do joint development of customers. So we actually come out and help with the sales process and cycles to understand, is there a fit or is there not a fit? So it's not one-size-fits-all, it's not just a "yes, we got something on paper that we can sell you, and we'll sell you every once in a while." It really is a way to develop an ecosystem to deliver value to the customer. >> All right, so let's talk about the customer mindset real quick. How far along are they? I really don't know much, 'cause I'm really starting to probe in this area. How savvy are they about the partnership levels? I mean, you disclose it, you're transparent about it, but are customers getting that the partnering is very key? Are they drilling in, asking tough questions? Are you kind of educating them one way, or are they savvy about it?
They may have been doing partners in house, but remember the enterprise had a generation of down-to-the-bone cutting, outsource everything, consolidation, and then, you know, going back around 2010, the uplift on reinvestment hit, so we're kind of in this renaissance right now. So, thoughts? >> The partnership is actually the secret sauce that's part of our sales cycle. When we talk about big data outcomes and enabling self-service, customers assume, oh, okay, you guys built some software, you've got some hardware. And then when we double-click into how we make this capable, we say, oh, well we partner with BlueTalon and BlueData, and these others, and they go, wait a minute, that's not your software? No, no, we didn't build that. We have scoured the market and we've found partners that we work with and we trust, and all of a sudden you can see their shoulders relax and they realize that we're not just there to sell them more kit. We're actually there to help them solve their problems. And it is a game changer, because they deal with vendors every day. Software Vendor X, Software Vendor Y, Hardware Vendor Z. And so to have a company that they already have good relationships with bring more capabilities to them, the guard comes down and they say, okay, let's talk about how we can make this work. >> All right, so let's get to the meat of the partnership, which I want to get to 'cause I think that's fundamental. Thanks for sharing perspective on the community piece. We've been on it, we've been doing it, we're a community brand ourselves. We're not a closed garden, we're not about restricting and censoring people at events, that's not what we're about. So you guys know that, so I appreciate you commenting on the community there. The Elastic Data Platform you guys are talking about, it's a partnership deal. You provide the EPIC software, and you guys are providing some great security in there. What is it about, what's the benefit? So you're leading with the product; take a minute to explain the product and then the roles. >> Yeah, so the Elastic Data Platform is a capability, a set of capabilities, that is meant to help our enterprise customers get to that next level of self-service. Data science as a service, and do that on any cloud, with any tools, in a security-controlled manner. That's what Elastic Data Platform is. And it's meant to plug in to the customer's existing investments and their existing tools and augment that, and through our services arm, we tie these technologies together using their open APIs, which is why that's so critical for us, and we bring that value back to our customers. >> And you guys are providing the EPIC software? What is EPIC software? I mean, I love epic software, that's an epic, I hope it's not an epic fail, so an epic name, but epic-- >> Elastic Private Instant Clusters. It's actually an acronym, and that is exactly what it provides for our customers. >> John: So you're saying that EPIC stands for-- >> Elastic Private Instant Clusters. So it can run in a private cloud environment on your on-prem infrastructure, but as I said before, it can run in a hybrid architecture on the public cloud as well. But yeah, I mean, we're working closely with the Dell EMC team, they're an investor, we work closely with their services organization, with their server organization, the storage organization, but they really are the glue that brings it all together. From services to software to hardware, and they provide the complete solution to the customers.
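To make "instant clusters" a bit more concrete before the multi-tenancy thread picks up: below is a minimal sketch of the general per-tenant container pattern, written against the generic docker Python SDK rather than BlueData's actual EPIC API, which isn't shown here. The image name, resource limits, and per-tenant network scheme are illustrative assumptions, not anything the panel specified.

# Hypothetical per-tenant provisioning sketch using the generic docker SDK
# (docker-py), not BlueData's EPIC API. Assumes a local Docker daemon and
# that the image below is available; both are illustrative assumptions.
import docker

client = docker.from_env()

def provision_tenant_environment(tenant: str,
                                 image: str = "jupyter/pyspark-notebook"):
    """Start an isolated, labeled container for one tenant's data science work."""
    network = client.networks.create(f"net-{tenant}", driver="bridge")
    return client.containers.run(
        image,
        detach=True,
        name=f"analytics-{tenant}",
        labels={"tenant": tenant},       # logical isolation marker for audits
        mem_limit="8g",                  # per-tenant resource cap
        nano_cpus=2_000_000_000,         # 2 CPUs
        network=network.name,            # per-tenant network segment
    )

# Each business unit gets its own ephemeral, segmented environment:
provision_tenant_environment("fraud-analytics")
provision_tenant_environment("marketing")

The point of the sketch is the shape of the pattern, one call per tenant with isolation expressed as labels, quotas, and network segmentation, not the specific API.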
So, as I think Matt-- >> John: Multi-tenancy is a huge deal, multi-tenancy's a huge deal. >> Absolutely, yeah. Also the ability to have logical isolation between each of those different tenants, for different data science teams, different analyst teams. That's particularly important at large financial services organizations like Barclays, who spoke yesterday, as Matt alluded to earlier. They talked about the need to support a variety of different business units who each have their own unique use cases, whether it's batch processing with Hadoop, or real-time streaming and fast data with Spark, Kafka, and NoSQL databases, or whether it's deep learning and machine learning. Each of those different tenants has different needs, and so you can spin up containers using our solution for each of those tenants. >> John: Yeah, that's been a big theme this week too, and among so many little things, this one relates to that one: the elastic nature of how people want to manage the provisioning of more resource. So, here's what we see. They're using collective intelligence, data. Hey, they're data science guys, they figured it out! Whatever the usage is, they can do a virtual layer, if you will, and then based upon the use they can double down. So let the users drive it, real collaboration. That seems to be a big theme, so this helps there. The other theme has been centralization. There's the GDPR hanging over everyone's head, but even though that's more of a threat, a gun to the head, the hammer or the guillotine, however you look at it, there's more enablement around centralization, so it's not just the threat, it's other things that are benefiting. >> Right, it's more than just the threat of the GDPR and being compliant from those perspectives, right? The other big portion of this is that you do want to provide self-service. So the key to self-service is: that's great, I can create an environment, but if it takes me a long time to get data to that environment to actually be able to utilize it, or to protect the data that's in that environment by having to rewrite policies in a different place, then you don't get the benefit, right, the acceleration of the self-service. So having centralized policies with distributed enforcement gives you that elastic ability, right? We can deploy the central engines on-premises, but you can protect data that's in the cloud or data that's in a private cloud. So as companies move data for their different workloads, we can put the same protections with them, and it goes immediately with them, so you don't have to manage it in multiple places. It's not like, oh, did I remember to put that rule over in this system? Oh no, I didn't, and guess what just happened to me? I got smacked with a big fine because I wasn't compliant. So compliance-- >> How about audit, too? I mean, are you checking the audit side too? >> Yeah, so audit's a great portion of that, and we do audit for a couple of reasons. One is to make sure that you are compliant, but two is to make sure you actually have the right policies defined. Are people accessing the data the way you expect them to access that data? So that's another big portion for us, and what we do from an audit perspective is that data usage lineage: we actually tell you what the user was trying to do.
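As a hedged, toy illustration of that "define the policy once, enforce it everywhere, audit every decision" idea: this is nothing like BlueTalon's actual engine, and the policy format, roles, and column names are all invented for the sketch.

# Toy sketch of centralized policy with distributed enforcement and auditing.
# Not BlueTalon's engine; the policy structure and names are illustrative.
import time

POLICIES = [  # defined once, centrally
    {"role": "analyst",    "dataset": "transactions", "allow_columns": ["amount", "date"]},
    {"role": "compliance", "dataset": "transactions", "allow_columns": ["*"]},
]

AUDIT_LOG = []  # usage lineage: every decision gets recorded

def check_access(role: str, dataset: str, column: str) -> bool:
    allowed = any(
        p["role"] == role and p["dataset"] == dataset
        and ("*" in p["allow_columns"] or column in p["allow_columns"])
        for p in POLICIES
    )
    AUDIT_LOG.append({"ts": time.time(), "role": role, "dataset": dataset,
                      "column": column, "allowed": allowed})
    return allowed

# The same check can run at any enforcement point -- a Hadoop job, an RDBMS
# proxy, a cloud store -- while the policy itself lives in one place.
check_access("analyst", "transactions", "ssn")   # False, and it is audited

def denied_rate(window: int = 100) -> float:
    """Crude anomaly signal: share of recent requests that were denied."""
    recent = AUDIT_LOG[-window:]
    return sum(1 for e in recent if not e["allowed"]) / max(len(recent), 1)

A spike in denied_rate is exactly the kind of pattern the next exchange describes: a group repeatedly hitting data it isn't entitled to.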
So if a customer's trying to access the data and you see a large group trying to access a certain set of data but being denied, now you can look and say, is that truly correct? Do I want them not being-- >> John: Well, Equifax. That thing was being phished over months and months and months. Not just once, that thing was phished over 10 times. In fact, state-sponsored actors were effectively running franchises inside that organization. So they were inside the VPN. So this is where the issue is. Okay, let's just say that happened again. You would have flagged it. >> We flag it. >> You would have seen the access pattern and said, okay, a lot of people are cleaning us out. >> Yep, while it's happening. Right, so you get to see that usage, the lineage of the usage of the data, so you get to see that pattern as well. Not only who's trying to access, 'cause protecting the perimeter, as we all know, is no longer viable. So we actually get to watch the usage pattern, so you can detect an anomaly in that type of system, as well as quickly change policies to shut down that gap, and then watch to see what happens, see who's continuing to try to hit it. >> Well, it's been a great conversation. Love that you guys are on, and great to see the Elastic Data Platform come together through the partnerships, again. As you know, we're really passionate about highlighting and understanding more about the community dynamic as it becomes more than just socialization: it's a business model for the enterprise, as it was in open source. We'll be covering that. So I'd like to go around the panel here just to end this segment. Share something that someone might not know that's going on in the industry that you want to point out. An observation, an anecdote that hasn't been covered, hasn't been serviced. It could be a haymaker, it could be something anecdotal, a personal observation. In the big data world, BigData NYC this week or beyond, what should people know about that may or may not be covered out there? >> Well, I think people pretty much should know about this one, right, but four or five years ago Hadoop was going to replace everything in the world. And two, three years ago the RDBMS groups were like, Hadoop will never make it out of the science fair. We're in a world now where neither is true. It's somewhere in between. Hadoop is going to remain and continue, and the RDBMS is also going to continue. So you need to look at ecosystems that can actually allow you to cover both sides of that coin, which is what we're talking about here: those types of tools are going to continue forward together. So you have to look at your entire ecosystem and move away from siloed functions to how you actually look at data protection and data usage across the entire environment. >> Matt? >> I would say that the technology adoption in the enterprise is outstripping the organization's ability to keep up with it. So as we deploy new technologies, tools, and techniques to do all sorts of really amazing things, we see the organization lagging in its ability to keep up. And so policies and procedures, operating models, whatever you want to call that, put it under the data governance umbrella, I suppose.
If those don't keep up, you're going to end up with an organization that is mismatched with the technology that is put into place, and ultimately you can end up with a massive compliance problem. Now, that's the worst case. But even in the best case, you're going to have a really inefficient use of your resources. My favorite question to ask organizations: let's say you could put a timer on one of the data science sandboxes. What happens when the timer goes off and the data science is not done? And you've got a line of people waiting for resources, what do you do? How does the organization respond to that? It's a really simple question, but the answer's going to be very nuanced. So that's the policy, that's the operating model stuff that we're talking about, that we've got to think about when we enable self-service. And self-security, those things have to come hand-in-hand. >> That's the operational thinking that needs to come through. >> Okay, Jason? >> Yeah, I think even for us, I mean this has been happening for some time now, but I think there still is this notion that the traditional way to deploy Hadoop and other big data workloads on-prem is bare metal, and that's the way it's always been done. Or, you can run it in the cloud. But what we're seeing now, what we've seen evolve over the past couple of years, is that you can run your on-prem workloads using docker containers in a containerized environment. You can have this cloud-like experience on-prem, but you can also provide the ability to move those workloads, whether they're on-prem or in the cloud. So you can have this hybrid approach and multi-cloud approach. So I think that's fundamentally changing. It's a new dynamic, a new paradigm for big data, either on-prem or in the cloud. It doesn't have to be on bare metal anymore. And we get the same, we've been able to get-- >> It's on-prem, people want on-prem, that's where the action is, and cloud no doubt, but right now it's the transition. Hybrid cloud's definitely going to be there. I guess my observation is the tool shed problem. You know, I've said it all day: you don't want to have a tool shed full of tools you don't use anymore, or buy a hammer that wants to turn into a lawn mower 'cause the vendor changed, pivoted. You've got to be careful what you buy, the tools, so don't think like a tool. Think like a platform. And I think having a platform mentality, understanding the system, the operating environment as you were getting to, really is a fundamental exercise that most decision makers should think about. 'Cause again, your relationship with the Elastic Data Platform proves that this operating environment's evolving. It's not about the tool. The tool has to be enabled, and if the tool is enabled into the platform, it should have a data model that falls into place. No one should have to think about it: you get the compliance, you get the docker container. So don't buy too many tools. If you do, make sure they're clean and in a clean tool shed! You got a lawnmower, I guess that's the platform. Bad analogy, but you know, I think tools have been the rage in this market, and now I think platforming is something we're seeing more of. So guys, thanks so much, appreciate it. Elastic Data Platform by Dell EMC, with the EPIC platform from BlueData, and BlueTalon providing the data governance and compliance. Great stuff, and with GDPR coming, BlueTalon, you guys have got a bright future, congratulations.
All right, more CUBE coverage after this short break, live from New York, it's theCUBE. (rippling music)
Murthy Mathiprakasam, Informatica | Big Data NYC 2017
>> Narrator: Live from midtown Manhattan, it's theCUBE. Covering BigData, New York City, 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. >> Welcome back everyone, we're here live in New York City for theCUBE's coverage of BigData NYC, our event we've been running for five years, been covering the BigData space for eight years, since 2010 when it was Hadoop World, Strata Conference, Strata Hadoop, Strata Data, soon to be called Strata AI, just a few. We've been theCUBE for all eight years. Here, live in New York, I'm John Furrier. Our next guest is Murthy Mathiprakasam, who is the Director of Product Marketing at Informatica. A Cube alumni, he's been on many times; we cover Informatica World every year. Great to see you, thanks for coming by and coming in. >> Great to see you. >> You guys do data, so there's not a lot of recycling going on in the data, because we've been talking about it all week, total transformation, but the undercurrent has been a lot of AI, AI this, and you guys have the CLAIRE product, doing a lot of things there. But outside of the AI, the undertone is cloud, cloud, cloud. Governance, governance, governance. Those are the two drivers I'm seeing as the force of this week: a lot of people trying to get their act together on those two fronts, and you can kind of see the scabs on the industry. Some people haven't been paying attention, and they're weak in those areas. Cloud is absolutely going to be driving the BigData world, 'cause data is horizontal. Cloud's the power source, and you guys have been on that. What's your thoughts, what other drivers encourage you? (mumbles) what I'm saying and what else did I miss? Security is obviously in there, but-- >> Absolutely, no, I think you're exactly right on. So obviously governance and security is a big deal, largely being driven by the GDPR regulation that's happening in Europe. But, I mean, every company today is global, so everybody's essentially affected by it. So, I think data until now has always been a kind of opportunistic thing, where there were a couple of guys in their organizations looking at it as, oh, let's do some experimentation, let's do something interesting here. Now, it's becoming government-mandated, so I think there are a lot of organizations who are, to your point, getting their act together, and that's driving a lot of demand for data management projects. So now people say, well, if I've got to get my act together, and I don't have to hire armies of people to do it, let me look for automated, machine learning based ways of doing it. So that they can actually deliver on the audit reports that they need to deliver, and ensure the compliance that they need to ensure, but do it in a very scalable way. >> I've been kind of joking all week, and I kind of had this meme in my head, so I've been pounding on it all week, calling it the tool shed problem. The tool shed problem is, everyone's got these tools. They throw them into the tool shed. They bought a hammer, and the company that sells them the hammer is trying to turn it into a lawnmower, right? You can't mow your lawn with a hammer, it's not going to work. These tools are great, but a tool defines the work you do, and the platforming issue is a huge one. And you start to see people who took that view. You guys were one of them, because in a platform-centric view, the tools that are enabled can be highly productive.
You don't have to worry about new things like a government's policy, the GDPR that might pop up, or the next Equifax that's around the corner. There are probably two or three of them going on right now. So, that's an impact: the data, who uses it, how it's used, and who's at fault or whatever. So, how does a company deal with that? And machine learning has proven to be a great horse that a lot of people are riding right now. You guys are doing it; how does a customer deal with that tsunami of potential threats? Architecture challenges, what is your solution, how do you talk about that? >> Well, I think machine learning, you know, up until now has been seen as kind of a nice to have, and I think that very quickly it's going to become a must have. Because, exactly like you're saying, it really is a tsunami. I mean, you can see people who are nervous about the fact that, I mean, there are different estimates, but it's something like 40% growth in data assets for most organizations every year. So, you can try to get around this somehow with one of these (mumbles) tools or something. But at some point, something is going to break: either you just run out of manpower, or you can't train the manpower, or people start leaving. Whatever the operational challenges are, it just isn't going to scale. Machine learning is the only approach. It is absolutely the only approach that actually ensures that you can maintain data for the kind of defensive reasons you're describing, the structure and compliance, but also for the kind of offensive, opportunistic reasons, and do it scalably. 'Cause there's just no other way, mathematically speaking: when the data is growing 40% a year, just throwing a bunch of tools at it doesn't work. >> Yeah, I would just amplify that and look right in the camera and say: if you're not on machine learning, you're out of business. That's a straight up obvious trend, 'cause that's a precursor to AI, real AI. Alright, let's get down to data management. When people throw around data management, it's like, oh yeah, we've got some data management. There are challenges with that. You guys have been there from day one. But now if you take it out into the future, how do you guys provide data management in a totally cloud world, where the customer certainly has public and private, or on premise, and they might have multi-cloud? So now comes a land grab for the data layer. How do you guys play in that? >> Well, I think it's a great opportunity for these kinds of middleware platforms that actually do span multiple clouds and can span the internal environments. So, I'll give you an example. Yesterday we actually had a customer speaking at Strata here, and he was talking about how, for him, the cloud is really just a natural extension of what they're already doing, because they already have a sophisticated data practice. This is a large financial services organization, and he's saying, well, now the data isn't all inside. Some of it's outside, you've got partners who've got data outside. How do we get to that data? Clearly, the cloud is the path for doing that. So, the fact that the cloud is a natural extension of what a lot of organizations were already doing internally means they don't want a completely different approach to the data management. They want a consistent, simple, systematic, repeatable approach to the data management that spans, as you said, on premise and the cloud.
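A minimal sketch of what that kind of consistent, spanning layer can look like in code: this is a generic adapter pattern, not Informatica's implementation, and the two backends shown (local disk, and S3 via boto3) are assumptions chosen for illustration.

# Generic sketch of a thin data-access layer that shields pipelines from the
# underlying store. Not Informatica's product; the backends are illustrative.
from abc import ABC, abstractmethod

class DataStore(ABC):
    @abstractmethod
    def read(self, path: str) -> bytes: ...

class LocalStore(DataStore):
    def read(self, path: str) -> bytes:
        with open(path, "rb") as f:
            return f.read()

class S3Store(DataStore):
    def __init__(self, bucket: str):
        import boto3                     # assumes boto3 and AWS credentials
        self.bucket = boto3.resource("s3").Bucket(bucket)

    def read(self, path: str) -> bytes:
        return self.bucket.Object(path).get()["Body"].read()

def load(store: DataStore, path: str) -> bytes:
    # Pipelines call this one function; moving from on-prem to a cloud, or
    # across clouds, means swapping the store object, not the pipeline.
    return store.read(path)

# Usage (illustrative paths):
# data = load(LocalStore(), "/data/customers.csv")
# data = load(S3Store("my-bucket"), "raw/customers.csv")

That "swap the object, keep the pipeline" property is the whole argument for the middleware layer being described here.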
That's why I think there's the opportunity for a very mature and sophisticated platform: you're not rewriting and re-platforming for every new thing. Is it AWS, is it Azure, is it something on premise? You just want something that works, that shields you from the underlying infrastructure. >> So let me put my skeptic hat on for a second and challenge you on this, because this I think is fundamental. Whether it's real or not, it's perceived, maybe in the back of the mind of the CXO or the CDO, whoever is enabled to make these big calls: if they hand the keys to the kingdom to Informatica, they're going to get locked in. So, this is a deep fear. People wake up with nightmares in the enterprise, they've seen lock-in before. How do you explain to a customer that you're going to be an enabling opportunity for them, not a lock-in foreclosing future benefits? Especially if they have an unknown scenario called multi-cloud. I mean, no one's really doing multi-cloud, let's face it. I mean, you might have multiple clouds with stuff on them, >> At least not intentionally. Sometimes you've got lines of businesses doing things, but absolutely, I get it. >> No one's really moving workloads dynamically between clouds in real time. Maybe a few people doing some hacks, but for the most part, of course, it's not a standard practice. >> Right. >> But they want it to be. >> Absolutely. >> So that's the future. From today, how do you preserve that position with the customer, where you say, hey, we're going to add value, but we're not going to lock you in? >> So the whole premise, again, goes back to the classic three-tier model of how you think about technology stacks, right? There's an infrastructure layer, there's a platform layer, there's an analytics layer, and the whole premise of the middle layer, the platform layer, is that it enables flexibility in the other two layers. It's precisely when you don't have something intermediating the data and the use of the data that you run into challenges with flexibility and with data being locked into the data store. But you're absolutely right. We had dinner with a bunch of our customers last night. They were talking about how they'd essentially evaluated every version of BigData platform and data infrastructure platform, right? And why? Because they were a large organization, and different teams start stuff, and they had to sort it all out. And I was like, that must have been pretty hard for you guys. But they were using Informatica, so it didn't really matter where the data was. They were still doing everything, as far as the data management goes, from a consistent layer, and we integrate with all those different platforms. >> John: So you didn't get in the way? >> We didn't get in the way. >> You've actually facilitated. >> We are facilitating increased flexibility. Because without a layer like that, a fabric, or whatever you want to call it, a data platform that's facilitating this, the complexity is going to get very, very crazy very soon. If it hasn't already. The number of infrastructure platforms that are available, like you said, on premise and in the cloud now, keeps growing. The number of analytical tools that are available is also growing. And all of this is amazing innovation, by the way. This is all great stuff. But to your point, if you're the chief data officer of an organization, you're going, I've got to get this thing figured out somehow.
I need some sanity, that's really the purpose of-- >> They just don't want the tool for the tool's sake, they need it to be purposeful. >> And that's why this machine learning aspect is very, very critical, because I was thinking about an analogy just like you were, and I was thinking: in a way, you can think of data management as sort of cleaning stuff up, and there are people that have brooms and mops and all these different tools. Well, we are bringing a Roomba to market, right? Because you don't want to just create tools that transfer the labor around, which is a little bit of what's going on. You want to actually get the labor out of the equation, so that the people are focused on the context, the business strategy, and the data management is sort of cleaning itself. It's doing the work for you. That's really what Informatica's vision is. It's about being an enterprise cloud data management vendor that is leveraging AI under the hood so that you can sort of set it and forget it. A lot of this ingestion and cleansing, telling analysts what data they should be looking for: all of this stuff is just happening in an automated way, and you're not in this total chaos. >> And then some tools will be sitting in the back for a long time. In my tool shed, when I had one, back on a big enough property back east. No one has tool sheds by the way. No one does any gardening. The issue is, at the end of the day, I need to have a reliable partner. So I want you to take a minute and explain to the folks who aren't yet Informatica customers why they should be, and to the Informatica customers why they should stay with Informatica. >> Absolutely. So certainly the ones we have are a very loyal customer base. In fact the guy who was presenting with us yesterday said he's been with Informatica since 1999, going through various versions of our products and adopting new innovations. So we have a very loyal customer base, and I think that loyalty speaks for itself. As far as net new customers, I think that in a world of this increasing data complexity, it's exactly what you were saying: you need to find an approach that is going to scale. I keep hearing this from the chief data officers: I've kind of got something going on today, but I don't know how I scale it. How is this going to work in 2018 and 2019, in 2025? And it's just daunting for some of these guys. Especially going back to your point about compliance, right? So it's one thing if you have data sitting around, so to speak, that you're not using. But god forbid, now you've got legal and regulatory concerns around it as well. So you have to get your arms around the data, and that's precisely where Informatica can help, because we've actually thought through these problems and we've talked about them. >> Most of those are problems you've solved, because at the end of the day we're talking about problems that have massive importance, big-time consequences people can actually quantify. >> That's right. >> So at the highest level, what specific problem do you solve that is the most important, that has the most consequences? >> Everything from ingestion of the raw data sets from wherever, like you said, in the cloud, on premise, all the way through all the processes you need to make it fully usable. And we view that as one problem. There are other vendors who think that one aspect of that is a problem worth solving. We really think, look, at the end of the day you've got raw stuff and you have to turn it into useful stuff.
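As a toy illustration of that "Roomba" idea, automated profiling that guesses what raw fields probably are so analysts don't have to sift through them by hand, here's a deliberately simple, regex-based sketch. This is not Informatica's CLAIRE; a real system would use learned models, and the patterns and the "customer" rule below are invented for illustration.

# Toy field profiler: guesses what a raw record probably contains.
# A deliberately simple stand-in for the ML-driven classification described
# above; the patterns and the "customer" heuristic are illustrative.
import re

PATTERNS = {
    "email":  re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$"),
    "phone":  re.compile(r"^\+?[\d\s\-()]{7,15}$"),
    "date":   re.compile(r"^\d{4}-\d{2}-\d{2}$"),
    "amount": re.compile(r"^\$?\d+(\.\d{2})?$"),
}

def profile_field(value: str) -> str:
    for label, pattern in PATTERNS.items():
        if pattern.match(value.strip()):
            return label
    return "text"

def looks_like_customer(record: dict) -> bool:
    """Contact details plus order-ish fields: probably a customer record."""
    labels = {profile_field(str(v)) for v in record.values()}
    return "email" in labels and "amount" in labels

print(looks_like_customer({"name": "Ada", "mail": "ada@example.com",
                           "order_total": "$129.99"}))   # True

The same kind of inference is what makes semi-structured sources, JSON, web logs, XML, usable without a data scientist hand-parsing every file, which is exactly the scenario described next.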
Everything in there has to happen, so we might as well just give you everything and be very, very good at doing all those things. And so that's what we call enterprise cloud data management. It's everything from raw material to finished goods of insights, and we want to be able to provide that in a consistent, integrated, machine-learning-driven way. >> Well, you guys have a loyal customer base, but to be fair, and I'm not throwing Informatica under the bus here, you've got big customers, big engagements, there was a time in Informatica's history when you went private. Some new management came in. There was a moment where the boat was taking on water, right? And you could almost look at it and say, hmm, you know, we're in this space. You guys retooled around that. Success to the team, took it to another dimension. So that's more a statement than a question. I think you guys have done a great job. Yet the boat might have taken on water, that's my opinion, and you can probably debate that. But I think you matured in public, and then you went private. And here's the thing: you guys have had good product chops at Informatica, so I've got to ask you the question. What cool things are you doing? Because remember, cool shiny new toys help put a little flash and glam on the nuts and bolts that scale. What are you guys doing? I know you just announced CLAIRE, some AI stuff. What's the hot stuff you're doing that's adding value? >> Yeah, absolutely. First of all, this kind of addresses your water comment as well. So we are probably one of the few vendors that spends about $200 million on R&D. And that hasn't changed through the acquisition. If anything, I think it actually increased a little bit, because now our investors are even more committed to innovation. >> Well, you're more nimble in private. A lot more nimble. >> Absolutely, a lot more ideas that are coming to the forefront. So there's never been any water, just to be clear. But to answer your follow-on question about some examples of this innovation: I think Ahmed yesterday talked about some of our recent releases as well, but we really just keep pushing on this idea, I know I keep saying this, of the whole machine learning approach: how can we learn more about the data? So one of the features, I'll give you an example, is that if we can actually go look at a file, and we spot something like a name and an address and some order information, that probably is a customer, right? And we know that because we've seen past data sets. So there are examples of this pattern matching where you don't even have to have the data all filled out. And this is increasingly the way the data looks: we are not dealing with relational tables anymore, it's JSON files, it's web logs, XML files, all of that data that you used to have data scientists go through and parse and sift through, we just automatically recognize now. If we can look at the data and understand it, we can match it. >> Put that in context in terms of the benefits. From the old way versus the current way, what are the pain levels, one versus the other? Can you put context around that? Because it's pretty significant. >> It's huge, because again, it goes back to this sort of volume and variety of data that people are trying to get into systems, and to do it very rapidly. I'll give you a really tangible customer case.
So, this is a customer that presented at Informatica World a couple months ago. It's Jewelry TV, I can actually tell you the name. So they're one of these online shopping sites, and they've got a TV program that goes with the online site. So what they do is, obviously, when you promote something on TV, your orders go up online, right? They wanted to flip it around, and they said, look, let's look at the web logs of the traffic that's on the website and then go promote that on the TV program. Because then you get a closed loop and start to have this explosion of sales. So they used Informatica, and they didn't have to do any of this hand coding. They just built this very quickly with the graphical user interface that we provide; it leverages Spark Streaming under the hood. So they are using all these technologies under the hood, they just didn't have to do any of the manual coding. They got this thing out in a couple of days, and it works. And they have been able to measure it, and they're actually driving increased sales by taking the data and just getting it out to the people that need to see it very, very quickly. So that's an example of a use case where, to your point, this isn't just a small, incremental type of thing. No, there is a lot of money behind data if you can actually put it to good use. >> The consequences are grave, and I think you've seen more and more, I mean, the hacks just amplify it over and over again. It's not a cost center when you think about it. It has to be somehow configured differently, as a profit center, even though it might not drive top line revenue directly like an app or anything else. It's not a cost center. If anything, it will be treated as a profit center, because if you get hacked or someone's data is misused, you can be out of business. There is no profit. Look at the results of these hacks. >> The defensive argument is going to become very, very strong as these regulations come out. But let's be clear, we work with a lot of the most advanced customers. There are people making money off of this. It can be a top line driver-- >> No, it should be, it should be. That's exactly the mindset. So the final question for you before we break. There are some chief data officers that are enabled, and some that aren't, and that's just my observation. I don't want to pigeonhole anyone, but some are enabled to really drive change, and some are just figureheads who are managing the compliance risk and work for the CFO and say no to everything. I'm over-generalizing, but that's essentially how I see it. What's the problem with that? Because the cost center issue, we've seen this movie before in the security business. Security should not be part of IT. That's its own deal. >> Exactly. >> So we're kind of, this is kind of smoke, but we're coming out of the jungle here. Your thoughts on that. >> Yeah, you're absolutely right. We see a variety of models. We can see the evolution of those models, and it's also very contextual to different industries. There are industries that are inherently more regulated, so that's why you're seeing the data people maybe more in those cost center areas that are focused on regulations and things like that. There are other industries that are a lot more consumer oriented, so for them, it makes more sense to have the data people be in a department that is more revenue-facing. So it's not entirely random.
There are some reasons for it. That's not to say it's the right model moving forward, but someday, you never know. There is a reason why this role became a CXO in the first place. Maybe it is somebody who reports to the CEO, and they really view the data department as a strategic function. It might take a while to get there, but I don't think it's going to take that long. Again, we're talking about 40% growth in the data, and these guys are realizing that now, and I think we're going to see very quickly people moving out of the whole tool shed model and moving to very systematic, repeatable practices. Sophisticated middleware platforms and-- >> As we say, don't be a tool, be a platform. Murthy, thanks so much for coming on theCUBE, we really appreciate it. What's going on at Informatica, real quick. Things good? >> Things are great. >> Good, awesome. Live from New York, this is theCUBE here at BigData NYC, more live coverage continuing day three after this short break. (digital music)
Santhosh Mahendiran, Standard Chartered Bank | BigData NYC 2017
>> Announcer: Live, from Midtown Manhattan, it's theCUBE, covering Big Data New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. (upbeat techno music) >> Okay welcome back, we're live here in New York City. It's theCUBE's presentation of Big Data NYC, our fifth year doing this event in conjunction with Strata Data, formerly Strata Hadoop, formerly Strata Conference, formerly Hadoop World. We've been there from the beginning: eight years covering Hadoop's ecosystem, now Big Data. This is theCUBE, I'm John Furrier. Our next guest is Santhosh Mahendiran, who is the global head of technology analytics at Standard Chartered Bank. A practitioner in the field, here getting the data, checking out the scene, giving a presentation on the journey with data at a bank, big financial services, obviously an adopter. Welcome to theCUBE. >> Thank you very much. >> So we always want to know what the practitioners are doing, because at the end of the day there are a lot of vendors selling stuff here, and everyone's got their story. At the end of the day, you've got to implement. >> That's right. >> And one of the themes is data democratization, which sounds warm and fuzzy: collaborating with data, this is all good stuff, you feel good and you move into the future. But at the end of the day it's got to have business value. >> That's right. >> And as you look at that, how do you look at the business value? 'Cause you want to be on the bleeding edge, you want to provide value and get that edge operationally. >> That's right. >> Where's the value in data democratization? How did you guys roll this out? Share your story. >> Okay, so let me start with the journey first, before I come to the value part of it, right? So, data democratization is an outcome, but the journey is something we started three years back. So what did we do, right? So we had some guiding principles to start our journey. The first was to say that we believed in the three S's, which is speed, scale, and it should be really, really flexible and super fast. So one of the challenges that we had was that our historical data warehouses were becoming entirely redundant. And why was that? Because they were RDBMS-centric and extremely disparate. So we weren't able to scale up to meet the demands of managing huge chunks of data. So the first step we took was to re-pivot and say, okay, let's embrace Hadoop. And what we mean by embracing is not just putting in a data lake: we said that all our data will land into the data lake. And this journey started in 2015, so we now have close to 80% of the Bank's data in the lake. It is end-of-day data right now, this data flows in on a daily basis, and we have consumers who feed off that data. Now coming to your question about-- >> So the data lake's working? >> The data lake is working, up and running. >> People like it, you just got a good spot: batch 'em all, you throw everything in the lake. >> So it is not real time, it is end of day. There is some data that is real-time, but the data lake is not entirely real-time, that I have to tell you. But one part is that the data lake is working. The second part of your question is, how do I actually monetize it? Am I getting some value out of it? And I think that's where tools like Paxata have actually enabled us to accelerate this journey. So we call it data democratization. The best part is, it's not just about having the data. We want the business users to actually use the data.
Typically, data has always been either delayed or denied to end-users in most cases: we had end-users waiting for the data, but they didn't get access to it. That was done primarily because the size of the data was too huge and it wasn't flexible enough to be shared. So how did tools like Paxata and the data lake help us? What we did with data democratization is basically to say, hey, we'll get end-users access to the data first, in a fast manner, in a self-service manner, and in something that gives operational assurance to the data. So you don't hold the data and then say that you're going to get a subset of data to play with. We'll give you the entire set of data, and we'll give you the right tools which you can play with. Most importantly, from an IT perspective, we'll be able to govern it. So that's the key about democratization. It's not about just giving them a tool, giving them all the data and then saying "go figure it out." It's about ensuring that, okay, you've got the tools, you've got the data, but we'll also govern it, so that you obviously have control over what they're doing. >> So now you govern it, they don't have to get involved in the governance, they just have access? >> No, they don't need to. Yeah, they have access. So governance works both ways. We establish the boundaries. Look at it as a referee that says, okay, here are the guidelines, and within the datasets that key people have access to, you can further set rules. Now, coming back to specific use cases, I can talk about two specific cases which actually helped us move the needle. The first is stress testing. Being a financial institution, we typically have to report various numbers to our regulators, etc. The turnaround time was extremely long. This kind of stress testing typically involves taking a huge amount-- >> What were some of the turnaround times? >> Normally it was two to three weeks, in some cases a month-- >> Wow. >> So we were able to narrow it down to days. What we essentially did: as with any stress testing or reporting, it involved taking huge amounts of data, crunching it, running some models, and then showing the output, basically with a number of transformations involved. Earlier, you couldn't access the entire dataset in the first place, so that we solved-- >> So check, that was a good step one-- >> That was step one. >> But was there automation involved in that, the Paxata piece? >> Yeah, I wouldn't say it was fully automated end-to-end, but there was definitely automation, given the fact that now you've got Paxata working off the data, rather than someone extracting the data and then going off and figuring out what needs to be done. The ability to work off the entire dataset was a big plus. So stress testing: bringing down the cycle time. The second use case I can talk about is anti-money laundering, in our financial crime compliance space. We had processes that took time to report, given the clunkiness in the various handoffs that we needed to do. But again, empowering the users, giving the tool to them and then saying, hey, this-- >> How about know your user? Because with anti-money laundering, you need to know your user base. That's all set there too? >> Yeah. So the good part is, know the user, know your customer, KYC, all that part is set. But the key part is making sure the end-users are able to access the data much earlier in the life cycle and are able to play with it.
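To picture what that kind of self-service "playing with" the full dataset can look like in practice, here's a generic pandas sketch. It is not Paxata's actual interface, and the column names and threshold are invented for illustration.

# Generic pandas sketch of self-service preparation over a full dataset
# (not Paxata's interface; columns, values, and threshold are invented).
import pandas as pd

# Stand-in for the full end-of-day extract an analyst would pull from the lake.
txns = pd.DataFrame({
    "account_id": ["A1", "A1", "B2", "B2", "B2"],
    "ts": ["2017-09-27 09:00", "2017-09-27 17:00",
           "2017-09-27 10:00", "2017-09-27 11:00", "2017-09-27 12:00"],
    "amount": [4000, 7000, 2000, 3000, 6500],
})

# A business user's "preparation" expressed as ordinary transformations:
# total per account per day, then flag the large movers.
flagged = (
    txns.assign(day=pd.to_datetime(txns["ts"]).dt.date)
        .groupby(["account_id", "day"], as_index=False)["amount"].sum()
        .query("amount > 10000")         # illustrative AML threshold
)
print(flagged)   # both accounts exceed the daily threshold here

The point is that the analyst works against the whole dataset rather than a sample handed over by IT, which is the shift described in the anti-money laundering example that follows.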
In the case of anti-money laundering, again, a question of three weeks to four weeks was shortened down to a question of days, by giving them tools like Paxata, again in a structured manner, and one which we're able to govern. >> You control this, so you knew what you were doing, but you let their tools do the job? >> Correct. So look at it this way. Typically, the data journey has always been IT-led. It has never been business-led. If you look at the generations of what happens: you source the data, which is IT-led; then you model the data, which is IT-led; then you prepare and massage the data, which is again IT-led; and then you have tools on top of it, which is again IT-led. So the end-users get it only after the fourth stage. Now look at the generations within. In all these life cycles, apart from sourcing the data, which is typically an IT concern, the rest need to be done by the actual business users, and that's what we did. That's the progression of the generations, and we're now in the third generation, as I call it, where our role is just to source the data and then say, yeah, we'll govern it, and then the preparation-- >> It's really an operating system, and we were talking with Aaron, Alation's co-founder; we used the analogy of a car. This show used to be like a car show that was all engines, what's in the engine and the technology, and it evolved every year, and now we're talking about the cars, now we're talking about the driver experience-- >> That's right. >> At the end of the day, you just want to drive. You don't really care what's under the hood, you do but you don't, but there are those people who do care what's under the hood, so you can have the best of both worlds. You've got the engines, you set up the infrastructure, but ultimately, on the business side, you just want to drive. That's what you're getting at? >> That's right. The time-to-market, and the speed to empower the users to play around with the data, rather than IT trying to churn the data and confine access to it. That's a thing of the past. So we want more users to have faster access to data, but at the same time, govern it in a seamless manner. The word governance is still important, because it's not about just giving out the data. >> And seamless is key. >> Seamless is key. >> 'Cause if you have democratization of data, you're implying that it is community-oriented, meaning that it's available, with access privileges all transparent or abstracted away from the users. >> Absolutely. >> So here's the question I want to ask you. There's been talk, I've been saying it for years, going back to 2012, that an abstraction layer, a data layer, will evolve, and that'll be the real key. And then here at this show, I heard things like "intelligent information fabric that is business- and consumer-friendly." Okay, it's a mouthful, but intelligent information fabric in essence talks about an abstraction layer-- >> That's right. >> That doesn't really compromise anything but gives some enablement, creates some enabling value-- >> That's right. >> For software. How do you see that? >> As the word suggests, the earlier model was trying to build something for the end-users, but not something that was end-user friendly. Meaning, let me just give you a simple example. You had a data model that existed. Historically, the way that we have approached using data is to say, hey, I've got a model, now let's fit the data into this model, without actually asking, does this model actually serve the purpose?
You abstracted the model to a higher level. The whole point about intelligent data is about saying -- I'll give you a very simple analogy. Take the zipcode. A zipcode in the US is very different from a zipcode in India, which is very different from a zipcode in Singapore. So imagine if I had the ability, as my data comes in, to say "I know it's a zipcode, but this zipcode belongs to the US, this zipcode belongs to Singapore, and this zipcode belongs to India," and more importantly, if I can rev it up a notch further, to say "this belongs to India, and this zipcode is valid." Look at where I'm going with that intelligence. In the earlier model, all you could say was "yeah, this is a placeholder for a zipcode." That makes sense, but what are you doing with it? >> In a relational database model it's just a field in a schema; you're taking it, abstracting it, and creating value out of it. >> Precisely. So what I'm actually doing is accelerating adoption: I'm making it simpler for users to understand what the data is. As a user, I don't need to figure out "I've got a zipcode -- now is it a Singapore, India, or whatever zipcode?" >> So all this automation -- Paxata's got a good system; we'll come back to the Paxata question in a second, I do want to drill down on that. But the big thing that I've been seeing at the show -- and again, Dave Vellante, my partner, co-CEO of SiliconANGLE, and I talk about this all the time; he's less bullish on Hadoop than I am. I love Hadoop, I think it's great, but it's not the end-all, be-all; it's a great use case. We were critical early on, and the thing we were critical about was that too much time was being spent on the engine and how things are built, not on the business value. So there was a lull period in the business where it was just too costly-- >> That's right. >> Total cost of ownership was a huge, huge problem. >> That's right. >> So now, today, how did you deal with that, and are you measuring the TCO, the total cost of ownership? Because at the end of the day it's time to value -- can you be up and running in 90 days with value, and can you continue to do that -- and then, what's the overall cost to get there. Thoughts? >> So look, I think TCO always underpins any technology investment. If someone said "I'm doing a technology investment" without thinking about TCO, I don't think he's a good technology leader; TCO is obviously a driving factor. But TCO has multiple components. One is the TCO of the solution. The other aspect is the TCO relative to the value I'm going to get out of the system. So, talking from an implementation perspective, what I look at as TCO is my whole ecosystem: my hardware, my software. You spoke about Hadoop, you spoke about RDBMS -- is Hadoop cheaper, etc.? I don't want to get into that debate of cheaper or not, but what I know is that the ecosystem is becoming much, much cheaper than before. And when I talk about the ecosystem, I'm talking about RDBMS tools, Hadoop, BI tools, governance -- this whole framework becoming cheaper. And it is also underpinned by the fact that hardware is becoming cheaper. So the reality is that all components in the whole ecosystem are becoming cheaper, and given that software is also becoming more open-sourced, and people are open to using open-source software, the whole question of TCO becomes a much more pertinent one. Now, coming to your point -- do you measure it regularly?
I think the honest answer is I don't think we are doing a good job of measuring it that well, but we do have it as one of the criteria by which we measure the success of our projects. The way we do it: for our implementation cost, at the time of writing our PEDs -- we call it a PED, the Project Execution Document -- we talk about cost. We say, "what's the implementation cost?" And what are the business cases that are going to be an outcome of this? I'll give you an example from our anti-money laundering work. I told you we reduced our cycle time from a few weeks to a few days, and that in turn means the number of people involved in the whole process goes down; you're reducing the overhead and the operational folks involved in it. That itself tells you how much we're able to save. So definitely, TCO is there, and to say that-- >> And you are mindful of it, it's what you look at, it's key. TCO is on your radar, 100%, you evaluate that in your deals? >> Yes, we do. >> So Paxata -- what's so great about Paxata? Obviously you've had success with them. You're a customer, what's the deal? Was it the tech, was it the automation, the team? What was the key thing that got you engaged with them, or specifically, why Paxata? >> Look, I think with the key to a partnership, there cannot be one ingredient that makes it successful; there are multiple ingredients that make a partnership successful. We were one of the earliest adopters of Paxata. Given that we're a bank, and we have multiple different systems and a lot of manual processing involved, we saw Paxata as a good fit to govern these processes and ensure that, at the same time, users don't lose their experience. The good thing we liked about Paxata was obviously the simplicity and the look and feel of the tool. That's number one: simplicity was a big point. The second one is scale -- the fact that it can take in millions of rows; it's not about just working off a sample of data. It can work on the entire dataset. That's very key for us. The third is that it leverages our ecosystem, so it's not about saying "okay, you give me this data, let me go figure out what to do with it." Paxata works off the data lake. The fact that it can leverage the lake that we built, and the fact that it's a simple, self-service preparation tool which doesn't require a lot of time to bootstrap, so end-users, people like you-- >> So it makes it usable. >> It's extremely user-friendly and usable in a very short period of time. >> And that helped with the journey? >> That really helped with the journey. >> Santosh, thanks so much for sharing. Santosh Mahendiran, the Global Tech Lead for Analytics at Standard Chartered Bank. Again, financial services, always a great early adopter, and you get success under your belt, congratulations. Data democratization is huge, and again, it's an ecosystem; you've got all that anti-money laundering to figure out, you've got to get those reports out -- a lot of heavy lifting? >> That's right. >> So thanks so much for sharing your story. >> Thank you very much. >> We'll have more coverage after this short break. I'm John Furrier, stay tuned. More live coverage in New York City -- it's theCUBE.
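To ground the zipcode example from earlier in this conversation: "intelligent data" means the system recognizes not just that a field holds a zipcode, but which country's zipcode it is and whether it is valid. A minimal Python sketch of that idea follows; the patterns are simplified assumptions (real postal-code validation is considerably messier) and are not how Paxata actually implements it.

```python
import re

# Simplified, assumed formats; real postal-code rules have many more cases.
ZIP_RULES = {
    "US":        re.compile(r"^\d{5}(-\d{4})?$"),   # 94105 or 94105-1234
    "India":     re.compile(r"^[1-9]\d{5}$"),       # 6 digits, no leading 0
    "Singapore": re.compile(r"^\d{6}$"),            # 6 digits
}

def classify_zipcode(value: str):
    """Return every country whose (assumed) format this value satisfies."""
    value = value.strip()
    return [country for country, rule in ZIP_RULES.items() if rule.match(value)]

# The old model just says "this is a placeholder for zipcode".
# The intelligent model says whose zipcode it is, and whether it parses.
for raw in ["94105-1234", "560001", "048616", "0001"]:
    matches = classify_zipcode(raw)
    print(raw, "->", matches or "not a valid zipcode under these rules")
```

Note that "560001" legitimately matches more than one country's format; resolving that ambiguity from surrounding columns is exactly the kind of context an intelligent data layer adds.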
Aaron Kalb, Alation | BigData NYC 2017
>> Announcer: Live from midtown Manhattan, it's theCUBE. Covering Big Data New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. >> Welcome back everyone, we are here live in New York City, in Manhattan, for BigData NYC, our event we've been doing for five years in conjunction with Strata Data, which was formerly Strata Hadoop, which was formerly Strata Conference, formerly Hadoop World. We've been covering the big data space for going on ten years now. This is theCUBE. I'm here with Aaron Kalb, who's Head of Product and co-founder at Alation. Welcome to theCUBE. >> Aaron Kalb: Thank you so much for having me. >> Great to have you on. So, co-founder, head of product -- love these conversations, because you're also a co-founder, so it's your company, you've got a lot of equity interest in that, but as head of product you also get to have the 20-mile stare: on what the future looks like, while inventing it today and bringing it to market. So you guys have an interesting take on the collaboration of data. Talk about what that means, what's the motivation behind that positioning, what's the core thesis around Alation? >> Totally. So the thing we've observed is that a lot of people working in the data space are concerned about the data itself: how can we make it cheaper to store, faster to process. And we're really concerned with the human side of it. Data's only valuable if it's used by people. How do we help people find the data, understand the data, trust in the data? And that involves a mix of algorithmic approaches and also human collaboration, both human-to-human and human-to-computer, to get it all organized. >> John Furrier: It's interesting, you have a symbolic systems background from Stanford, worked at Apple, were involved in Siri -- all this kind of futuristic stuff. You can't go a day without hearing about Alexa's voice activation, you've got Siri. AI is taking a really big part of this. Obviously all of the hype right now, but what it means is the software is going to play a key role as an interface. And this symbolic systems background almost brings on this neural-network kind of vibe, where objects, data, play a critical role. >> Oh, absolutely, yeah, and in the early days when we were co-founding the company, we talked about: what is Siri for the enterprise? Right, I was, you know, very excited to work on Siri, and it's really a kind of fun gimmick, and it's really useful when you're in the car or your hands are covered in cookie dough, but if you could answer questions like "what was revenue last quarter in the UK" and get the right answer fast, and have that dialogue -- "oh, do you mean fiscal quarter or calendar quarter? Do you mean the UK including Ireland, or whatever it is?" -- that would really enable better decisions and a better outcome. >> I was worried that Siri might do something here. Hey Siri -- oh, there it is, okay, be careful, I don't want it to answer and take over my job. >> (laughs) >> Automation will take away the job, maybe Siri will be doing interviews. Okay, let's take a step back. You guys are doing well as a startup; you've got some great funding, great investors. How are you guys doing on the product? Give us a quick highlight on where you are. Obviously this is BigData NYC, a lot going on -- it's Manhattan, you've got financial services, big industry here. You've got the Strata Data event, which is the classic Hadoop industry that's morphed into data, which really is overlapping with cloud, IoT, application development, all kind of coming together.
How do you guys fit into that world? >> Yeah, absolutely. So the idea of the data lake is kind of interesting. Psychologically it's sort of a hoarder mentality: oh, everything I've ever had I want to keep in the attic, because I might need it one day. There's a great opportunity with all these new streams of data, with IoT and whatnot, but just because you can get to it physically doesn't mean it's easy to find the thing you want -- the needle in all that big haystack -- and to distinguish, among all the different assets that are available, which one is actually trustworthy for your need. So we find that all these trends make the need for a catalog, to kind of organize that information and get what you want, all the more valuable. >> This has come up a lot. I want to get into the integration piece and how you're dealing with your partnerships, but the data lake integration has been huge, and having the catalog has been the buzz -- foundationally, if you will, saying the catalog is important. Why is it important to do the catalog work up front, with a lot of the data strategies? >> It's a great question. So, we see data cataloging as step zero. Before you can prep the data in a tool like Trifacta, Paxata, or Kylo; before you can visualize it in a tool like Tableau or MicroStrategy; before you can do some sort of cool prediction of what's going to happen in the future with a data science engine -- before any of that. These are all garbage-in, garbage-out processes. The step zero is: find the relevant data. Understand it so you can get it in the right format. Trust that it's good, and then you can do whatever comes next. >> And governance has become a key thing here. We've heard of the regulations -- GDPR outside of the United States, but that's also going to have an arm's-length reach over into the United States and an impact. So these little decisions -- and there's going to be an Equifax someday out there; another one's probably going to come around the corner. How does the policy injection change the catalog equation? A lot of people are building machine learning algorithms on top of catalogs, and they're worried they might have to rewrite everything. How do you balance the trade-off between good catalog design and flexibility on the algorithm side? >> Totally, yes, it's a complicated thing with governance and consumption, right. There are people who are concerned with keeping the data safe, and there are people concerned with turning that data into real value, and these can seem to be at odds. What we find is that a catalog is actually a foundation for both, and they are not as opposed as they seem. What Alation fundamentally does is make a map of where the data is: who's using what data, when, how. And that can be helpful if your goal is to say "let's follow in the footsteps of the best analysts and get more insights generated," or if you want to say "hey, this data is being used a lot, let's make sure it's being used correctly." >> And by the right people. >> And by the right people, exactly. >> Equifax -- they were fishing that pond dry for months, months before it actually happened. With good tools like this they might have seen it, right? Am I getting it right? >> That's exactly right: how can you observe what's going on to make sure it's compliant, that the answers are correct, and that it's happening quickly and driving results. >> So in a way you're taking the collective intelligence of the user behavior and using that to understand what to do with the data modeling? >> That's exactly right.
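That "map of where the data is, who's using what data, when, how" can be pictured as a very small piece of code: harvest query logs, count which users touch which tables, and both the consumption view and the governance view fall out of the same structure. The log records and table names below are invented for illustration, not Alation's actual format.

```python
from collections import Counter, defaultdict

# Hypothetical harvested query-log records: (user, table_touched).
query_log = [
    ("alice", "sales.orders"), ("alice", "sales.orders"),
    ("bob",   "sales.orders"), ("bob",   "hr.salaries"),
    ("carol", "sales.orders"), ("carol", "finance.ledger"),
]

usage_by_table = Counter(table for _, table in query_log)
users_by_table = defaultdict(set)
for user, table in query_log:
    users_by_table[table].add(user)

# Consumption view: follow in the footsteps of the best analysts.
print("Most-used datasets:", usage_by_table.most_common(2))

# Governance view: the same map shows who is touching sensitive data.
sensitive = {"hr.salaries"}
for table in sensitive & users_by_table.keys():
    print(f"audit: {table} accessed by {sorted(users_by_table[table])}")
```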
We want to make each person in your organization as knowledgeable as all of their peers combined. >> So the benefit then for the customer would be: if you see something that's developing, you can double down on it. And if the users are using a lot of data, then you can provision more technology, more software. >> Absolutely, absolutely. It's sort of like when I was going to Stanford: there was a place where the grass was all dead, because people were riding their bikes diagonally across it. And then somebody smart said, we're going to put a real gravel path there. So the infrastructure should follow the usage, instead of being something you try to enforce on people. >> It's a classic design meme that goes around: good design is here, the more effective design is the path. >> Exactly. >> So let's get into the integration. One of the hot topics here this year, obviously besides cloud and AI -- with cloud really being more the driver, the tailwind for the growth, and AI being more the futuristic headroom -- is integration. You guys have some partnerships that you announced around integration. What are some of the key ones, and why are they important? >> Absolutely. So, there have been attempts in the past to centralize all the data in one place: have one warehouse, or one lake, have one BI tool. And those generally fail, for different reasons; different teams pick different stacks that work for them. What we think is important is the single source of reference: one hub with spokes out to all those different points. If you think about it, it's like Google -- one index of the whole web, even though the web is distributed all over the place. To make that happen, it's very important that we have partnerships to get data in from various sources. So we have partnerships with database vendors, with Cloudera and Hortonworks, with different BI tools. What's new are a few things. One is with Cloudera Navigator: they have great technical metadata around security and lineage over HDFS, and that's a way to bolster our catalog, to go even deeper into what's happening in the files before things get surfaced, even in places where we already have a deeper offering today. >> So it's almost a connector to them, in a way -- you kind of share data. >> That's exactly right. We have a lot of different connectors; this is one new one. Another -- go ahead. >> Go ahead, continue. >> I was just going to say, another place that is exciting is data prep tools: Trifacta and Paxata are both places where you can find and understand data in Alation and then begin to manipulate it in those tools. We announced with Paxata yesterday the ability to click to profile, so if you want to actually see what's in some raw compressed Avro file, you can see that in one click. >> It's interesting, Paxata has really been almost lapping Trifacta, because they were the leader in my mind, but now you've got like a Nascar race going on between the two firms, because data wrangling is a huge issue. Data prep is where everyone is stuck right now; they just want to do the data science. It's interesting. >> They are both amazing companies, and I'm happy to partner with both. And actually Trifacta and Alation have a lot of joint customers we're psyched to work with as well. I think what's interesting is that data prep -- and this is beginning to happen with analyst definitions of the field --
isn't just preparing the data to be used, getting it cleaned and shaped; it's also preparing the humans to use the data, giving them the confidence, the tools, the knowledge to know how to manipulate it. >> And it's great progress. So the question I wanted to ask is about the other big trend here -- I mean, it's kind of a subtext in this show, it's not really front and center, but we've been seeing it emerge as a concept in the cloud world -- on-premise versus cloud. On-premise, a lot of people bring the DevOps model in and say: I may move to the cloud for bursting and some native applications, but at the end of the day there is a lot of work going on on-premise. A lot of companies are kind of cleaning house -- retooling, replatforming, whatever you want to call it, resetting. They're getting their house in order to do on-prem cloud ops, meaning a business model of cloud operations on site. A lot of people are doing that; it's going to impact the story, it's going to impact some of the server modeling -- that's a hot trend. How do you guys deal with the on-premise/cloud dynamic? >> Totally. So we just want to do what's right for the customer, so we deploy both on-prem and in the cloud, and then from wherever the Alation server is, it will point to, usually, a mix of sources -- some that are in the cloud, like Redshift and S3, often with Amazon today, and also sources that are on-prem. I do think I'm seeing a trend more and more toward the cloud, and people migrating from HDFS to S3 is one thing we hear a lot about, especially at Strata with its sort of Hadoop interest. But I think what's happening is people are realizing, as each Equifax-type event happens, that there's this old wild-west model of "oh, you surround your bank with people on horseback" and it's physically in one place. With data it isn't like that; most people are saying "I'd rather have the A+ teams at Salesforce or Amazon or Google be responsible for my security than the people I can get over in the midwest." >> And the Paxata guys have loved the term data democracy, because that is really democratization: making the data free but also having the governance thing. So tell me about data lake governance, because I've never loved the term data lake -- I think it's more of a data ocean -- but now you see data lake, data lake, data lake. Are they just silos of data lakes happening now? Are people trying to connect them? That's key -- that's been a key trend here. How do you handle governance across multiple data lakes? >> That's right. So the key is to have that single source of reference, so that regardless of which lake or warehouse, or little siloed SQL Server somewhere, you can search in a single portal and find that thing no matter where it is. >> John: Can you guys do that? >> We can do that, yeah. I think the metaphor, for people who haven't seen it, really is Google: if you think about it, you don't even know what physical server a webpage is hosted on. >> Data lakes should just be invisible. >> Exactly. >> So you're interfacing with multiple data lakes -- that's a value proposition for you. >> That's right, so it could be on-prem or in the cloud, multi-cloud. >> Can you share an example of a customer that uses that, and kind of how it's laid out? >> Absolutely. One great example of an interesting data environment is eBay. They have the biggest Teradata warehouse in the world.
They also have, I believe, two huge data lakes; they have Hive on top of that, and Presto is used to sort of virtualize it across a mixture of Teradata and Hive and then direct Presto queries. It gets very complicated, and they are a very data-driven organization, so they have people who are product owners, who are in jobs where data isn't in their job title, and they know how to look at Excel and look at numbers and make choices, but they aren't real data people. Alation provides that accessibility so that they can understand it. >> We used to call the Hadoop world the car show of the data world, where for a long time it was about the engine, what was doing what; then it became what's the car, and now how does it drive. We're seeing that same evolution now, where all that stuff has to get done under the hood. >> Aaron: Exactly. >> But there are still people who care about that, right? They are the mechanics, they are the plumbers, whatever you want to call them. But then the data scientists are the guys really driving things, and now end users potentially, and even applications, bots, or whatnot. It seems to evolve; that's where we're kind of seeing the show change a little bit, and that's kind of where you see some of the AI things. I want to get your thoughts on how you guys are using AI, how you see AI -- if it's AI at all, or if it's just machine learning as a baby step into AI. We all know what AI could be, but it's really just machine learning now. How do you guys use, quote, AI, and how has it evolved? >> It's a really insightful question, and a great metaphor that I love. If you think about it, it used to be "how do you build the car," and now I can drive the car even though I couldn't build it or even fix it, and soon I won't even have to drive the car -- the car will just drive me; all I have to know is where I want to go. That's sort of the progression we see as well. There's a lot of talk about deep learning, all these different approaches, and it's super interesting and exciting. But I think even more interesting than the algorithms are the applications. And so for us it's like, today, how do we give those turn-by-turn directions, where we say "turn left at the light" if you want to get there. And eventually, you know, maybe the computer can do it for you. The thing that is also interesting is that to make these algorithms work, no matter how good your algorithm is, it's all based on the quality of your training data. >> John: Which is historical data. Historical data, in essence -- the more historical data you have; you need that to train the model. >> Exactly right, and we call this behavior I/O: how do we look at all the prior human behavior to drive better behavior in the future. And I think the key for us is that we don't want to have a bunch of unpaid-- >> John: You can actually get that URL, behavioral I/O. >> We should do it before it's too late. (both laugh) >> We're live right now -- go register that, Patrick. >> Yeah, so the goal is we don't want a bunch of unpaid interns trying to manually attack things; that's error-prone and that's slow. I look at things like Luis von Ahn over at CMU: he does a thing where, as you're typing in a CAPTCHA to get an email account, you're also helping Google recognize a hard-to-read address or a piece of text from books.
>> John: If you shoot the arrow forward, you just take this kind of forward -- you almost think augmented reality is a pretext to what we might see for what you're talking about, and ultimately VR. Are you seeing some of the use cases for virtual reality be very enterprise-oriented, or even end-consumer? I mean, Tom Brady, the best quarterback of all time, uses virtual reality to play the offense virtually before every game; he's a power user. In pharma you see them using virtual reality to do data mining without being in the lab, so lab tests. So you're seeing augmentation coming into this turn-by-turn direction analogy. >> Exactly -- I think it's the other half of it. So we use AI, we use techniques to get great data from people, and then we do extra work, watching their behavior to learn what's right and to figure out the recommendations. But then you serve those recommendations -- whether it's Google Glass and it appears right there in your field of view -- we just have to figure out how we make sure that, in the moment of making a dashboard, or making a choice, you have that information right on hand. >> So since you're a technical geek, and a lot of folks would love to talk about this, I'll ask you a tough question, 'cause this is something everyone is trying to chase for the holy grail: how do you get the right piece of data to the right place at the right time, given that you have all these legacy silos, latencies, and network issues as well? So you've got a data warehouse, you've got stuff in cold storage, and I've got an app and I'm doing something -- there could be any point of data in the world that could be, in milliseconds, on my phone or on my device, my internet-of-things wearable. How do you make that happen, while at the same time keeping all the compliance and all the overhead involved? Is it more compute, is it an architectural challenge -- how do you view that? Because this is the big challenge of our time. >> Yeah, again, I actually think it's a human challenge more than a technology challenge. It is true that there is data all over the place kind of gathering dust, but again, think about Google: billions of web pages, and I only care about the one I'm about to use. So for us it's really about being in that moment of writing a query, building a chart -- how do we say, in that moment, "hey, you're using an out-of-date definition of profit," or "hey, the database you chose to use, the one thing you chose out of the millions, is actually broken and stale." And we have interventions to do that, with our partners and through our own first-party apps, that actually change how decisions get made at companies. >> So to make that happen, if I imagine it, you'd have to have access to the data, and then write software that is contextually aware, to then run compute in the context of the user interaction. >> That's exactly right. Back to the turn-by-turn directions concept: you have to know both where you're trying to go and where you are. And so for us, that can be: from where I'm writing a SQL statement, after a JOIN we can suggest the table most commonly joined with that one, but also overlay onto that the fact that the most commonly joined table was deprecated by a data steward, a data curator. So that's the moment where we can change the behavior from bad to good.
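The turn-by-turn moment Aaron describes -- you type a JOIN, the catalog suggests the table most commonly joined with yours, unless a steward has deprecated it -- reduces to counting co-occurrences in historical queries. Here is a hedged sketch of that logic; the table names, counts, and deprecation notes are made up for illustration.

```python
from collections import Counter

# Assumed history of table pairs that appeared together in past joins.
join_history = [
    ("orders", "customers"), ("orders", "customers"),
    ("orders", "customers_v1"), ("orders", "products"),
]

# Assumed curation metadata from the data steward.
deprecated = {"customers_v1": "superseded by customers (2017-06)"}

def suggest_join(table: str, history, top_n=3):
    counts = Counter(b for a, b in history if a == table)
    suggestions = []
    for candidate, n in counts.most_common(top_n):
        note = deprecated.get(candidate)
        flag = f"  <- WARNING: {note}" if note else ""
        suggestions.append(f"{candidate} (joined {n}x){flag}")
    return suggestions

# The moment of intervention: right as the analyst types "... JOIN".
for line in suggest_join("orders", join_history):
    print(line)
```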
>> So, a chief data officer out there -- we've got to wrap up, but I wanted to ask one final question. There's a chief data officer out there; they might be empowered, or they might be just a CFO assistant managing compliance. Either way, someone's going to be empowered in an organization to drive data science and data value forward, because there is so much proof that data science works. From military to play, you're seeing examples where being data-driven actually has benefits. So everyone is trying to get there. How do you explain the vision of Alation to that prospect? Because they have so much to select from; there's so much noise -- we call it the tool shed out there. There's like a zillion tools out there, a zillion platforms. Some tools are trying to turn into something else; a hammer is trying to be a lawnmower. So they've got to be careful who they select. So what's the vision of Alation for that chief data officer, or that person in charge of analytics, to scale operational analytics? >> Absolutely. So we say to the CDO: we have a shared vision for this place where your company is making decisions based on data, instead of based on gut, or expensive consultants, months too late. And the way we get there -- the reason Alation adds value -- is that we're sort of the last tool you have to buy. Because with this lake mentality, you've got your tool shed with all the tools, you've got your library with all the books, but they're just in a pile on the floor. If you had a tool where everything was organized, so you just said "hey robot, I need a hammer and this size nail and this textbook on this set of information," and it could just come to you, and it would be correct and it would be quick, then you could actually get value out of all the expense you've already put into this infrastructure. That's especially true on the lake. >> And also tools describe the way the work's done, so in that model tools can be in the tool shed -- no one needs to know they're in there. >> Aaron: Exactly. >> You guys can help scale that. Well, congratulations. Just how far along are you guys, in terms of number of employees, how many customers do you have? If you can share that -- I don't know if that's confidential or whatnot. >> Absolutely. So we're small but growing very fast, planning to double in the next year, and in terms of customers, we've got 85 customers, including some really big names. I mentioned eBay; also Pfizer, Safeway Albertsons, Tesco, Meijer. >> And what are they saying to you guys? Why are they buying, why are they happy? >> They share that same vision of a more data-driven enterprise, where humans are empowered to find, understand, and trust data to make more informed choices for the business, and that's why they come and come back. >> And that's the product roadmap ethos for you guys, that's the guiding principle? >> Yeah, the ultimate goal is to empower humans with information. >> Alright, Aaron, thanks for coming on theCUBE. Aaron Kalb, co-founder and head of product at Alation, here in New York City for BigData NYC and also Strata Data. I'm John Furrier, thanks for watching. We'll be right back with more after this short break.
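Aaron's closing tool-shed metaphor -- "hey robot, I need a hammer" -- is, at bottom, search over organized metadata rather than over physical locations. A toy single-source-of-reference index in Python makes the point; every catalog entry below is invented, and a real catalog would index far richer metadata and lineage.

```python
# A toy catalog: it knows where each dataset physically lives,
# so the user never has to. All entries are invented examples.
catalog = [
    {"name": "orders",      "system": "teradata",   "tags": ["sales", "transactions"]},
    {"name": "clickstream", "system": "hdfs",       "tags": ["web", "events"]},
    {"name": "ledger",      "system": "sql-server", "tags": ["finance"]},
]

def find(keyword: str):
    """'Hey robot, I need a hammer': search by name or tag, not by location."""
    keyword = keyword.lower()
    return [d for d in catalog if keyword in d["name"] or keyword in d["tags"]]

for hit in find("sales"):
    print(f'{hit["name"]} (lives in {hit["system"]}, but you did not need to know that)')
```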
Nenshad Bardoliwalla & Pranav Rastogi | BigData NYC 2017
>> Announcer: Live from Midtown Manhattan, it's theCUBE. Covering Big Data New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. >> OK, welcome back everyone, we're here in New York City -- it's theCUBE's exclusive coverage of BigData NYC, in conjunction with Strata Data going on right around the corner. It's our third day of talking to all the influencers, CEOs, entrepreneurs, people making it happen in the big data world. I'm John Furrier, co-host of theCUBE, with my co-host here, Jim Kobielus, the Lead Analyst for Big Data at Wikibon. Nenshad Bardoliwalla. >> Bar-do-li-walla. >> Bardo. >> Nenshad Bardoliwalla. >> That guy. >> Okay, done. Of Paxata, Co-Founder & Chief Product Officer -- it's a tongue twister, third day, being from Jersey, it's hard with our accent, but thanks for being patient with me. >> Happy to be here. >> Pranav Rastogi, Product Manager, Microsoft Azure. Guys, welcome back to theCUBE, good to see you. I apologize for that -- third-day blues here. So, Paxata: we had your partner Prakash on. >> Prakash. >> Prakash. Really a success story; you guys launched on theCUBE, and it's been fun to watch you go from launch to this success. Obviously your relationship with Microsoft is super important. Talk about the relationship, because I think this is where people can start connecting the dots. >> Sure, maybe I'll start, and I'll be happy to get Pranav's point of view as well. Obviously Microsoft is one of the leading brands in the world, and there are many aspects of the way Microsoft has thought about their product development journey that have really been critical to the way we have thought about Paxata as well. If you look at the number one tool used by analysts the world over, it's Microsoft Excel. Right? There isn't even anything that's a close second. And if you look at the evolution of what Microsoft has done in many layers of the stack -- whether it's the end-user computing paradigm that Excel provides to the world, or all of their recent innovation in both hybrid cloud technologies and the big data technologies that Pranav is part of managing -- we just see a very strong synergy in combining the usage by business consumers with the ability to take advantage of these big data technologies in a hybrid cloud environment. So there's a very natural resonance between the two companies. We're very privileged to have Microsoft Ventures as an investor in Paxata, and so the opportunity to work with one of the great brands of all time in our industry was really a privilege for us. >> Yeah, and that's the corporate side, so that wasn't actually part of it. So it's a different part of Microsoft, which is great. You also have a business opportunity with them. >> Nenshad: We do. >> Obviously the data science problem that we're seeing is that they need to get the data faster. All that prep work seems to be the big issue. >> It does, and maybe we can get Pranav's point of view from the Microsoft angle. >> Yeah, so to sort of continue what Nenshad was saying: you know, data prep in general is a key core competence which is problematic for lots of users, especially around the knowledge you need to have of the different tools you can use. Folks who are very proficient will do ETL or data-preparation-like scenarios using one of the computing engines like Hive or Spark.
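For contrast with the GUI approach discussed next, this is roughly what that "proficient user" path looks like: a hand-written Spark job doing routine prep. A hedged sketch -- the file paths and column names are placeholders, not anyone's real schema -- though the API calls are standard PySpark.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("hand-rolled-prep").getOrCreate()

# Placeholder path and columns; substitute your own source.
df = spark.read.csv("/data/raw/customers.csv", header=True, inferSchema=True)

cleaned = (df
    .dropDuplicates(["customer_id"])                       # de-dupe on the key
    .na.drop(subset=["customer_id"])                       # drop rows missing the key
    .withColumn("email", F.lower(F.trim(F.col("email"))))  # normalize a field
    .filter(F.col("signup_date").isNotNull()))

cleaned.write.mode("overwrite").parquet("/data/prepared/customers")
```

This works, but every step requires knowing the engine; the point of the conversation that follows is reaching the much larger audience who can't or won't write it.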
That's good, but there's this big audience out there who likes an Excel-like interface: easy to use, a very visually rich graphical interface where you can drag and drop and click through. And the idea behind all of this is: how quickly can I get insights from my data? Because in the big data space, it's volume, variety, and velocity. Data is coming at a very fast rate; it's changing, it's growing. And if you spend a lot of time just doing data prep, you're losing the value of the data, or the value of the data will change over time. So what we're trying to do, by enabling Paxata on HDInsight, is enable these users to use Paxata and get insights from data faster by solving the key problems of data prep. >> So data democracy is a term that we've been kicking around, and you guys have been talking about it as well. What does it actually mean? Because what we've been teasing out over the first two days here at theCUBE and BigData NYC is that the community aspect of data is clearly growing, on almost a similar path to what you're seeing with open source software. That genie's out of the bottle: open source software, tier one, it won, it's only growing exponentially. That same paradigm is moving into the data world, where collaboration is super important. In this data democracy, what does that actually mean, and how does that relate to you guys? >> So the perspective we have starts with something one of our customers said: there is no democracy without certain degrees of governance. We all live in a democracy, and yet we still have rules that we have to abide by. There are still policies that society needs to follow in order for us to be successful citizens. So when a lot of folks hear the term democracy, they really think of the wild wild west, you know. And a lot of the analytic work in the enterprise does have that flavor to it, right: people download stuff to their desktop, they do a little bit of massaging of the data, they email that to their friend, their friend then makes some changes, and the next thing you know we have what some folks affectionately call spreadmart hell. But if you really want to democratize the technology, you have to wrap not only the user experience, like Pranav described, into something that's consumable by a very large number of folks in the enterprise; you have to wrap that with the governance and collaboration capabilities, so that multiple people can work off the same data set, and so that you can apply the permissions: who is allowed to share with each other, and under what circumstances are they allowed to share? Under what circumstances are you allowed to promote data from one environment to another? It may be okay for someone like me to work in a sandbox, but I cannot push that to a database or HDFS or Azure Blob Storage unless I actually have the right permissions to do so. So I think what you're seeing is that, in general, technology always goes on this trend towards democratization -- whether it's the phone, the television, or the personal computer -- and the same thing is happening with data technologies, and certainly with companies like ours. >> Well, Pranav, we were talking about this when you were on theCUBE yesterday, and I want to get your thoughts on this. The old way to solve the governance problem was to put data in silos. That was easy: I'll just put it in a silo and take care of it, and access control was different.
But now the value of the data is about cross-pollinating and making it freely available, horizontally scalable, so that it can be used. At the same time, you need to have a new governance paradigm. So you've got to democratize the data by making it available, addressable, and usable for apps, while also addressing the concern of how you make sure it doesn't get into the wrong hands, and so on and so forth. >> Yeah, and this is also very common in open source projects in the cloud: how do you ensure that the user who wants to access or run this open source project has the right credentials and is authorized to do so? The benefit you get in the cloud is that there's a centralized authentication system. There's Azure Active Directory -- so, you know, most enterprises would have Active Directory users, who are then authorized to, say, access this cluster or run this workload -- and that goes further down to the data layer as well, where we have access policies that describe what user can access which files and folders. So if you think about the end-to-end scenario, there is authentication and authorization happening across the entire system: what user can access what data. And part of what Paxata brings to the picture is how you visualize this governance flow as data comes in from various sources: how do you make sure that the person who should have access to the data does have access, and the one who shouldn't cannot access it. >> Is that the problem with data prep -- just that piece of it? What is the big problem with data prep? I mean, that seems to be the same problem everyone keeps coming back to. What is causing all this data prep? >> People not buying Paxata -- it's very simple. >> That's a good one. Check out Paxata, they're going to solve your problems -- go. But seriously, there seems to be the same hole people keep digging themselves into. They gather their stuff, and the next thing they know they're in the same hole; they've got to prepare all this stuff. >> I think the previous paradigms for doing data preparation tie exactly to the data democracy themes that we're talking about here. If you only have a very silo'd group of people in the organization, with very deep technical skills but without the business context for what they're actually trying to accomplish, you have this impedance mismatch in the organization between the people who know what they want and the people who have the tools to do it. So what we've tried to do -- and again, you know, taking a page out of the way Microsoft has approached solving these problems, both in the past and in the present -- is to say: look, we can actually take the tools that once were only in the hands of, you know, the shamans who knew how to utter the right incantations, and instead move them into the hands of the common folk who actually-- >> The users. >> The users themselves, who know what they want to do with the data, who understand what those data elements mean. So if you were to ask for the Paxata point of view -- why have we had these data prep problems? Because we've separated the people who had the tools from the people who knew what they wanted to do with them. >> So it sounds to me -- correct me if this is the wrong term -- that what you offer in your partnership is basically a broad curational environment for knowledge workers.
You know, to sift and sort and annotate shared data, with the lineage of the data preserved in, essentially, a system of record that can follow the data throughout its natural life. Is that a fair characterization? >> Pranav: I would think so, yeah. >> You mention, Pranav, the whole issue of how one visualizes, or should visualize, this entire chain of custody, as it were, for the data. Is there any special visualization paradigm that you guys offer? Now, Microsoft -- you've made a fairly significant investment in graph technology throughout your portfolio. I was at Build back in May, and Satya and the others just went to town on all things to do with Microsoft Graph. Will that technology be somehow, at some point, now or in the future, reflected in this overall capability that you've established here with your partner, Paxata? >> I am not sure. So far, I think what you've talked about is some graph capabilities introduced through the Microsoft Graph -- that's sort of one extreme. The other side of graph exists today: as a developer you can do some graph-based queries. You can go to Cosmos DB, which has a Gremlin API for graph-based queries -- so I don't know how. >> I'll get right to the question: what are the Paxata benefits with HDInsight? Just quickly explain for the audience: what is that solution, what are the benefits? >> So the solution is that you get a one-click install of Paxata on HDInsight, and the benefit is that a user persona who's not, sort of, used to big data or Hadoop can use a very familiar GUI-based experience to get insights from their data faster, without having any knowledge of how Spark or Hadoop works. >> And what does the Microsoft relationship bring to the table for Paxata? >> So I think it's a couple of things. One is that Azure is clearly growing at an extremely fast pace, and a lot of the enterprise customers we work with are moving many of their workloads to Azure and these cloud-based environments. Especially important for us is the unique value proposition of a partner who truly understands the hybrid nature of the world. The idea that everything is going to move to the cloud, or that everything is going to stay on-premise, is too simplistic. Microsoft understood from day one that data would be in all of those different places. And they've provided enabling technologies for vendors like us.
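Pranav's earlier point about centralized identity -- Azure Active Directory authenticating the user, with policies deciding which files and folders that user may touch -- boils down to an authorization check like the one sketched here. The directory entries and policy rules are invented; a real deployment delegates this to AAD and storage ACLs rather than hand-rolling it.

```python
# Invented users/groups standing in for a directory such as Azure AD.
directory = {"priya": {"analysts"}, "sam": {"analysts", "stewards"}}

# Invented path-prefix policies standing in for storage ACLs.
policies = [
    {"prefix": "/lake/raw/",     "allowed_groups": {"stewards"}},
    {"prefix": "/lake/curated/", "allowed_groups": {"analysts", "stewards"}},
]

def can_read(user: str, path: str) -> bool:
    groups = directory.get(user, set())   # who the user is (authentication)
    for rule in policies:                 # first matching prefix wins here
        if path.startswith(rule["prefix"]):
            return bool(groups & rule["allowed_groups"])
    return False                          # default deny

print(can_read("priya", "/lake/curated/sales.parquet"))  # True
print(can_read("priya", "/lake/raw/pii_dump.csv"))       # False
```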
They're they're into it. How are you guys working together in open source and what's the impact to developers because now that's only one cloud, there's other clouds out there so data's going to be an important part of it. So open source, together, you guys working together on that and what's the role for the data? >> From an open source perspective, Microsoft plays a big role in embracing open source technologies and making sure that it runs reliably in the cloud. And part of that value prop that we provide in sort of Azure HDInsight is being sure that you can run these open source big data workloads reliably in the cloud. So you can run open source like Apache, Spark, Hive, Storm, Kafka, R Server. And the hard part about running open source technology in the cloud is how do you fine tune it, and how do you configure it, how do you run it reliably. And that's what sort of what we bring in from a cloud perspective. And we also contribute back to the community based on sort of what learned by running these workloads in the cloud. And we believe you know in the broader ecosystem customers will sort of have a mixture of these combinations and their solution They'll be using some of the Microsoft solutions some open source solutions some solutions from ecosystem that's how we see our customer solution sort of being built today. >> What's the big advantage you guys have at Paxata? What's the key differentiator for why someone should work with you guys? Is it the automation? What's the key secret sauce to you guys? >> I think it's a couple of dimensions. One is I think we have come the closest in the industry to getting a user experience that matches the Excel target user. A lot of folks are attempting to do the same but the feedback we consistently get is that when the Excel user uses our solution they just, they get it. >> Was there a design criteria, was that from the beginning how you were going to do this? >> From day one. >> So you engineer everything to make it as simple as like Excel. >> We want people to use our system they shouldn't be coding, they shouldn't be writing scripts. They just need to be able. >> Good Excel you just do good macros though. >> That's right. >> So simple things like that right. >> But the second is being able to interact with the data at scale. There are a lot of solutions out there that make the mistake in our opinion of sampling very tiny amounts of data and then asking you to draw inferences and then publish that to batch jobs. Our whole approach is to smash the batch paradigm and actually bring as much into the interactive world as possible. So end users can actually point and click on 100 million rows of data, instead of the million that you would get in Excel, and get an instantaneous response. Verses designing a job in a batch paradigm and then pushing it through the the batch. >> So it's interactive data profiling over vast corpuses of data in the cloud. >> Nenshad: Correct. >> Nenshad Bardoliwalla thanks for coming on theCUBE appreciate it, congratulations on Paxata and Microsoft Azure, great to have you. Good job on everything you do with Azure. I want to give you guys props, with seeing the growth in the market and the investment's been going well, congratulations. Thanks for sharing, keep coverage here in BigData NYC more coming after this short break.
Arun Murthy, Hortonworks | BigData NYC 2017
>> Coming back when we were a DOS spreadsheet company. I did a short stint at Microsoft and then joined Frank Quattrone when he spun out of Morgan Stanley to create what would become the number three tech investment (upbeat music) >> Host: Live from midtown Manhattan, it's theCUBE, covering BigData New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. (upbeat electronic music) >> Welcome back, everyone. We're here, live, on day two of our three days of coverage of BigData NYC. This is our event that we put on every year. It's our fifth year doing BigData NYC, in conjunction with Hadoop World, which evolved into Strata Conference, which evolved into Strata Hadoop, now called Strata Data. Probably next year it will be called Strata AI, but we're still theCUBE, we'll always be theCUBE, and this is our BigData NYC, our eighth year covering the BigData world since Hadoop World. And then as Hortonworks came on we started covering Hortonworks' data summit. >> Arun: DataWorks Summit. >> DataWorks Summit. Arun Murthy, my next guest, Co-Founder and Chief Product Officer of Hortonworks. Great to see you, looking good. >> Likewise, thank you. Thanks for having me. >> Boy, what a journey. Hadoop, years ago, >> 12 years now. >> I still remember, you guys came out of Yahoo, you guys put Hortonworks together and then since, gone public, first to go public, then Cloudera just went public. So, the Hadoop world is pretty much out there, everyone knows where it's at, it's got a nice use case, but the whole world's moved around it. You guys have been really the first of the Hadoop players, before even Cloudera, on this notion of data in flight, or, I call, real-time data, but I think you guys call it data-in-motion. Batch, we all know what batch does, a lot of things to do with batch, you can optimize it, it's not going anywhere, it's going to grow. Real-time data-in-motion's a huge deal. Give us the update. >> Absolutely, you know, we've obviously been in this space, personally, I've been in this for about 12 years now. So, we've had a lot of time to think about it. >> Host: Since you were 12? >> Yeah. (laughs) Almost. Probably look like it. So, back in 2014 and '15 when we sort of went public and we started looking around, the thesis always was, yes, Hadoop is important, we're going to help you manage lots and lots of data, but a lot of the stuff we've done since the beginning, starting with YARN and so on, was really to enable the use cases beyond the whole traditional transactions and analytics. And Rob, our CEO, calls it, his vision's always been, we've got to get into a pre-transactional world, if you will, rather than the post-transactional analytics and BI and so on. So that's where it started. And increasingly, the obvious next step was to say, look, enterprises want to be able to get insights from data, but they also want, increasingly, they want to get insights and they want to deal with it in real-time. You know, while you're in your shopping cart. They want to make sure you don't abandon your shopping cart. If you were sitting at a retailer and you're on an aisle and you're about to walk away from a dress, you want to be able to do something about it. So, this notion of real-time is really important because it helps the enterprise connect with the customer at the point of action, if you will, and provide value right away rather than having to try to do this post-transaction. So, it's been a really important journey.
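The point-of-action scenario Arun describes, nudging the shopper before the cart is abandoned rather than analyzing it after the fact, reduces to keeping a little state per customer over an event stream. A minimal sketch in plain Python; the event shape and the send_offer hook are assumptions for illustration, not Hortonworks' data-in-motion stack:

```python
import time
from collections import defaultdict

ABANDON_AFTER_SECS = 30 * 60             # assumption: 30 minutes of inactivity
last_cart_activity = defaultdict(float)  # customer_id -> last event timestamp

def on_event(event):
    """Consume one clickstream event and update per-customer cart state."""
    if event["type"] in ("add_to_cart", "update_cart"):
        last_cart_activity[event["customer_id"]] = event["ts"]
    elif event["type"] == "checkout":
        last_cart_activity.pop(event["customer_id"], None)

def sweep_for_abandonment(now=None):
    """Act at the point of action: fire an offer while the shopper is reachable."""
    now = now or time.time()
    for customer_id, ts in list(last_cart_activity.items()):
        if now - ts > ABANDON_AFTER_SECS:
            send_offer(customer_id)
            last_cart_activity.pop(customer_id)

def send_offer(customer_id):
    # Hypothetical action hook: push a coupon, an email, a notification.
    print(f"nudging {customer_id} before they walk away")
```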
We went and bought this company called Onyara, which is a bunch of geeks like us who started off with the government, built this Apache NiFi thing, huge community. It's just, like, taking off at this point. It's been a fantastic thing to join hands and join the team and keep pushing in the whole streaming data style. >> There's a real, I don't mean to tangent but I do, since you brought up community I wanted to bring this up. It's been the theme here this week. It's more and more obvious that the community role is becoming central, beyond open-source. We all know open-source, standing on the shoulders before us, you know. And the Linux Foundation showing code numbers heading from $64 million to billions in the next five, ten years, exponential growth of new code coming in. So open-source certainly blew me away. But now community is translating to things like blockchain, very community based. That's a whole new currency market that's changing the financial landscape, ICOs and what-not, and that's just one data point. Businesses, marketing communities, you're starting to see data as a fundamental thing around communities. And certainly it's going to change the vendor landscape. So you guys, compared to Cloudera and others, have always been community driven. >> Yeah, our philosophy has been simple. You know, more eyes and more hands are better than fewer. And it's been one of the cornerstones of our founding thesis, if you will. And you saw how that's gone on over the course of the six years we've been around. Super-excited to have someone like IBM join hands; it happened at DataWorks Summit in San Jose. That announcement, again, is a reflection of the fact that we've been very, very community driven and very, very ecosystem driven. >> Communities are fundamentally built on trust and partnering. >> Arun: Exactly. >> Coding is pretty obvious, you code with your friends. You code with people who are good, they become your friends. There's an honor system among you. You're starting to see that in the corporate deals. So explain the dynamic there and some of the successes that you guys have had on the product side where one plus one equals more than two. One plus one equals five or three. >> You know, IBM has been a great example. They've decided to focus on their strengths, which is around Watson and machine learning, and for us to focus on our strengths around data management, infrastructure, cloud and so on. So this combination of DSX, which is their Data Science Experience, along with Hortonworks is really powerful. We are seeing that over and over again. Just yesterday we announced the whole Dataplane thing; we were super excited about it. And now to get IBM to say, we'll get in our technologies and our IP, big data, whether it's BigQuality or BigInsights or Big SQL, and the work has been phenomenal. >> Well, the Dataplane announcement, finally. People who know me know that I hate the term data lake. I always said it's always been a data ocean. So I get redemption, because now with the data lakes, it's admitting it's a horrible name, but just saying stitching together the data lakes, which is essentially a data ocean. Data lakes are out there and you can form these data lakes, or data sets, batch, whatever, but connecting them and integrating them is a huge issue, especially with security. >> And a lot of it is, it's also just pragmatism. We start off with this notion of data lake and say, hey, you got too many silos inside the enterprise in one data center, you want to put them together.
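Hortonworks' DataPlane itself is a product with its own APIs; the underlying idea, a thin metadata catalog spanning many lakes while the data stays put, can be mocked up independently. A toy sketch only; every class and field name here is invented for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class DatasetEntry:
    name: str
    location: str                           # e.g. "on-prem-hadoop", "azure", "aws"
    region: str                             # e.g. "de", "fr", "us"
    tags: set = field(default_factory=set)  # e.g. {"pii", "financial"}

class MetadataFabric:
    """One lightweight catalog over many lakes; only metadata is centralized."""
    def __init__(self):
        self.entries = {}

    def register(self, entry):
        self.entries[entry.name] = entry

    def find(self, **criteria):
        return [e for e in self.entries.values()
                if all(getattr(e, k) == v for k, v in criteria.items())]

fabric = MetadataFabric()
fabric.register(DatasetEntry("customers_de", "on-prem-hadoop", "de", {"pii"}))
fabric.register(DatasetEntry("clickstream", "azure", "us"))
print([e.name for e in fabric.find(region="de")])   # -> ['customers_de']
```

The design point is the one made in the conversation: the catalog's footprint is tiny compared to the data, so it can be centralized even when the data cannot.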
But then increasingly, as Hadoop has become more and more mainstream, and I can't remember the last time I had to explain what Hadoop is to somebody, a couple of things have happened. One is, we talked about streaming data. We see it all the time, especially with HDF. We have customers streaming data from autonomous cars. You have customers streaming from security cameras. You can put a small MiNiFi agent in a security camera or smartphone and it can stream it all the way back. Then you get into physics. You're up against the laws of physics. If you have a security camera in Japan, why would you want to move it all the way to California and process it? You'd rather do it right there, right? So this notion of a regional data center becomes really important. >> And that talks to the edge as well. >> Exactly, right. So you want to have something in Japan that collects all of the security cameras in Tokyo, and you do analysis and push what you want back here, right. So that's physics. The other thing we are increasingly seeing is, with data sovereignty rules, especially things like GDPR, there are now regulation reasons where data has to naturally stay in different regions. Customer data from Germany cannot move to France or vice versa, right. >> Data governance is a huge issue, and this is the problem I have with data governance. I am really looking for a solution, so if you can illuminate this it would be great. So there is going to be an Equifax out there again. >> Arun: Oh, for sure. >> And the problem is, is that going to force some regulation change? So what we see, certainly on the mugi bond side, what I see personally is that you can almost see that something else will happen that'll force some policy regulation or governance. You don't want to screw up your data. You also don't want to rewrite your applications or rewrite your machine learning algorithms. So there's a lot of wasted potential by not structuring the data properly. Can you comment on what's the preferred path? >> Absolutely, and that's why we've been working on things like Dataplane for almost a couple of years now. Which is to say, you have to have data and policies which make sense, given a context. And the context is going to change by application, by usage, by compliance, by law. So, now to manage 20, 30, 50, a hundred data lakes, would it be better, not saying lakes, data ponds, >> Host: Any data. >> Any data >> Any data pool, stream, river, ocean, whatever. (laughs) >> Jacuzzis. Data jacuzzis, right. So what you want to do is, you want a holistic fabric. I like the term, you know, Forrester uses; they call it the fabric. >> Host: Data fabric. >> Data fabric, right? You want a fabric over these so you can actually control and maintain governance and security centrally, but apply it with context. Last but not least, you want to do this whether it's on-prem or on the cloud, or multi-cloud. So we've been working with a bank. They were based in Germany, but for GDPR they had to stand up something in France now. They had French customers, but for a bunch of new reasons, regulation reasons, they had to stand up something in France. So they had their own data center, then they had one cloud provider, right, who I won't name. And they were great, things were working well. Now they want to expand the similar offering to customers in Asia. It turns out their favorite cloud vendor was not available in Asia, or they were not available in a time frame which made sense for the offering.
So they had to go with cloud vendor two. So now, although each of the vendors will do their job in terms of giving you all the security and governance and so on, the fact that you have to manage it three ways, one for on-prem and one each for cloud vendor A and B, was really hard, too hard for them. So this notion of a fabric across these things, which is Dataplane. And that, by the way, is based on all the open source technologies we love, like Atlas and Ranger. By the way, that is also what IBM is betting on, and what the entire ecosystem is betting on; it seems like a no-brainer at this point. That was the kind of reason why we foresaw the need for something like a Dataplane, and obviously we couldn't be more excited to have something like that in the market today as a net new service that people can use. >> You get the catalogs, security controls, data integration. >> Arun: Exactly. >> Then you get the cloud, whatever, pick your cloud scenario, you can do that. Killer architecture, I liked it a lot. I guess the question I have for you personally is, what's driving the product decisions at Hortonworks? And the second part of that question is, how does that change your ecosystem engagement? Because you guys have been very friendly in a partnering sense and also very good with the ecosystem. How are you guys deciding the product strategies? Does it bubble up from the community? Is there an ivory tower, let's go take that hill? >> It's both, because what typically happens is, obviously we've been in the community now for a long time. Working publicly now with well over 1,000 customers not only puts a lot of responsibility on our shoulders, but it's also very nice because it gives us a vantage point which is unique. That's number one. The second one we see is, being in the community, we also see the fact that people are starting to solve the problems. So it's another source of telemetry for us. So you have one, the enterprise side: we see what the enterprises are facing, which is kind of where Dataplane came in. But we also saw in the community where people are starting to ask us about, hey, can you do multi-cluster Atlas? Or multi-cluster Ranger? Put two and two together and say there is a real need. >> So you get some consensus. >> You get some consensus, and you also see that on the enterprise side. Last but not least is when we went to friends like IBM and said, hey, we're doing this. This is where we can position this, right. So we can actually bring in IGC, you can bring BigQuality and bring all these types-- >> Host: So things had clicked with IBM? >> Exactly. >> Rob Thomas was thinking the same thing. Bring in the Power system and the horsepower. >> Exactly, yep. We announced something, for example; we have been working with the Power guys and NVIDIA, for deep learning, right. That sort of stuff is what clicks if you're in the community long enough, if you have the vantage point of the enterprise long enough; it feels like the two of them click. And that's, frankly, my job. >> Great, and you've got obviously the landscape. The waves are coming in. So I've got to ask you, the big waves are coming in and you're seeing people starting to get hip with a couple of key things that they got to get their hands on. They need to have the big surfboards, metaphorically speaking. They got to have some good products, big emphasis on real value. Don't give me any hype, don't give me a head fake. You know, AI-wash, people can see right through that. Alright, that's clear. But AI's great.
We all cheer for AI, but the reality is, everyone knows that's pretty much b.s., except core machine learning is on the front edge of innovation. So that's cool, but value. (laughs) Hey, I've got to integrate and operationalize my data, so that's the big wave that's coming. Comment on the community piece, because enterprises now are realizing, as open source becomes the dominant source of value for them, they are now really going to the next level. It used to be like the emerging enterprises that knew open source. The guys would volunteer and they may not go deeper in the community. But now more people in the enterprises are in open source communities, they are recruiting from open source communities, and that's impacting their business. What's your advice for someone who's been in the community of open source? Lessons you've learned, what is the best practice, from your standpoint on philosophy, how to build into the community, how to build a community model. >> Yeah, I mean, at the end of the day, my best advice is to say, look, the community is defined by the people who contribute. So, you get a voice if you contribute. That's the fundamental truth. Which means you have to get your legal policies and so on to a point that you can actually start to let your employees contribute. That kicks off a flywheel, where you can actually then go recruit the best talent, because the best talent wants to stand out. GitHub is a resume now. It is not a Word doc. If you don't allow them to build that resume, they're not going to come by, and it's just a fundamental truth. >> It's self governing, it's reality. >> It's reality, exactly. Right, and we see that over and over again. It's taken time, but as with these things, the flywheel has turned enough. >> A whole new generation's coming online. If you look at the young kids coming in now, it is an amazing environment. You've got TensorFlow, all this cool stuff happening. It's just amazing. >> You know, 20 years ago that wouldn't happen, because the Googles of the world wouldn't open source it. Now increasingly, >> The secret's out, open source works. >> Yeah, (laughs) shh. >> Tell everybody. You know, they know already, but this is changing some of how H.R. works and how people collaborate, >> And the policies around it. The legal policies around contribution, so, >> Arun, great to see you. Congratulations. It's been fun to watch the Hortonworks journey. I want to appreciate you and Rob Bearden for supporting theCUBE here in BigData NYC. If it wasn't for Hortonworks and Rob Bearden and your support, theCUBE would not be part of Strata Data, which we are not allowed to broadcast into, for the record. O'Reilly Media does not allow theCUBE or our analysts inside their venue. They've excluded us and that's a bummer for them. They're a closed organization. But I want to thank Hortonworks and you guys for supporting us. >> Arun: Likewise. >> We really appreciate it. >> Arun: Thanks for having me back. >> Thanks, and shout out to Rob Bearden. Good luck as CPO, it's a fun job, you know, no pressure. I got a lot of pressure. A whole lot. >> Arun: Alright, thanks. >> More CUBE coverage after this short break. (upbeat electronic music)
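Earlier in this conversation Arun mentioned MiNiFi agents filtering at the edge and data that must stay inside its region. A combined sketch of both ideas: score locally, forward only small summaries, and let a residency table veto transfers. This is illustrative Python only, not NiFi or MiNiFi configuration; the policy table, event shape, and looks_anomalous stub are invented:

```python
REGION_POLICY = {
    "jp": {"allowed_destinations": {"jp"}},   # assumption: Japan data stays in Japan
    "de": {"allowed_destinations": {"de"}},
}

def looks_anomalous(frame_event):
    # Stand-in for a local model scoring the camera feed at the edge.
    return 0.9 if frame_event.get("motion", 0) > 0.8 else None

def forward(summary, destination_region):
    policy = REGION_POLICY[summary["region"]]
    if destination_region not in policy["allowed_destinations"]:
        raise PermissionError("data residency policy forbids this transfer")
    print("shipping summary to regional data center:", summary)

def process_at_edge(frame_event, local_region="jp"):
    """Process footage where it is captured; never move the raw stream."""
    score = looks_anomalous(frame_event)
    if score is not None:
        forward({"camera": frame_event["camera_id"],
                 "score": score,
                 "region": local_region},
                destination_region=local_region)

process_at_edge({"camera_id": "cam-7", "motion": 0.93})
```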
Amit Walia, Informatica | BigData NYC 2017
>> Announcer: Live from midtown Manhattan, it's theCUBE. Covering Big Data New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. >> Okay, welcome back everyone, live here in New York City, it's theCUBE's coverage of Big Data NYC. It's our event we've been doing for five years in conjunction with Strata Hadoop, now called Strata Data, right around the corner, separate place. Every year we get the best voices in tech. Thought leaders, CEOs, executives, entrepreneurs, anyone who's bringing the signal, we share that with you. I'm John Furrier, the co-host of theCUBE. Eight years covering Big Data, since 2010, the original Hadoop World. I'm here with Amit Walia, who's the Executive Vice President, Chief Product Officer for Informatica. Welcome back, good to see you. >> Good to be here, John. >> theCUBE alumni, always great to have you on. Love product; we had everyone on from Hortonworks. >> I just saw that. >> Product guys are great, can share the road map and kind of connect the dots. As Chief Product Officer, you have to have a 20-mile stare into the future. You got to know what the landscape is today, where it's going to be tomorrow. So I got to ask you, where's it going to be tomorrow? It seems that the rubber's hit the road, real value has to be produced. The hype of AI is out there, which I love, by the way. People can see through that, but they get that it's good. Where's the value today? That's what customers want to know. I got hybrid cloud on the table, I got a lot of security concerns. Governance is a huge problem. The European regulations are coming over the top. I don't have time to do IoT and these other things, or do I? I mean, this is a lot of challenges, but how do you see it playing out? >> I think, to be candid, it's the best of times. The changing times are the best of times, because people can experiment. I would say if you step back and take a look, we've been talking for such a long time. If there was any time, where, forget the technology jargon of infrastructure, cloud, IoT, data has become the currency for every enterprise, right? Everybody wants data. I say, like, you know, business users want today's data yesterday to make a decision tomorrow. IT has always been in the business of data, everybody wants more data. But the point you're making is that while that has become more relevant to an enterprise, it brings in a lot of other things: GDPR, governance, security issues, I mean hybrid clouds, some data on-prem, some data on cloud. But in essence, what I think every company has realized is that they will live and die by how well they predict the future with the data they have on all their customers, products, whatever it is, and that's the new normal. >> Well, hate to say it, Amit, but I'll pat myself on the back: we in theCUBE team and Wikibon saw this early. You guys did too, and I want to bring up a comment we talked about a couple of years ago. One, you guys were in the data business, Informatica. You guys went private, but that was an early indicator of the trend that everyone's going private now. And that's a signal. For the first time, private equity financing has trumped the bigger venture capital asset class financing. Which is a signal that the waves are coming. We're surfing these little waves right now, we think they're big, but the big ones are coming. The indicator is everyone's retrenching. Private equity's a sign of undervaluation.
They want to actually also transform maybe some of the product engineering side of it, or go to market. Basically get the new surfboard. >> Yeah. >> For the big waves. >> I mean, that was the premise for us too, because we saw it as we were chatting, right. We knew the new world, which was going towards predictive analytics or AI. See, data is the richest thing for AI to be applied to, but the thing is that it requires some heavy lifting. In fact, that was our thesis: as we went private, look, we can double down on things like cloud. Invest truly for the next four years, which, being in public markets, sometimes is hard. So we stepped back and looked at where we are, as you were asking earlier today. We're big believers; look, there's so much data, so many varying architectures, so many different places. People are in Azure, or AWS, on-prem; by the way, still on mainframe. That hasn't gone away; you go back to the large customers. But ultimately, when you talk about the biggest, I would say, the new normal, which is AI, which clearly has been overtalked about but in my opinion has been barely touched, the biggest application of machine learning is on data. And that predicts things, whether you want to predict forecasting, or you predict something will come down, or you can predict; and that's what we believe is where the world is going to go, and that's what we doubled down on with our CLAIRE technology. Just go deep, bring AI to data across the enterprise. >> We got to give you guys props, you guys are right on the line. I got to say, as a product person myself, I see you guys executing a great strategy; you've been very complimentary to your team, I think you're doing a great job. Let's get back to AI. I think if you look at the hype cycles of things, IoT certainly has, I still think, a lot more hype to have there; there's so much more to do there. Cloud was overhyped, remember cloud washing? Pexus back in 2010-11, oh, they're just cloud washing. Well, that's a sign that it ended up becoming what everyone was kind of hyping up. It did turn out. AI's the same thing. And I think it's real, because you can almost connect the dots and be there, but the reality is that it's just getting started. And so we had Rob Thomas from IBM on theCUBE and, you know, we were talking. He made a comment I want to get your reaction to. He said, "You can't have AI without IA." Information architecture. And you're in the information business, Informatica; you guys have been laying out an architecture specifically around governance. You guys kind of saw that early too. You can't just do AI; AI needs to be trained, and there are data models. There's a lot of data involved that feeds AI. Who trains the machines that are doing the learning? So, you know, all these things come into play back to data. So what is the preferred information architecture, IA, that can power AI, artificial intelligence? >> I think it's a great question. What we typically recommend, and what we see large companies do, is look at the current complex architectures the companies are in. Hybrid cloud, multicloud, old architecture; by the way, mainframe, client server, big data, you pick your favorite architecture, everything exists in any enterprise, right. Companies are not going to magically move everything to one place, to just start putting data in one place and start running some kind of AI on it. Our belief is that it will get organized around metadata. Metadata is data about data, right?
The organizing principle for any enterprise has to be around metadata. Leave your data wherever it is, organize your metadata, which is a much lighter footprint, and then that layer becomes the true central nervous system for your new next gen information architecture. That's the layer on which you apply machine learning too. So a great example is, look, take GDPR. I mean, with GDPR, large companies have to ask: who's touching my data? Where is my data coming from? Which database has sensitive data? All of these things are such complex problems. You will not move everything magically to one place. You will apply a metadata approach to it, and then machine learning starts telling you, gee, I see some anomaly. I see some data which does not have permission to leave the geographical boundaries of, let's say, Germany, going to, let's say, the UK. Those are the kinds of things that become a lot easier to solve once you organize yourself at the metadata layer, and that's the layer on which you apply AI. To me, that's the simplest way to describe the organizing principle of what I call the data architecture, or the information architecture, for the next ten years. >> And that metadata, you guys saw that earlier, but how does that relate to these new things coming in? Because, you know, one would argue that the ideal preferred infrastructure would be one that says, hey, no matter what, the next GDPR thing will happen, there'll be another Equifax that's going to happen, there'll be some sort of state-sponsored cyber attack on the US; all these things are happening. I mean, hell, all security attacks are going up-- >> Security's a great example of that. We saw it four years ago, you know, and we worked on a metadata driven approach to security. Look, I've been in the security business, at Symantec, myself. Security's a classic example of where it was all at the infrastructure layer: network, database, server. But the problem is that it doesn't matter. Where is your database? In the cloud. Where is your network? I mean, do you run a data center anymore, right? If I may, figuratively, you don't. Ultimately, it's all about the data. The way things are going, we want more users like you and me to have access to data. So security has to be applied at the data layer. So in that context, I just talked about the whole metadata driven approach. Once you have the context of your data, you can apply governance to your data, you can apply security to your data, and as you keep adding new architectures, you do not have to create a parallel architecture; you just have to append your metadata. So security, governance, hybrid cloud, all of those things become a lot easier for you, versus creating one new architecture after another, which you can never get to. >> Well, people will be afraid of malware and these malicious attacks, so auditing becomes now a big thing. If you look at Equifax, I have some data on that showing there was other action; they were fleeced for weeks and months before the hack was even noticed. >> All this happens. >> I mean, they were ten times phished over even before it was discovered. They were inside, so an audit trail would be interesting. >> Absolutely. I'll give you, typically, if you read any external report, and this is nothing tied to Equifax, it takes any enterprise three months minimum to figure out they're under attack.
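The Germany-to-UK example reduces to checking observed flows against residency rules recorded at the metadata layer. A toy version of that audit, not Informatica's CLAIRE; the flow table and event fields are invented for illustration:

```python
ALLOWED_FLOWS = {("de", "de"), ("uk", "uk")}   # assumption: no cross-border moves

def audit(access_events):
    """Yield movements that the metadata layer says should never happen."""
    for e in access_events:
        flow = (e["source_region"], e["dest_region"])
        if e["sensitive"] and flow not in ALLOWED_FLOWS:
            yield {"user": e["user"], "dataset": e["dataset"], "flow": flow}

events = [
    {"user": "svc-etl", "dataset": "customers_de", "sensitive": True,
     "source_region": "de", "dest_region": "uk"},
]
for finding in audit(events):
    print("policy violation:", finding)
```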
And a sophisticated attacker, right away when they enter your enterprise, goes to find the weakest link. You're as secure as your weakest link in security. And they will go to some data trail that was left behind by some business user who moved on to the next big thing. But data was still flowing through that pipe. Or, by the way, the biggest issue is an insider attack, right? You will have somebody hack your or my credentials, and they don't download, like Snowden, a big fat document one day. They'll go drip by drip by drip by drip. You won't even know that. That again is an anomaly detection thing. >> Well, it's going to get down to the firmware level. I mean, look at the sophisticated hacks in China; they run their own DNS. They have certificates, they hack the iPhones. They make the phones and stuff, so you got to assume hacking. But now, it's knowing what's going on, and this is really the dynamic nature. So we're on the same page here. I'd love to do a security feature; come into the studio in our office at Palo Alto, I think that's worthy. I just had a great cyber chat with Junaid, CTO of Vidder. Junaid is awesome, did some work with the government. But this brings up the question around big data. The landscape that we're in is fast and furious right now. You have big data being impacted by cloud, because you have now unlimited compute, low latency storage, unlimited power source in that engine. Then you got the security paradigm. You could argue that that's going to slow things down maybe a little bit, but it also is going to change the face of big data. What is your reaction to the impact of security and cloud on big data? Because even though AI is the big talk of the show, what's really happening here at Strata Data is it's no longer a data show, it's a cloud and security show, in my opinion.
I mean you looked at securities, cyber security big data, AI, you know, massive investment happens and then as customers want to truly go to scale they say look I can only bet on a few that can not only scale, but had the governance and compliance of what a large company wants. >> The waves are coming, there's no doubt about it. Okay so, let me get your reaction to end this segment. What's Informatica doing right now? I mean I've seen a whole lot 'cause we've cover you guys with the show and also we keep in touch, but I want you to spend a minute to talk about why you guys are better than what's out there on the floor. You have a different approach, why are customers working with you and if the folks aren't working with you yet, why should they work with Informatica? >> Our approach in a way has changed but not changed. We believe we operate in what we call the enterprise cloud data management. Our thing is look, we embrace open source. Open source, parks, parkstreaming, Kafka, you know, Hive, MapReduce, we support them all. To us, that's not where customers are spending their time. They're spending their time, once I got all that stuff, what can I do with it? If I'm truly building next gen predictive analytics platform I need some level of able to manage batch and streaming together. I want to make sure that it can scale. I want to make sure it has security, it has governance, it has compliance. So customers work with us to make sure that they can run a hybrid architecture. Whether it is cloud on-prem, whether it is traditional or big data or IoT, all in once place, it is scale-able and it has governance and compliance bricked into it. And then they also look for somebody that can provide true things like, not only data integration, quality, cataloging, all of those things, so when we working with large or small customers, whether you are in dev or prod, but ultimately helping you, what I call take you from an experiment stage to a large scale operational stage. You know, without batting an eyelid. That's the business we are in and in that case-- >> So you are in the business of operationalizing data for customers who want to add scale. >> Our belief is, we want to help our customers succeed. And customers will only succeed, not just by experimenting, but taking their experiments to production. So we have to think of the entire lifecycle of a customer. We cannot stop and say great for experiments, sorry don't go operational with us. >> So we've had a theme here in theCUBE this week called, I'm calling it, don't be a tool, and too many tools are out there right now. We call it the tool shed phenomenon. The tool shed phenomenon is customers aren't, they're tired of having too many tools and they bought a hammer a couple years ago that wants to try to be a lawn mower now and so you got to understand the nature of having great tooling, which you need which defines the work, but don't confuse a tool with a platform. And this is a huge issue because a lot of these companies that are flowing by wayside are groping for platforms. >> So there are customers tell us the same thing, which is why we-- >> But tools have to work in context. >> That's exactly, so that's why you heard, we talked about that for the last couple, it was the intelligent data platform. Customers don't buy a platform but all of our products, like are there microservices on our platform. Customers want to build the next gen data management platform, which is the intelligent data platform. 
A lot of little things are features or tools along the way, but if I am a large bank, if I'm a large airline, and I want to go at-scale operational, I can't stitch together a hundred tools and expect to run my IT shop from there. >> Yeah. >> I can't; I will never be able to do it. >> There's good tools out there that have a nice business model, lifestyle business or cashflow business, or even tools that are just highly focused, and that's all they do, and that's great. It's the guys who try to become something that they're not. It's hard, it's just too difficult. >> I think you have to-- >> The tool shed phenomenon is real. >> I think companies have to realize whether they are a feature. I always say, are you a feature or are you a product? You have to realize the difference between the two, and in between sits a tool. (John laughing) >> Well, that quote came, the tool comment came from one of our chief data officers; that kind of sparked the conversation. But people buy a hammer, everything looks like a nail, and you don't want to mow your lawn with a hammer; get a lawn mower, right? Do the right tool for the job. But you have to have a platform; the data has to have a holistic view. >> That's exactly right. The intelligent data platform, that's what we call it. >> What's new with Informatica, what's going on? Give us a quick update, we'll end the segment with a quick update on Informatica. What do you got going on, what events are coming up? >> Well, we just came off a very big release, we call it 10.2, which had a lot of big data, hybrid cloud, AI and catalog and security and governance, all five of them. Big release, just came out, and basically customers are adopting it. Which obviously was all centered around the things we talked about in Informatica. Again: single platform, cloud, hybrid, big data, streaming, and governance and compliance. And then right now, we are basically in the middle; after Informatica World, we go on a barrage of tours across multiple cities across the globe so customers can meet us there. Paris is coming up; I was in London a few weeks ago. And then separately, coming up, I will probably see you at Amazon re:Invent. I mean, we are obviously an all-in partner for-- >> Do you have anything in China? >> China is a- >> Alibaba? >> We're working with them, I'll leave it there. >> We'll be in Alibaba in two weeks for their cloud event. >> Excellent. >> So theCUBE is breaking into China, CUBE China. We need some translators, so if anyone out there wants to help us with our China blog. >> We'll be at Dreamforce. We were obviously, so you'll see us there. We were at Amazon Ignite, obviously very close to- >> re:Invent will be great. >> Yeah, we will be there, and Amazon obviously is a great partner and, by the way, a great customer of ours. >> Well, congratulations, you guys are doing great, Informatica. Great to see the success. We'll see you at re:Invent and keep in touch. Amit Walia, the Executive Vice President, EVP, Chief Product Officer, Informatica. They get the platform game, they get the data game, check 'em out. It's theCUBE ending day two coverage. We've got a big event tonight. We're going to be streaming live our research that we are going to be rolling out here at Big Data NYC, our event that we're running in conjunction with Strata Data. They run their event, we run our event. Thanks for watching and stay tuned, stay with us.
At five o'clock, live Wikibon coverage of their new research and then Party at Seven, which will not be filmed, that's when we're going to have some cocktails. I'm John Furrier, thanks for watching. Stay tuned. (techno music)
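The drip-by-drip exfiltration Amit describes in this segment is exactly the case a fixed threshold misses; what catches it is comparing each user's activity to their own baseline. A minimal per-user anomaly sketch in plain Python; the window size and threshold are arbitrary assumptions:

```python
from collections import defaultdict, deque

WINDOW_DAYS = 30          # assumption: a month of history per user
THRESHOLD_SIGMA = 3.0     # assumption: flag at three standard deviations

history = defaultdict(lambda: deque(maxlen=WINDOW_DAYS))

def record_daily_bytes(user, nbytes):
    """Flag a user whose daily pull drifts above their own baseline."""
    past = history[user]
    if len(past) >= 7:                    # wait for a minimal baseline first
        mean = sum(past) / len(past)
        var = sum((x - mean) ** 2 for x in past) / len(past)
        std = var ** 0.5 or 1.0           # avoid dividing by zero
        if (nbytes - mean) / std > THRESHOLD_SIGMA:
            print(f"anomaly: {user} pulled {nbytes} bytes vs baseline {mean:.0f}")
    past.append(nbytes)

for day in range(14):
    record_daily_bytes("jdoe", 1_000)     # steady, unremarkable
record_daily_bytes("jdoe", 25_000)        # the drip becomes a gulp
```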
Day Two Kickoff | Big Data NYC
(quiet music) >> I'll open that while he does that. >> Co-Host: Good, perfect. >> Man: All right, rock and roll. >> This is Robin Matlock, the CMO of VMware, and you're watching theCUBE. >> This is John Siegel, VP of Product Marketing at Dell EMC. You're watching theCUBE. >> This is Matthew Morgan, I'm the chief marketing officer at Druva, and you are watching theCUBE. >> Announcer: Live from midtown Manhattan, it's theCUBE. Covering BigData New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. (rippling music) >> Hello, everyone, welcome to a special CUBE live presentation here in New York City for theCUBE's coverage of BigData NYC. This is where all the action's happening in the big data world: machine learning, AI, the cloud, all kind of coming together. This is our fifth year doing BigData NYC. We've been covering the Hadoop ecosystem, Hadoop World, since 2010; it's our eighth year really at ground zero for the Hadoop, now the BigData, now the data market. We're doing this also in conjunction with Strata Data, which was Strata Hadoop. That's a separate event with O'Reilly Media; we are not part of that, we do our own event, our fifth year doing our own event, we bring in all the thought leaders. We bring all the influencers, meaning the entrepreneurs, the CEOs, to get the real story about what's happening in the ecosystem. And of course, we do it with our analysts at Wikibon.com. I'm John Furrier with my cohost, Jim Kobielus, who's the chief analyst for our data piece. Lead analyst Jim, you know the data world's changed. We had commentary yesterday, all up on YouTube.com/SiliconANGLE. Day one really set the table. And we kind of get the whiff of what's happening, we can kind of feel the trend, we got a finger on the pulse. Two things going on, two big notable stories: one is the world's continuing to expand around community and hybrid data and all these cool new data architectures, and the second kind of substory is the O'Reilly show has become basically a marketing machine. They're making millions of dollars over there. A lot of people were, last night, kind of not happy about that, and what it's giving back to the community. So, again, the community theme is still resonating strong. You're starting to see that move into the corporate enterprise, which you're covering. What are you finding out, what did you hear last night, what are you hearing in the hallways? What are the tea leaves that you're reading? What are some of the things you're seeing here? >> Well, all things hybrid. I mean, first of all it's building hybrid applications for hybrid cloud environments, and there's various layers to that. So yesterday on theCUBE we had, for example, one layer is hybrid semantic virtualization, which is critically important for bridging workloads and microservices and data across public and private clouds. We had, from AtScale, Bruno Aziza and one of his customers discussing what they're doing. I'm hearing a fair amount of this venerable topic of semantic data virtualization becoming even more important now in the era of hybrid clouds. That's a fair amount of the scuttlebutt in the hallway and atrium talks that I participated in. Also yesterday from BMC we had Basil Faruqi talking about automating data pipelines. There are data pipelines in hybrid environments. Very, very important for DevOps, productionizing these hybrid applications for these new multi-cloud environments. That's quite important. Hybrid data platforms of all sorts.
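The pipeline automation Jim refers to is, at bottom, a dependency-ordered task graph with scheduling and retries around it. A toy runner using only Python's standard library, not BMC's Control-M or any vendor product; the four stage names are generic placeholders:

```python
from graphlib import TopologicalSorter   # Python 3.9+

def ingest():    print("pull from sources")
def store():     print("land in the lake")
def process():   print("transform and enrich")
def analytics(): print("publish results")

# Each step maps to the set of steps that must run before it.
steps = {"ingest": set(), "store": {"ingest"},
         "process": {"store"}, "analytics": {"process"}}
tasks = {"ingest": ingest, "store": store,
         "process": process, "analytics": analytics}

for name in TopologicalSorter(steps).static_order():
    tasks[name]()   # a real scheduler adds retries, alerting, calendars
```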
Yesterday we had, from Actian, Jeff Veis discussing their portfolio for on-prem, public cloud, putting the data in various places, and speeding up the queries and so forth. So hybrid data platforms are increasingly going streaming, in real time. What I'm hearing is that, more and more, the layering of these hybrid environments is a critical concern for enterprises trying to put all this stuff together and future-proof it, so they can add on all the new stuff that's coming along, like cirrus clouds, without breaking interoperability and without having to change code. Just plug and play in a massively multi-cloud environment. >> You know, and also I'm critical of a lot of things that are going on. 'Cause to your point, the reason why I'm kind of critical on the O'Reilly show, and particularly the hype factor going on in some areas, is two kinds of trends I'm seeing with respect to the owners of some of the companies. You have one camp that are kind of groping for solutions, and you'll see that with their whitewashing of new announcements that's going on here. It's really kind of-- >> Jim: I think it's AI now, by the way. >> And they're AI-washing it, but the tell sign is they're always kind of doing a magic trick of some type of new announcement, something's happening; you got to look underneath that and say, where is the deal for the customers? And you brought this up yesterday with Peter Burris, which is: the business side of it is really the conversation now. It's not about the speeds and feeds and the cluster management; that's certainly important, and those solutions are maturing. That came up yesterday. The other thing that you brought up yesterday I thought was notable was the real emphasis on the data science side of it. And it's that it's still not easy for data scientists to do their job. And this is where you're seeing productivity conversations come up with data science. So, really, the emphasis at the end of the day boils down to this. If you don't have any meat on the bone, if you don't have a solution where the rubber hits the road, where you can come in and provide a tangible benefit to a company, an enterprise, then it's probably not going to work out. And we kind of had that tool conversation, you know, as people start to grow. And so as buyers out there, they got to look and kind of squint through it, saying where's the real deal? So that kind of brings up, what's next? Who's winning? How do you as an analyst look at the playing field and say, that's good, that's got traction, that's winning, mm, not too sure? What's your analysis, how do you tell the winners from the losers, and what's your take on this from the data science lens? >> Well, first of all you can tell the winners when they have an ample number of referenced customers who are doing interesting things. Interesting enough to get a jaded analyst to pay attention. Doing something that changes the fabric of work or life, whatever, clearly. Solution providers who can provide that, they have all the hallmarks of a winner, meaning they're making money, and they're likely to grow and so forth. But also the hallmarks of a winner are those, in many ways, who have a vision and catalyze an ecosystem around that vision of something that could be done, possibly was done before, but not quite as efficiently. So, you know, for example, what we're seeing now in the whole AI space, deep learning, is, you know, AI means many things.
The core right now, in terms of the buzzy stuff, is deep learning for being able to process real-time streams of video, images and so forth. And so, what we're seeing now is that the vendors who appear to be on the verge of being winners are those who use deep learning inside some new innovation that appeals to a potential mass market. It's something you put on, like an app or something you put on your smartphone, or it's something you buy at Walmart, install in your house. You know, the whole notion of, clearly, Alexa and all that stuff. Anything that takes chatbot technology (really, deep learning powers chatbots) and is able to drive a conversational UI into things that you wouldn't normally expect to talk to you, and does it well, in a way that people have to have that. Those are the vendors that I'm looking for, in terms of those are the ones that are going to make a ton of money selling to a mass market, and possibly, and very much once they go there, they're building out a revenue stream and a business model that they can conceivably take into other markets, especially business markets. You know, like Amazon, 20-something years ago when they got started in the consumer space as the exemplar of web retailing: who expected them 20 years later to be a powerhouse provider of business cloud services? You know, so we're looking for the Amazons of the world that can take something as silly as a conversational UI, driven by DL, inside of a consumer appliance and 20 years from now, maybe even sooner, become a business powerhouse. So that's what's new. >> Yeah, the thing that comes up that I want to get your thoughts on is that we've seen data integration become a continuing theme. The other thing about the community play here is you start to see customers align with syndicates or partnerships, and I think it's always been great to have customer traction, but, as you pointed out, as a benchmark. But now you're starting to see the partner equation, because this isn't the open, decentralized, distributed internet of the early days. And it is looking like it's going to form differently than the way it was in the web days, with mobile and connected devices, IoT and AI. A whole new infrastructure's developing, so you're starting to see people align with partnerships. So I think that's something that's signaling to me that the partnership is amping up. I think people are partnering more. We've had Hortonworks on with IBM; people are partnering, some people take a Switzerland approach where they partner with everyone. You had WANdisco partnering with all the cloud guys; I mean, they have unique IP. So you have this model where you got to go out, do something, but you can't do it alone. Open source is a key part of this, so obviously that's part of the collaboration. This is a key thing. And then they're going to check off the boxes. Data integration, deep learning is a new way to kind of dig deeper. So the question I have for you is, the impact on developers. 'Cause if you can connect the dots between open source (90% of the software written will be already open source, 10% differentiated) and then the role of how people go to market with the enterprise via partnership, you can almost connect the dots and say it's kind of a community approach. So that leaves the question, what is the impact to developers?
>> Well, the impact to developers, first of all, is that when you go to a community approach, some big players are going more community- and partnership-oriented in hot new areas. If you look at some of the recent announcements in chatbots and those technologies, we have sort of a rapprochement between Microsoft and Facebook and so forth, or Microsoft and AWS. The impact for developers is that there's convergence among the companies that might have competed to the death in particular hot new areas, like, as I said, chatbot-enabled apps for mobile scenarios. And so it cuts short the platform wars fairly quickly and harmonizes around a common set of APIs for accessing a variety of competing offerings that really overlap functionally in many ways. For developers, it's simplification around a broader ecosystem, where it's not so much competition on the underlying open source technologies; it's now competition to see who penetrates the mass market with actually valuable solutions that leverage one or more of those erstwhile competitors into some broader synthesis. You know, for example, the whole ramp-up to the future of self-driving vehicles, and it's not clear who's going to dominate there. Will it be the vehicle manufacturers that are equipping their cars with all manner of computerized everything? Or will it be the up-and-comers? Will it be the computer companies like Apple and Microsoft and others who get real deep and invest fairly heavily in self-driving vehicle technology, and become themselves the new generation of automakers in the future? So what we're getting is that, going forward, developers want to see these big industry segments converge fairly rapidly around broader ecosystems, where it's not clear who will be the dominant player in 10 years. The developers don't really care, as long as there is consolidation around a common framework to which they can develop fairly soon. >> And open source obviously plays a key role in this, and how is deep learning impacting some of the contributions that are being made? Because we're starting to see that the competitive advantage in collaboration on the community side is with the contributions from companies. For example, you mentioned TensorFlow from Google multiple times yesterday. I mean, that's a great contribution. If you're a young kid coming into the developer community, I mean, this is not normal. It wasn't like this before. People just weren't donating massive libraries of great stuff, already pre-packaged. So there are all new dynamics emerging. Is that putting pressure on Amazon, is that putting pressure on AWS and others? >> It is. First of all, there is a fair amount of, I wouldn't call it first-mover advantage for TensorFlow; there've been a number of DL toolkits on the market, open source, for the last several years. But they achieved the deepest and broadest adoption most rapidly, and now TensorFlow is essentially a de facto standard, in the way that, if we just go back, betraying my age, 30, 40 years ago, you had two companies called SAS and SPSS that quickly established themselves as the go-to statistical modeling tools. And then they got a generation, our generation, of developers, or at least of what became known as data scientists, to standardize around, you're either going to go with SAS or SPSS if you're going to do data mining. Cut ahead to the 2010s now. For the new generation of statistical modelers, it's all things DL and machine learning.
And so SAS versus SPSS is ages ago; those companies, those products still exist. But now, what are you going to get hooked on in school? What are you going to get hooked on in high school, for that matter, when you're just hobby-shopping DL? You'll probably get hooked on TensorFlow, 'cause they have the deepest and the broadest open source community, where you learn this stuff. You learn the tools of the trade, you adopt that tool, and everybody else in your environment is using that tool, and you've got to get up to speed. So the fact is that broad adoption early on in a hot new area like DL means tons. It means that essentially TensorFlow is the new Spark, where Spark, you know, once again, just in the past five years came out real fast. And it's been eclipsed, as it were, on the stack of cool by TensorFlow. But it's a deepening stack of open source offerings. So the new generation of developers with data science workbenches, they just assume that there's Spark, and they're going to increasingly assume that there's TensorFlow in there. They're going to increasingly assume that there are the libraries and algorithms and models and so forth floating around in the open source space that they can use to bootstrap themselves fairly quickly. >> This is a real issue in the open source community, which we talked about when we were in LA for the Open Source Summit, exactly that. There are some projects that become fashionable, so for example, the Cloud Native Computing Foundation, very relevant but also hot, really hot right now. A lot of people are jumping on board the cloud-native bandwagon, and rightfully so. A lot of work to be done there, and a lot of things to harvest from that growth. However, the boring blocking-and-tackling projects don't get all the fanfare but are still super relevant, so there's a real challenge of how do you nurture these awesome projects that we don't want to become like a nightclub where nobody goes anymore because it's not fashionable. Some of these open source projects are super important and have massive traction, but they're not as sexy or as flashy as some of that. >> DL is not as sexy, or machine learning, for that matter, not as sexy as you would think if you're actually doing it, because the grunt work, John, as we know for any statistical modeling exercise, is data ingestion and preparation and so forth. That's 75% of the challenge for deep learning as well. But also for deep learning and machine learning, training the models that you build is where the rubber meets the road. You can't have a really strongly predictive DL model for face recognition unless you train it against a fair amount of actual face data, whatever it is. And it takes a long time to train these models. That's what you hear constantly. I heard this constantly in the atrium talking-- >> Well, that's a data challenge, is you need models that are adapting and you need real time, and I think-- >> Oh, here-- >> This points to the real new way of doing things, it's not yesterday's model. It's constantly evolving. >> Yeah, and that relates to something I read this morning, or maybe it was last night, that Microsoft has made a huge investment in AI and deep learning machinery. They're doing amazing things.
And one of the strategic advantages they have as a large, established solution provider with a search engine, Bing, is that, from what I've read, and I haven't talked to Microsoft in the last few hours to confirm this, Bing is a source of training data that they're using for machine learning, and I guess deep learning modeling, for their own solutions or within their ecosystem. That actually makes a lot of sense. I mean, Google uses YouTube videos heavily in its deep learning for training data. So there's the whole issue of, if you're a pipsqueak developer, some, you know, I'm sorry, this sounds patronizing, some pimply-faced kid in high school who wants to get real deep on TensorFlow and start building and tuning these awesome kickass models to do face recognition, or whatever it might be: where are you going to get your training data from? Well, there are plenty of open source training databases out there you can use, but it's what everybody's using. So there's sourcing the training data, and there's labeling the training data; that's human-intensive, you need human beings to label it. There was a funny recent episode, or maybe it was a last-season episode, of Silicon Valley that was all about machine learning and building and training models. It was the hot dog, not hot dog episode, it was so funny. On the show, fictionally, they bamboozle a class of college students to provide the training data and to label the training data for this AI algorithm; it was hilarious. But where are you going to get the data? Where are you going to label it? >> Lot more work to do, that's basically what you're getting at. >> Jim: It's DevOps, you know, but it's grunt work. >> Well, we're going to kick off day two here. This is the SiliconANGLE Media theCUBE, our fifth year doing our own event separate from O'Reilly Media but in conjunction with their event in New York City. It's gotten much bigger here in New York City. We call it BigData NYC, that's the hashtag. Follow us on Twitter, I'm John Furrier, Jim Kobielus, we're here all day, we've got Peter Burris joining us later, head of research for Wikibon, and we've got great guests coming up, stay with us, we'll be back with more after this short break. (rippling music)
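As a footnote to that training-data exchange: the "hot dog, not hot dog" gag is also a decent mental model for how supervised training looks in code. Below is a minimal sketch using TensorFlow's Keras API (which postdates this conversation); the directory layout, image size, and network are illustrative assumptions, and the only real point is that model.fit() is entirely at the mercy of the labeled data you feed it.

```python
# A minimal "hot dog / not hot dog" binary classifier, sketched with the
# TensorFlow Keras API. Everything here (paths, sizes, architecture) is a
# hypothetical stand-in; the takeaway is that the labeled training data,
# the human-intensive grunt work discussed above, does the heavy lifting.
import tensorflow as tf

# Expects labeled folders: train/hot_dog/*.jpg and train/not_hot_dog/*.jpg
train_ds = tf.keras.utils.image_dataset_from_directory(
    "train", image_size=(128, 128), batch_size=32, label_mode="binary")

model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 255),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # hot dog or not
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

# Training time scales with the data volume; "it takes a long time to
# train these models" is exactly this call.
model.fit(train_ds, epochs=10)
```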
Yaron Haviv, iguazio | BigData NYC 2017
>> Announcer: Live from midtown Manhattan, it's theCUBE, covering BigData New York City 2017, brought to you by SiliconANGLE Media and its ecosystem sponsors. >> Okay, welcome back everyone, we're live in New York City, this is theCUBE's coverage of BigData NYC. This is our own event; for five years now we've been running it. We've been at Hadoop World since 2010; it's our eighth year covering Hadoop World, which has evolved into Strata Conference, Strata Hadoop, now called Strata Data, and of course it's bigger than just Strata, it's about big data in NYC. A lot of big players here inside theCUBE, thought leaders, entrepreneurs, and great guests. I'm John Furrier, the cohost this week with Jim Kobielus, who's the lead analyst on our BigData and Wikibon team. Our next guest is Yaron Haviv, who's with iguazio, he's the founder and CTO, hot startup here at the show, making a lot of waves with their new platform. Welcome to theCUBE, good to see you again, congratulations. >> Yes, thanks, thanks very much. We're happy to be here again. >> You're known in theCUBE community as the guy on Twitter who's always pinging me and Dave and team, saying, "Hey, you know, you guys got to get that right." You really are one of the smartest guys on the network in our community, you're super-smart, your team has got great tech chops, and in the middle of all that is the hottest market, which is cloud native: cloud native as it relates to the integration of how apps are being built, and essentially new ways of engineering around these solutions. Not just repackaging old stuff, it's really about putting things in a true cloud environment, with application development, with data at the center of it; you've got a whole complex platform you've introduced. So really, really want to dig into this. Before we get into some of my pointed questions, and I know Jim's got a ton of questions, give us an update on what's going on. You guys got some news here at the show, let's get to that first. >> So since the last time we spoke, we had tons of news. We're making revenues, we have customers, we've just recently GA'ed, and we recently got significant investment from major investors; we raised about $33 million recently from companies like Verizon Ventures, Bosch, you know, for IoT, Chicago Mercantile Exchange, which is Dow Jones and other properties, Dell EMC. So pretty broad. >> John: So customers, pretty much. >> Yeah, so that's the interesting thing. Usually, you know, investors are sort of strategic investors or partners or potential buyers, but here it's essentially our customers; it's so strategic to the business, we want to... >> Let's go with the GA of the projects, just get into what's shipping, what's available, what's the general availability, what are you now offering? >> So iguazio is trying to, you know, you alluded to cloud native and all that. Usually when you go to events like Strata and BigData, it's nothing to do with cloud native; a lot of hard labor, not really continuous development and integration, it's like continuous hard work. And essentially what we did is we created a data platform which is extremely fast and integrated, you know, has all the different forms of state, streaming and events and documents and tables and all that, in a very unique architecture, won't dive into that today. And on top of it we've integrated cloud services like Kubernetes and serverless functionality and others, so we can essentially create a hybrid cloud.
So some of our customers even deploy portions as an opex-based setting in the cloud, and some portions at the edge or in the enterprise, deployed as software or even as a prepackaged appliance. So we're the only ones that provide a full hybrid experience. >> John: Is this a SaaS product? >> So it's a software stack, and it can be delivered in three different options. One, if you don't want to mess with the hardware, you can just rent it, and it's deployed in an Equinix facility; we have very strong partnerships with them globally. If you want to have something on-prem, you can get a software reference architecture, and you go and deploy it. If you're a telco or an IoT player that wants it in a manufacturing facility, we have a very small 2U box: four servers, four GPUs, all the analytics tech you could think of. You just put it in the factory instead of, like, two racks of Hadoop. >> So you're not general purpose, you're just whatever the customer wants to deploy the stack, the flexibility is on them. >> Yeah. Now it is an appliance >> You have a hosting solution? >> It is an appliance even when you deploy it on-prem; it's a bunch of Docker containers inside, and you don't even touch them, you don't SSH to the machine. You have APIs and you have UIs, and just like the cloud experience when you go to Amazon, you don't open the kimono, you know, you just use it. So that's the experience we're telling customers to expect. No root access problems, no security problems. It's a hardened system. Give us servers, we'll deploy it, and you go through consoles and UIs, >> You don't host anything for anyone? >> We host for some customers, including >> So you do whatever the customer is interested in doing? >> Yes. (laughs) >> So you're flexible, okay. >> We just want to make money. >> You're pretty good, sticking to the product. So on the GA, so here, essentially, the big data world: you mentioned that there's data layers, like a data piece. So I got to ask you the question, so pretend I'm an idiot for a second, right. >> Yaron: Okay. >> Okay, yeah. >> No, you're a smart guy. >> What problem are you solving? So we'll just go to the simple. I love what you're doing, I assume you guys are super-smart, which I can say you are, but what's the problem you're solving, what's in it for me? >> Okay, so there are two problems. One is the challenge that everyone wants to transform. You know, there is this digital transformation mantra. And it means essentially two things. One is, I want to automate my operations environment so I can cut costs and be more competitive. The other one is, I want to improve my customer engagement. You know, I want to do mobile apps which are smarter, you know, get more direct content to the user, get more targeted functionality, et cetera. These are the two key challenges for every business, in any industry, okay? So they go and they deploy Hadoop and Hive and all that stuff, and it takes them two years to productize it. And then they get to the data science bit. And by the time they've finished, they understand that this Hadoop thing can only do one thing: queries, and reporting and BI, and data warehousing. How do you do actionable insights from that stuff, okay? 'Cause actionable insights means I get information from the mobile app, and then I translate it into some action. I have to enrich the vectors, the machine learning, all those details. And then I need to respond. Hadoop doesn't know how to do that.
So the first generation is people that pulled a lot of stuff into a data lake and started querying it and generating reports. And the boss said >> Low-cost data lake, basically, is what you're saying. >> Yes, and the boss said, "Okay, what are we going to do with this report? Is it generating any revenue for the business?" No. The only revenue generation is if you take this data >> You're fired, exactly. >> No, not all fired, but now >> John: Look at the budget >> Now they're starting to buy our stuff. So now the point is, okay, how can I put in all this data and at the same time generate actions, and also deal with the production aspects of: I want to develop in a beta phase, I want to promote it into production. That's cloud native architectures, okay? Hadoop is not cloud native. How do I take a Spark Zeppelin notebook, you know, and turn it into production? There's no way to do that. >> By the way, depending on which cloud you go to, they have a different mechanism and elements for each cloud. >> Yeah, so the cloud providers do address that because they are selling the package, >> Spans all the clouds, yeah. >> Yeah, so cloud providers are starting to have their own offerings, which are all proprietary, around this is how you would, you know, forget about HDFS, we'll have S3, and we'll have Redshift for you, and we'll have Athena, and again you're starting to consume that as a service. It still doesn't address the continuous analytics challenge that people have. And if you're looking at what we've done with Grab, which is amazing, they started with using Amazon services, S3, Redshift, you know, Kinesis, all that stuff, and it took them about two hours to generate the insights. Now the problem is they want to do driver incentives in real time. So they want to incent the driver to go and make more rides or other things, so they have to analyze the event of the location of the driver, the event of the location of the customers, and just throw messages back based on analytics. So that's real time analytics, and that's not something that you can do >> They'd have to build that from scratch right away. I mean they can't do that with the existing. >> No, and Uber invested tons of energy around that and they don't get the same functionality. Another unique feature that we talk about in our PR >> This is for the use case you're talking about, this is Grab, which is the car >> Grab is the number one ride-sharing company in Asia, which is bigger than Uber in Asia, and they're using our platform. By the way, even Uber doesn't really use Hadoop, they use MemSQL for that stuff, so it's not really using open source and all that. But the point is, for example, with Uber, when they monetize the rides, they do it just based on demand, okay. And with Grab, now what they do, because of the capability that we can intersect tons of data in real time, they can also look at the weather, was there a terror attack or something like that. They don't want to raise the price >> A lot of other data points, could be traffic >> They don't want to raise the price if there was a problem, you know, and all the customers get aggravated. This is actually intersecting data in real time, and no one today can do that in real time beyond what we can do. >> A lot of people have semantic problems with real time, they don't even know what they mean by real time. >> Yaron: Yes. >> The data could be a week old, but they can get it to them in real time.
>> But every decision, if you think about it, if you generalize around the problem, okay, and we have slides on that that I explain to customers: every time I run analytics, I need to look at four types of data. The context, the event: what happened, okay. The second type of data is the previous state: like, I have a car, was it up or down, what's the previous state of that element? The third element is the time aggregation: what happened in the last hour, the average temperature, the average, you know, ticker price for the stock, et cetera, okay? And the fourth thing is enriched data: like, I have a car ID, but what's the make, what's the model, who's driving it right now. That's secondary data. So every time I run a machine learning task or any decision, I have to collect all those four types of data into one vector, it's called a feature vector, and take a decision on that. You take Kafka, it's only the event part, okay; you take MemSQL, it's only the state part; you take Hadoop, it's only, like, the historical stuff. How do you assemble and stitch a feature vector?
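To make that four-way join concrete, here is a small sketch of what stitching such a feature vector together might look like. This is not iguazio's API; every store, field, and value below is a hypothetical stand-in for wherever the event, state, aggregation, and enrichment data actually live.

```python
# Illustrative only: assembling the four data types Yaron describes into
# one feature vector at decision time. Names and stores are hypothetical.
from dataclasses import dataclass

@dataclass
class FeatureVector:
    speed_kmh: float           # 1. the event: what just happened
    engine_ok: bool            # 2. previous state of the entity
    avg_temp_last_hour: float  # 3. time aggregation over recent history
    vehicle_model: str         # 4. enrichment: static reference data

def build_feature_vector(event, state_store, agg_store, ref_data):
    car_id = event["car_id"]
    return FeatureVector(
        speed_kmh=event["speed_kmh"],
        engine_ok=state_store.get(car_id, {}).get("engine_ok", True),
        avg_temp_last_hour=agg_store.get(car_id, 0.0),
        vehicle_model=ref_data[car_id]["model"],
    )

# One decision over one incoming event; in practice each dict would be a
# different system (a stream, a key/value store, a rollup, a lookup table).
event = {"car_id": "42", "speed_kmh": 87.5}
state = {"42": {"engine_ok": False}}
aggs = {"42": 93.2}
ref = {"42": {"model": "Model X"}}
fv = build_feature_vector(event, state, aggs, ref)
print(fv)  # this vector is what the model or rule takes a decision on
```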
>> Well, you talked about a complex machine learning pipeline, so clearly you're talking about a hybrid >> It's a prediction. And actions based on just dumb things, like the car broke and I need to send it to a garage, I don't need machine learning for that. >> So within your environment then, do you enable the machine learning models to execute across the different data platforms of which this hybrid environment is composed, and then do you aggregate the results of those models' runs into some larger model that drives the real time decision? >> In our solution, everything is a document, so even a picture is a document, a lot of things. So you can essentially throw in a picture, run TensorFlow, embed more features into the document, and then query those features on another platform. So that's really what makes this continuous analytics extremely flexible, and that's what we give customers. The first thing is simplicity. They can now build applications; you know, we have a tier one automotive customer now, the CIO coming, meeting us. Because, you know, when I have a project, it's one year, I need to hire dozens of people, it's hugely complex. Tell us the use case, and we'll build a prototype. >> John: All right, well I'm going to >> One week. We gave them a prototype, and he was amazed how in one week we created an application that analyzed all the streams of data from the cars, did enrichment, did machine learning, and provided predictions. >> Well, we're going to have to come in and test you on this, because I'm skeptical, but here's why. >> Everyone is. >> We'll get to that, I mean I'm probably not skeptical, but I kind of am, because the history is pretty clear. If you look at some of the big ideas out there, like OpenStack. I mean, that thing just morphed into a beast. Hadoop was a cost-of-ownership nightmare, as you mentioned early on. So people have been conceptually correct on what they were trying to do, but trying to get it done was always hard, and then it took a long time to kind of figure out the operational model. So how are you different, if I'm going to play the skeptic here? You know, I've heard this before. How are you different than, say, OpenStack or Hadoop clusters? 'Cause that was a nightmare: cost of ownership, I couldn't get the type of value I needed, lost my budget. Why aren't you the same? >> Okay, that's interesting. I don't know if you know, but I ran a lot of development for OpenStack when I was at Mellanox, and for Hadoop, so I patched a lot of those >> So do you agree with what I said? That that was a problem? >> They are extremely complex, yes. And I think one of the things is that OpenStack first tried to bite off too much, and it's sort of a huge tent; everyone tries to push their own agenda. OpenStack is still an infrastructure layer, okay. And Hadoop is sort of something in between an infrastructure and an application layer, but it was designed 10 years ago, when the problem that Hadoop tried to solve was how do you do web ranking, okay, on tons of batch data. And then the ecosystem evolved into real time, and streaming, and machine learning. >> A data warehousing alternative or whatever. >> So it doesn't fit the original model of batch processing, 'cause if an event comes from the car or an IoT device and you have to do something with it, you need a table with an index. You can't just go and build a huge Parquet file. >> You know, you're talking about complexity >> John: That's why he's different. >> Go ahead. >> So what we've done with our team, after knowing OpenStack and all those >> John: All the scar tissue. >> And all the scar tissue, and my role was also working with all the cloud service providers, so I know their internal architecture, and I worked on SAP HANA and Exadata and all those things, so we learned from the bad experiences and said, let's forget about the lower layers, which is what OpenStack is trying to provide, providing you infrastructure as a service. Let's focus on the application, and build from the application all the way to the flash, and the CPU instruction set, and the adapters and the networking, okay. That's what's different. So what we provide is an application and service experience. We don't provide infrastructure. If you go buy VMware and Nutanix, all those offerings, you get infrastructure. Now you go and build, with a dozen DevOps guys, all the stack above. You go to Amazon, you get services. It's just that they're not the most optimized in terms of the implementation, because they also have dozens of independent projects where each one takes a VM and starts writing some >> But they're still a good service, but you've got to put it together. >> Yeah, right. But also the way they implement: because in order for them to scale, they have a common layer founded on VMs, and then they're starting to build up applications on top, so it's inefficient. And also a lot of it is built on 10-year-old baseline architecture. We've designed it for a very modern architecture: it's all parallel CPUs with 30 cores, you know, flash and NVMe. And so we've avoided a lot of the hardware challenges, and serialization, and we just provide an abstraction layer pretty much like a cloud on top. >> Now in terms of abstraction layers in the cloud, they're efficient, and they provide a simplification experience for developers. Serverless computing is up and coming, it's an important approach; of course we have the public clouds from AWS and Google and IBM and Microsoft. There is a growing range of serverless computing frameworks for prem-based deployment. I believe you are behind one. Can you talk about what you're doing at iguazio on serverless frameworks for on-prem or public? >> Yes. It's the first time I'm very active in the CNCF, the Cloud Native Computing Foundation.
I'm one of the authors of the serverless white paper, which tries to normalize the definitions of all the vendors and come up with a proposal for an interoperable standard. So I spent a lot of energy on that, 'cause we don't want to lock customers into an API. What's unique, by the way, about our solution is we don't have a single proprietary API. We just emulate all the other guys' stuff. We have all the Amazon APIs for data services, like Kinesis, Dynamo, S3, et cetera. We have the open source APIs, like Kafka. So also on the serverless side, my agenda is to promote the idea that if I'm writing to Azure or AWS or iguazio, I don't need to change my app. I can use any developer tools. So that's my effort there. And recently, a few weeks ago, we launched our open source project, which is a sort of second generation of something we had before, called Nuclio. It's designed for real time >> John: How do you spell that? >> N-U-C-L-I-O. I even have the logo >> He's got a nice slick here. >> It's really fast because it's >> John: Nuclio, so that's open source that you guys just sponsor, and it's all code out in the open? >> All the code is in the open. Pretty cool, it has a lot of innovative ideas on how to do stream processing best, 'cause the original serverless functionality was designed around web hooks and HTTP, and even many of the open source projects are really designed around HTTP serving. >> I have a question. I'm doing research for Wikibon on the area of serverless; in fact, we've recently published a report on serverless, and in terms of hybrid cloud environments, I'm not yet seeing any hybrid serverless clouds that involve public serverless, you know, like AWS Lambda, and private on-prem deployment of serverless. Do you have any customers who are doing that, or interested in hybridizing serverless across public and private? >> Of course, and we have some patents I don't want to go into, but the general idea is that what we've done in Nuclio is also the decoupling of the data from the computation, which means that things can sort of be disjoined. You can run a function on a Raspberry Pi, and the data will be in a different place, and those things can sort of move, okay. >> So the persistence has to happen outside the serverless environment, like in the application itself? >> Outside of the function; the function accesses the persistence layer through APIs, okay. And how this data persistence is materialized, that's a separate thing. So you can actually write the same function that will run against Kafka or Kinesis or RabbitMQ or HTTP without modifying the function, and ad hoc, through what we call function bindings, you define what's going to be the thing driving the data, or storing the data. So you can actually write the same function that does an ETL job from table one to table two, and you don't need to put the table information in the function, which is not the way Lambda does it. And it's about a hundred times faster than Lambda; we do 400,000 events per second in Nuclio. So if you write your serverless code in Nuclio, it's faster than writing it yourself, because of all those low-level optimizations.
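For a flavor of what that decoupling looks like in practice, here is a minimal Nuclio-style Python handler. The (context, event) signature and context.Response come from Nuclio's documented Python runtime; the rest is a hedged sketch, with the trigger wiring assumed to live in the function's configuration (the "function binding") rather than in its code.

```python
# A minimal Nuclio Python handler, sketched from the discussion above. The
# same handler body can be fronted by an HTTP, Kafka, or Kinesis trigger;
# which one drives it is configuration, not code, so the function itself
# never changes when the data source or sink does.

def handler(context, event):
    # event.body is the payload regardless of which trigger delivered it
    record = event.body
    context.logger.info("processing one record")
    # ...transform or enrich the record here (e.g., a table-to-table ETL)...
    return context.Response(body=record,
                            content_type="text/plain",
                            status_code=200)
```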
>> Yaron, thanks for coming on theCUBE. We want to do a deeper dive, love to have you out in Palo Alto next time you're in town. Let us know when you're in Silicon Valley, for sure, and we'll make sure we get you on camera for multiple sessions. >> And more information at re:Invent. >> Go to re:Invent, we're looking forward to seeing you there. Love the continuous analytics message; I think continuous integration is going through a massive renaissance right now, you're starting to see new approaches, and I think the things you're doing are exactly along the lines of what the world wants, which is alternatives, innovation. Thanks for sharing on theCUBE. >> Great. >> That's very great. >> This is theCUBE's coverage of the hot startups here at BigData NYC, live coverage from New York. I'm John Furrier with Jim Kobielus; more after this short break.
Christian Rodatus, Datameer | BigData NYC 2017
>> Announcer: Live from Midtown Manhattan, it's theCUBE, covering Big Data New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. >> Welcome back to theCUBE in New York City for Big Data NYC, the hashtag is BigDataNYC. This is our fifth year doing our own event in conjunction with Strata Hadoop, now called Strata Data, used to be Hadoop World, our eighth year covering the industry; we've been there from the beginning in 2010, the beginning of this revolution. I'm John Furrier, the cohost, with Jim Kobielus, our lead analyst at Wikibon. Our next guest is Christian Rodatus, who is the CEO of Datameer. Datameer, obviously, one of the startups now evolving on, I think, its eighth year or so, roughly seven or eight years old. Great customer base, been successful blocking and tackling, just doing good business. Your shirt says "Show me the data." Welcome to theCUBE, Christian, appreciate it. >> So well established, I barely think of you as a startup anymore. >> It's kind of true, and actually, a couple of months ago, after I took on the job, I met Mike Olson, and Datameer and Cloudera were sort of founded the same year, I believe late 2009, early 2010. Then, he told me, there were two open source projects, with MapReduce and Hadoop, basically, and Datameer was founded to actually enable customers to do something with it, as an entry platform to help get data in, curate the data, and do something with it. And now, if you walk the show floor, it's a completely different landscape. >> We've had you guys on before; the founder, Stefan, has been on. Interesting migration, we've seen you guys grow from a customer base standpoint. You've come on as the CEO to kind of take it to the next level. Give us an update on what's going on at Datameer. Obviously, the shirt says "Show me the data." Show me the money kind of play there, I get that. That's where the money is, the data is where the action is. Real solutions, not pie in the sky; we're now in our eighth year of this market, so there's not a lot of tolerance for hype, even though there's a lot of AI washing going on. What's going on with you guys? >> I would say, interestingly enough, I met with a customer, a prospective customer, this morning, and it was a very typical organization. So, this is a customer that is an insurance company, and they're just about to spin up their first Hadoop cluster to actually work on customer management applications. And they are overwhelmed with what the market offers now. There are 27 open source projects, there are dozens and dozens of other different tools that try best-of-breed approaches at certain layers of the stack for specific applications, and they don't really know how to stitch this all together. And if I reflect on a customer meeting at a Canadian bank recently, one that has very successfully deployed applications on the data lake, like fraud management and compliance applications and things like this, they still struggle to basically replicate the same performance and the service level agreements that they were used to from their old EDW, which they still have in production. And so, everybody's now going out there and trying to figure out how to get value out of the data lake for the business users, right? There are a lot of approaches that these companies are trying. There's SQL-on-Hadoop that supposedly doesn't perform properly.
There are other solutions, like OLAP on Hadoop, that try to emulate what people were used to from the EDWs, and we believe these are the wrong approaches. We want to stay true to the stack and be native to the stack, and offer a platform that really operates end-to-end, from ingesting the data into the data lake, to curation and preparation of the data, and ultimately building the data pipelines for the business users, and this is certainly something-- >> So here's more of a play for the business users now, not the data scientists and statistical modelers. I thought the data scientists were your core market. Is that not true? >> So, our primary user base at Datameer used to be, until last week, the data engineers in the companies, or basically the people that built the data lake, that curated the data and built these data pipelines for the business user community, no matter what tool they were using. >> Jim, I want to get your thoughts on this for Christian's interest. Last year, so these guys can fix your microphone. I think you guys fixed the microphone for us, his earpiece there, but I want to get a question to Chris, and I'll ask it redirected through you. Gartner, another analyst firm. >> Jim: I've heard of 'em. >> Not a big fan personally, but you know. >> Jim: They're still in business? >> The magic quadrant, they use that tool. Anyway, they had a good intro stat. Last year, they predicted that through 2017, 60% of big data projects will fail. So, the question for both you guys is, did that actually happen? I don't think it did; I'm not hearing that 60% have failed, but we are seeing the struggle around analytics and scaling analytics in a way that's like a DevOps mentality. So, thoughts on this 60% of data projects failing. >> I don't know whether it's 60%; there was another statistic that said there's only 14% of Hadoop deployments in production, or something, >> They said 60, six zero. >> Or whatever. >> Define failure. I mean, you've built a data lake, and maybe you're not using it immediately for any particular application. Does that mean you've failed, or does it simply mean you haven't found the killer application for it yet? I don't know, your thoughts. >> I agree with you, it's probably not a failure to that extent. It's more like, how do they, so they dump the data into it, right, they build the infrastructure; now it's about the next step, data lake 2.0, to figure out how do I get value out of the data, how do I go after the right applications, how do I build a platform and tools that basically promote the use of that data throughout the business community in a meaningful way. >> Okay, so what's going on with you guys from a product standpoint? You guys have some announcements. Let's get to some of the latest and greatest. >> Absolutely. I think we were very strong in data curation, data preparation, and the entire data governance around it, and as a user interface we are using this spreadsheet-like user interface called a workbook. It really looks like Excel, but it's not; it operates at a completely different scale. It's basically an Excel spreadsheet on steroids. Our customers build data pipelines, so this is the data engineers that we discussed before, but we also have a relatively small power user community in our client base that use that spreadsheet for deep data exploration. Now, we are lifting this to the next level, and we've put a visualization layer on top of it that runs natively in the stack, and what you get is basically a visual experience not only in the data curation process but also in deep data exploration. And this is combined with two platform technologies that we use: it's based on highly scalable, distributed search as the backend engine of our product, number one, and we have also adopted a columnar data store, Parquet, for our file system now. In this combination, the data exploration capabilities we bring to the market will allow power analysts to really dig deep into the data, so there are literally no limits in terms of the breadth and the depth of the data. It could be billions of rows, it could be thousands of different attributes and columns that you are looking at, and you will get sub-second response times, as we create indices on demand while we run this through the analytic process.
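As an aside on why the columnar choice matters: this is not Datameer's engine, but a small pyarrow sketch shows the access pattern a format like Parquet makes cheap, namely reading a handful of columns, with filters pushed down, out of a file that may have thousands of columns and billions of rows. The file and column names are hypothetical.

```python
# Illustrative pyarrow snippet (not Datameer's engine): a columnar scan
# touches only the columns a query needs, and filter pushdown lets whole
# row groups be skipped, which is what makes wide, deep exploration cheap.
import pyarrow.parquet as pq

table = pq.read_table(
    "events.parquet",                    # hypothetical file
    columns=["customer_id", "amount"],   # 2 of possibly thousands of columns
    filters=[("region", "=", "EMEA")],   # pushed down to skip row groups
)
print(table.num_rows)
```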
Now, we are lifting this to the next level, and we put up a visualization layer on top of it that runs natively in the stack, and what you get is basically a visual experience not only in the data curation process but also in deep data exploration, and this is combined with two platform technologies that we use, it's based on highly scalable distributed search in the backend engine of our product, number one. We have also adopted a columnar data store, Parquet, for our file system now. In this combination, the data exploration capabilities we bring to the market will allow power analysts to really dig deep into the data, so there's literally no limits in terms of the breadth and the depth of the data. It could be billions of rows, it could be thousands of different attributes and columns that you are looking at, and you will get a response time of sub-second as we create indices on demand as we run this through the analytic process. >> With these fast queries and visualization, do you also have the ability to do semantic data virtualization roll-ups across multi-cloud or multi-cluster? >> Yeah, absolutely. We, also there's a second trend that we discussed right before we started the live transmission here. Things are also moving into the cloud, so what we are seeing right now is the EDW's not going away, the on prem is data lake, that prevail, right, and now they are thinking about moving certain workload types into the cloud, and we understand ourselves as a platform play that builds a data fabric that really ties all these data assets together, and it enables business. >> On the trends, we weren't on camera, we'll bring it up here, the impact of cloud to the data world. You've seen this movie before, you have extensive experience in this space going back to the origination, you'd say Teradata. When it was the classic, old-school data warehouse. And then, great purpose, great growth, massive value creation. Enter the Hadoop kind of disruption. Hadoop evolved from batch to do ranking stuff, and then tried to, it was a hammer that turned into a lawnmower, right? Then they started going down the path, and really, it wasn't workable for what people were looking at, but everyone was still trying to be the Teradata of whatever. Fast forward, so things have evolved and things are starting to shake out, same picture of data warehouse-like stuff, now you got cloud. It seems to be changing the nature of what it will become in the future. What's your perspective on that evolution? What's different about now and what's same about now that's, from the old days? What's the similarities of the old-school, and what's different that people are missing? >> I think it's a lot related to cloud, just in general. It is extremely important to fast adoptions throughout the organization, to get performance, and service-level agreements without customers. This is where we clearly can help, and we give them a user experience that is meaningful and that resembles what they were used to from the old EDW world, right? That's number one. Number two, and this comes back to a question to 60% fail, or why is it failing or working. I think there's a lot of really interesting projects out, and our customers are betting big time on the data lake projects whether it being on premise or in the cloud. And we work with HSBC, for instance, in the United Kingdom. They've got 32 data lake projects throughout the organization, and I spoke to one of these-- >> Not 32 data lakes, 32 projects that involve tapping into the data lake. 
>> 32 projects that involve various data lakes. >> Okay. (chuckling) >> And I spoke to one of the chief data officers there, and they said their data center infrastructure, just from having kick-started these projects, will explode. And they're not in the business of operating all the hardware and things like this, and so a major bank like them, they made an announcement recently, a public announcement, you can read about it, and started moving their data assets into the cloud. This is clearly happening at a rapid pace, and it will change the paradigm in terms of burstability and being able to satisfy peak workload requirements as they come up, when you run a compliance report at quarter end or something like this. So this will certainly help with adoption and creating business value for our customers.
You need to have at least three to four different products to be able to do what we do, but then you get security gaps, you get a lack of data lineage and data governance through the process, and this is the biggest value that we can bring to the table. And secondly, now with visual exploration, we offer a capability that literally nobody has in the marketplace, where we give power users the capability to explore, with blazing fast response times, billions of rows of data in a very free-form type of exploration process. >> Are there more power users now than there were when you started as a company? It seems like tools like Datameer have brought people into the sort of power user camp, simply by virtue of having access to your tool. What are your thoughts there? >> Absolutely, it's definitely growing, and you also see different companies exploiting this capability in different ways. You might find insurance or financial services customers that have built a very sophisticated capability in that area, and you might see 1,000 to 2,000 users that do deep data exploration, and other companies are starting out with a couple of dozen and then evolving it as they go. >> Christian, I've got to ask you as the new CEO of Datameer, obviously going to the next level, you guys have been successful. We were commenting yesterday on theCUBE about, we've been covering this for eight years in depth in terms of CUBE coverage, we've seen the waves of hype come and go, but now there's not a lot of tolerance for hype. You guys are one of the companies, I will say, that stuck to your knitting, you didn't overplay your hand. You certainly rode the hype like everyone else did, but your solution is very specific on value, and so you didn't overplay your hand; the company didn't really overplay its hand, in my opinion. But now, the hand really is value. >> Absolutely. >> As the new CEO, you've got to kind of put a little shiny new toy on there, and, you know, keep the car lookin' shiny and everything looking good with cutting-edge stuff, at the same time scaling up what's been working. The question is, what are you doubling down on, and what are you investing in to keep that innovation going? >> There are really three things, and you're very much right; this has become a mature company. We've grown with our customer base, our enterprise features and capabilities are second to none in the marketplace, this is what our customers achieve, and now the three investment areas that we are putting together, where we are doubling down, are really visual exploration, as I outlined before. Number two, hybrid cloud architectures: we don't believe customers will move their entire stack right into the cloud. There are a few that are going to do this and that are looking into these things, but we believe they will still have their EDW, their on-premise data lake, and some workload capabilities in the cloud, which will be growing, so this is investment area number two. Number three is the entire concept of data curation for machine learning. This is something where we released a plug-in earlier in the year for TensorFlow, where we can basically build data pipelines for machine learning applications. This is still very small. We see some interest from customers, but it's a growing interest.
>> It's a directionally correct kind of vector; you're looking at it and saying, it's a good sign, let's kick the tires on that and play around. >> Absolutely. >> 'Cause machine learning's got to learn, too. You got to learn from somewhere. >> And quite frankly, deep learning and machine learning tools for the rest of us, there aren't really all that many for the rest of us power users; they're going to have to come along and get really super visual in terms of enabling visual, modular development and tuning of these models. What are your thoughts there, in terms of going forward, about a visualization layer to make machine learning and deep learning developers more productive? >> That is an area where we will not engage, in a way. We will stick with our platform play, where we focus on building the data pipelines into those tools. >> Jim: Gotcha. >> The last area where we invest is ecosystem integration, so we think our Visual Explorer backend, which is built on search and on a Parquet file format, or columnar store, is really a key differentiator in feeding and building data pipelines into the incumbent BI ecosystems and accelerating those as well. We currently have prototypes running where we can basically give the same performance and depth of analytic capability to some of the existing BI tools that are out there. >> What are some of the ecosystem partners you guys have? I know partnering is a big part of what you guys have done. Can you name a few? >> I mean, the biggest one-- >> Everybody, Switzerland. >> No, not really. We are focused on staying true to our stack and how we can provide value to our customers, so we work actively, and this is very important, with Microsoft and Amazon AWS on evolving our cloud strategy. We've started working with various BI vendors throughout, that you know about, right, and we definitely have a play also with some of the big SIs; IBM is a more popular one. >> So, BI guys mostly on the tool and visualization side. You said you were a pipeline. >> On the tool and visualization side, right. We have very effective integration of our data pipelines into the BI tools today; we support TDE for Tableau, we have a native integration. >> Why compete there, just be a service provider. >> Absolutely, and we have more and better technology coming up to even accelerate those tools as well on our big data stack. >> You're focused, you're scaling. Final word I'll give to you for the segment: share with the folks that are a Datameer customer, or have not yet become a customer, what's the outlook, what does the new Datameer look like under your leadership? What should they expect? >> Yeah, absolutely. So I think they can expect utmost predictability in the way we roll out the vision and how we build our product over the next couple of releases. The next five, six months are critical for us. We have launched Visual Explorer here at the conference, and we're going to launch our native cloud solution, probably middle of November, to the customer base. So these are the big milestones that will help us into our next fiscal year and provide really great value to our customers, and that's what they can expect: predictability, a very solid product, all the enterprise-grade features they need and require for what they do. And if you look at it, we are really an enterprise play, and the customer base that we have is very demanding and challenging, and we want to keep up and deliver a capability that is relevant for them and helps them create value from their data lakes. >> Christian Rodatus, technology enthusiast, passionate, now CEO of Datameer. Great to have you on theCUBE, thanks for sharing. >> Thanks so much. >> And we'll be following your progress.
Datameer here inside theCUBE live coverage, hashtag BigDataNYC, our fifth year doing our own event here in conjunction with Strata Data, formerly Strata Hadoop, Hadoop World, eight years covering this space. I'm John Furrier with Jim Kobielus here inside theCUBE. More after this short break. >> Christian: Thank you. (upbeat electronic music)
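To make the "data curation for machine learning" thread above concrete: a pipeline like the TensorFlow plug-in Christian describes would read curated, columnar data and emit it in a format a training job can consume. The sketch below is a generic illustration of that pattern, not Datameer's actual plug-in API, which isn't shown in the interview; the file paths, column names, and the pairing of pandas with the TensorFlow 1.x TFRecord writer are all assumptions for illustration.

```python
# Hypothetical sketch: turning a curated Parquet data set into TFRecords for a
# TensorFlow training job. Paths and column names are invented for illustration.
import pandas as pd
import tensorflow as tf  # TensorFlow 1.x era API, current at the time of this interview

def parquet_to_tfrecords(parquet_path, tfrecord_path, feature_cols, label_col):
    # Columnar source, akin to the Parquet-backed visual explorer backend.
    df = pd.read_parquet(parquet_path)
    with tf.python_io.TFRecordWriter(tfrecord_path) as writer:
        for _, row in df.iterrows():
            # Each curated row becomes one serialized tf.train.Example record.
            example = tf.train.Example(features=tf.train.Features(feature={
                "features": tf.train.Feature(float_list=tf.train.FloatList(
                    value=[float(row[c]) for c in feature_cols])),
                "label": tf.train.Feature(int64_list=tf.train.Int64List(
                    value=[int(row[label_col])])),
            }))
            writer.write(example.SerializeToString())

parquet_to_tfrecords("curated/policies.parquet", "train.tfrecord",
                     feature_cols=["age", "premium", "claim_count"],
                     label_col="churned")
```

The design point is the handoff: the data platform owns cleansing, lineage, and governance, and the ML framework only ever sees already-curated records.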
Day One Kickoff | BigData NYC 2017
(busy music) >> Announcer: Live from Midtown Manhattan, it's the Cube, covering Big Data New York City 2017, brought to you by SiliconANGLE Media and its ecosystem sponsors. >> Hello, and welcome to the special Cube presentation here in New York City for Big Data NYC, in conjunction with all the activity going on with the Strata Hadoop, now Strata Data Conference right around the corner. This is the Cube's special annual event in New York City where we highlight all the trends, technology experts, thought leaders, entrepreneurs here inside the Cube. We have our three days of wall to wall coverage, evening event on Wednesday. I'm John Furrier, the co-host of the Cube, with Jim Kobielus, and Peter Burris will be here all week as well. Kicking off day one, Jim, the monster week of Big Data NYC, which has now turned into, essentially, a huge industry. Big data is now subsumed within a larger industry of AI, IoT, and security. A lot of things have just sucked up the big data world that used to be the Hadoop world, and it just kept on disrupting, a creative disruption of the old guard data warehouse market, which now looks pale in comparison to the disruption going on right now. >> The data warehouse market is very much vibrant and alive, as is the big data market, continuing to innovate. But the innovations, John, have moved up the stack to artificial intelligence and deep learning, as you've indicated, driving more of the Edge applications in the new generation of mobile and smart appliances and things that are coming along like smart, self-driving vehicles and so forth. What we see is data professionals and developers moving towards new frameworks, like TensorFlow and so forth, for development of the truly disruptive applications. But big data is the foundation. >> I mean, the developers are the key. Obviously, open source is growing at an enormous rate. We just had the Linux Foundation, we now have the Open Source Summit, they have kind of rebranded that. We're going to see an explosion in code, from 64 million lines of code to billions of lines of code, exponential growth. But the bigger picture is that it's not just developers, it's the enterprises now who want hybrid cloud, they want cloud technology. I want to get your reaction to a couple of different threads. One is the notion of community based software, which is open source, extending into the enterprise. We're seeing things like blockchain, which is hot right now, and security, two emerging areas that are overlapping with big data. You obviously have the classic data market, and then you've got AI. All these things kind of come together, really putting at the center of it all this core industry around community and software, AI in particular. It's not just about machine learning and data anymore, it's a bigger picture. >> Yeah, in terms of community development with open source, much of what we see in the AI arena, for example, the up and coming tools, they're all open source. There's TensorFlow, there's Caffe, there's Theano and so forth. What we're seeing is not just the frameworks for developing AI that are important, but the entire ecosystem of community based development of capabilities to automate the acquisition of training data, which is so critically important for tuning AI for its designated purpose, be it doing predictions and abstractions. What are coming into being are DevOps frameworks to span the entire life cycle of the creation and the training and deployment and iteration of AI.
What we're going to see is, like at the last Spark Summit, there was a very interesting discussion from a Stanford researcher about new open source tools that they're developing, actually out in Berkeley, I understand, related to developing training data in a more automated fashion for these new challenges. The communities are evolving up the stack to address these requirements with fairly bleeding edge capabilities that will come into the mainstream in the next few years. >> I had a chat with a big time CTO last night, he worked at some of the big web scale companies, I won't say the name, it'd give it away. But basically, he asked me a question about IoT, how real is it, and obviously, it's hyped up big time, though. But the issue in all these new markets like IoT and AI is the role of security, because a lot of enterprises are looking at IoT, and certainly the industrial side has the most relevant low hanging fruit, but at the end of the day, the data modeling, as you're pointing out, becomes a critical thing. Connecting IoT devices to, say, an IP network sounds trivial in concept, but at the end of the day, the surface area for security is so exposed, and that's causing people to stop what they're doing, not deploying it as fast. You're seeing people kind of retrenching and replatforming at the core data centers, and then leveraging a lot of cloud, which is why Azure is hot, and the Microsoft Ignite event is pretty hot this week. Role of cloud, role of data in IoT. Is IoT kind of stalled in your mind? Or is it bloating? >> I wouldn't say it's stalled or that it's bloating, but IoT is definitely coming along as the new development focus for the more disruptive applications that can deliver more intelligence directly to the endpoints, that can take varying degrees of automated action to achieve results, but also very much drive decision support in real time to people on their mobiles or whatever. What I'm getting at is that IoT is definitely a reality in the real world in terms of our lives. It's definitely a reality in terms of the next generation of data applications. But there's a lot of the back end, in terms of readying algorithms and training data for deployment of really high quality IoT applications, Edge applications, that hasn't come together yet in any coherent practice. >> It's emerging, it's emerging. >> It's emerging. >> There's a lot more work to do. OK, we're going to kick off day one, we've got some great guests, we see Rob Bearden in the house, Rob Thomas from IBM. >> Rob Bearden from Hortonworks. >> Rob Bearden from Hortonworks, and Rob Thomas from IBM. I want to bring up, Rob wrote a book just recently. He wrote Big Data Revolution, but he also wrote a new book called Every Company is a Tech Company. He kind of teases out this concept of a renaissance, so I want to get your thoughts on this. If you look at Strata Hadoop, Strata Data, the O'Reilly conference, it has turned into like a marketing machine, right, a lot of hype there. But as the community model grows up, you're starting to see a renaissance of real creative developers. You're starting to see not just open source, pure, full stack developers doing all the heavy lifting, but real creative competition, a renaissance, that's really the key. You're seeing a lot more developer action, tons of it outside of what was classically called the data space. The role of data and how it relates to the developer phenomenon that's going on right now. >> Yeah, it's the maker culture.
Rob, in fact, about a year or more ago, IBM, at one of their events, held a very maker oriented event, I think they called it Datapalooza at one point. What's going on is that it's more than just classic software developers coming to the fore. When you're looking at IoT or Edge applications, it's hardware developers, it's UX developers, it's developers and designers who are trying to drive data driven applications into changing the very fabric of how things are done in the real world. Peter Burris has a Wikibon piece called Programming in the Real World. What that all involves is a new set of skill sets coming together to develop these applications. It's well beyond just simply software development, it's well beyond simply data scientists. Maker culture. >> Programming in the real world is a great concept, because you need real time, which comes back down to this. I'm looking this week, from the guests we talk to, for what their view is of the data market right now. Because if you want to get real time, you've got to move from that batch world to the real time world. I'm not saying batch is over, you've still got to store data, and that's growing at an exponential rate as well. But real time data, how do you use data in real time, how does the modeling work, how do you scale that? How do you take a DevOps culture to the data world is what I'm looking for. What are you looking for this week? >> What I'm looking for this week, I'm looking for DevOps solutions or platforms or environments for teams of data scientists who are building and training and deploying and evaluating, iterating deep learning and machine learning and natural language processing applications in a continuous release pipeline, and productionizing them. At Wikibon, we are going deeper into that whole notion of DevOps for data science. I mean, IBM's called it inside ops, others call it data ops. What we're seeing across the board is that more and more of our customers are focusing on how do we bring it all together, so the maker culture. >> Operationalizing it. >> Operationalizing it, so that the maker cultures that they have inside their value chain can come together and there's a standard pattern workflow for putting this stuff out and productionizing it, AI productionized in the real world. >> Moving in from the proof of concept notion to actually just getting things done, putting it out in the network, and then bringing it to the masses with operational support. >> Right, like the good folks at IBM with Watson Data Platform, which on some levels is a DevOps for data science platform, but it's a collaborative environment. That's what I'm looking to see, and there's a lot of other solution providers who are going down that road. >> I mean, to me, if people have the community traction, that is the new benchmark, in my opinion. You heard it here on the Cube. Community continues to scale, you can start seeing it moving out of open source, you're seeing things like blockchain, you're seeing a decentralized Internet now happening everywhere, not just distributed but decentralized. When you have decentralization, community and software really shine. It's the Cube here in New York City all week. Stay with us for wall to wall coverage through Thursday here in New York City for Big Data NYC, in conjunction with Strata Data, this is the Cube, we'll be back with more coverage after this short break.
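Jim's point above about automating the acquisition of training data is worth one concrete illustration. The research tools he alludes to popularized the idea of labeling functions: cheap heuristic rules that vote on labels instead of humans hand-tagging every record. The sketch below is a deliberately minimal, generic version of that idea; the rules and example records are invented, and real systems model each rule's accuracy statistically rather than taking a flat majority vote.

```python
# Minimal sketch of programmatic training-data labeling with invented rules.
POSITIVE, NEGATIVE, ABSTAIN = 1, 0, None

def lf_mentions_refund(text):            # heuristic rule 1
    return POSITIVE if "refund" in text.lower() else ABSTAIN

def lf_very_short(text):                 # heuristic rule 2
    return NEGATIVE if len(text.split()) < 4 else ABSTAIN

LABELING_FUNCTIONS = [lf_mentions_refund, lf_very_short]

def weak_label(text):
    """Majority vote over the rules that did not abstain."""
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    return POSITIVE if sum(votes) * 2 >= len(votes) else NEGATIVE

tickets = ["I want a refund now", "ok thanks", "please process my refund claim today"]
training_set = [(t, weak_label(t)) for t in tickets if weak_label(t) is not ABSTAIN]
print(training_set)  # weakly labeled examples, ready for model training
```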
(busy music) (serious electronic music) (peaceful music) >> Hi, I'm John Furrier, the Co-founder of SiliconANGLE Media, and Co-host of the Cube. I've been in the tech business since I was 19, first programming on mini computers in a large enterprise, and then worked at IBM and Hewlett Packard, a total of nine years in the enterprise, various jobs from programming, training, consulting, and ultimately, as an executive sales person, and then started my first company in 1997, and moved to Silicon Valley in 1999. I've been here ever since. I've always loved technology, and I love covering emerging technology. I was trained as a software developer and love business. I love the impact of software and technology on business. To me, creating technology that starts a company and creates value and jobs is probably one of the most rewarding things I've ever been involved in. I bring that energy to the Cube, because the Cube is where all the ideas are, and where the experts are, where the people are. I think what's most exciting about the Cube is that we get to talk to people who are making things happen, entrepreneurs, CEOs of companies, venture capitalists, people who are really, on a day in and day out basis, building great companies. In the technology business, there's just not a lot of real time live TV coverage, and the Cube is a non-linear TV operation. We do everything that the TV guys on cable don't do. We do longer interviews, we ask tougher questions. We ask, sometimes, some light questions. We talk about the person and what they feel about it. It's not prompted and scripted, it's a conversation, it's authentic. For shows that have the Cube coverage, it makes the show buzz, it creates excitement. More importantly, it creates great content, great digital assets that can be shared instantaneously to the world. Over 31 million people have viewed the Cube, and that is the result of great content, great conversations. I'm so proud to be part of the Cube with a great team. Hi, I'm John Furrier, thanks for watching the Cube. >> Announcer: Coming up on the Cube, Jagane Sundar, CTO of WANdisco. Live Cube coverage from Big Data NYC 2017 continues in a moment. >> Announcer: Coming up on the Cube, Donna Prlich, Chief Product Officer at Pentaho. Live Cube coverage from Big Data New York City 2017 continues in a moment. >> Announcer: Coming up on the Cube, Amit Walia, Executive Vice President and Chief Product Officer at Informatica. Live Cube coverage from Big Data New York City continues in a moment. >> Announcer: Coming up on the Cube, Prakash Nanduri, Co-founder and CEO of Paxata. Live Cube coverage from Big Data New York City continues in a moment. (serious electronic music)
Jeff Veis, Actian | BigData NYC 2017
>> Live from Midtown Manhattan, it's the Cube. Covering big data, New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. >> Okay welcome back everyone, live here in New York City, it's the Cube's special annual presentation of Big Data NYC. This is our annual event in New York City where we talk to all the thought leaders and experts, CEOs, entrepreneurs, and anyone making and shaping the agenda, with the Cube. In conjunction with Strata Data, which was formerly called Strata Hadoop, Hadoop World. The Cube's NYC event, Big Data NYC, we keep separate from that while we're here. Our next guest is Jeff Veis, the chief marketing officer at Actian and a Cube alumni. Formerly with HPE, been on many times. Good to see you. >> Good to see you. >> Well you're a marketing genius, we've talked before at HPE. You've got so much experience in data and analytics, you've seen the whole swath of the spectrum, from classic, I'll call it classic enterprise, to cutting edge. To now full on cloud, AI, machine learning, IoT. A lot of stuff going on, and on premise seems to be hot still. There's so much going on, from the large enterprises dealing with how to better use their analytics. At Actian you're heading up marketing, what's the positioning? What are you doing there? >> Well the shift that we see, and what's unique about Actian, which has just a very differentiated and robust portfolio, is the shift to what we refer to as hybrid data. And it's a shift that people aren't talking about; most of the competition here, they have that next best mousetrap, that one thing. So it's either move your database to the cloud, or buy this appliance, or move to this piece of open source. And it's not that they don't have interesting technologies, but I think they're missing the key point, which is never before have we seen the creation side of data and the consumption of data becoming more diverse, more dynamic. >> And more in demand too, people want both sides. Before we go any deeper I just want you to take a minute to define what hybrid data actually means. What does that term mean for the people that want to understand it more deeply? >> Well it's understanding that it's not just the location of it. Of course there's hybrid computing, which is premise and cloud. And that's an important part of it. But it's also about where and how that data is created. What time domain is that data going to be consumed and used in, and that's so important. A lot of analytics, a lot of the guys across the street, are kind of thinking about reporting and analytics in that old world way of: we collect lots of data and then we deliver analytics. But increasingly analytics is being used almost in real time, or near real time, because people are doing things with the data in the moment. Then another dimension of it is ad hoc discovery, where you can have not one or two or three data scientists but dozens if not hundreds of people, all with copies of Tableau and Qlik, attacking and hitting that data. And of course it's not one data source but multiple, as they find adjacencies with data. A lot of the data may be outside of the four walls. So when you look at consumption and creation of data, the net net is you need not one solution but a collection of best fits. >> So a hybrid between consumption and creation, so that's the two hybrids. I mean hybrid implies, you know, a little bit of this, a little bit of that. >> That's the bridge that you need to be able to cross. Which is, where do I get that data? And then where's that data going? >> Great, so let's get into Actian.
Give us the update, obviously Actian has got a huge portfolio. We've covered you guys, we know it well. You've been on the Cube many times. They've cobbled together all these solutions that can be very effective for customers. Take us through the value proposition that this hybrid data enables with Actian. >> Well if you decompose it, from our viewpoint there are three pillars, ones that have kind of stood the test of time in one sense. They're critical: the ability to manage the data, and the ability to connect the data. In the old days we said integrate, but now I think basically all apps, all kinds of data sources, are connected in some sense, sometimes very temporally. And then finally the analytics. So you need those three pillars, and you need to be able to orchestrate across them. And what we have is a collection of solutions that span that. They can do transactional data, they can do graph data and object oriented data. Today we're announcing a new generation of our analytics, specifically on Hadoop. And that's Vector H. Love to be able to talk to that today with the native Spark integration. >> Let's get into the news. The hard news here at Big Data NYC is you guys announced the latest support for Apache Spark with Vector H. So, Actian Vector in Hadoop, hence the H. What is it? >> Is Spark glue for hybrid data environments, or is it something you layer over different underlying databases? >> Well I think it's fair to say it is becoming the glue. In fact we had a previous technology that did a yeoman's job at doing some of the work. Now there's Spark and that community. The thing, though, is if you wanted to take advantage of Spark, it was kind of like the old days of Hadoop: assembly was required. And that is increasingly not what organizations are looking for. They want to adopt the technology, but they want to use it and get on with their day job. >> Machine learning, putting algorithms in place, managing software. >> It could be very exotic things such as predictive machine learning, next generation AI. But for every one of those there are easily a dozen, if not a hundred, uses for being able to reach in and extract data in its native formats. Being able to grab a Parquet file and, without any transformation, begin analyzing it. Or being able to talk to an application and interface with that, being able to do reads and writes with zero penalty. So the ACID compliance component of databases is critical, and a lot of the traditional Hadoop approaches are pretty much read only vehicles. And that meant they were limited in the use cases they could address. >> Let's talk about the hard news. What specifically was announced? >> Well we have a technology called Vector. Vector does run, just to establish the baseline here, single node, on Windows and Linux, and there's a community edition. So your users can download and use that right now. We have Vector H, which was designed for scale out on Hadoop, and it takes advantage of YARN. It allows you to scale out across your Hadoop cluster, petabytes if you like. What we've added to that solution is native Spark integration, and that native Spark integration gives you three key things. Number one, zero penalty for real time updates. We're the only ones, to the best of our knowledge, that can do that. In other words, you can update the data and you will not slow down your analytics performance. Every other Hadoop based analytic tool has to, if you will, stop the clock and flush out the new data to be able to do updates.
Because of our architecture and our deep knowledge of transactional processing, you don't slow down. That means you can always be assured you'll have fresh data running. The second thing is Spark powered direct query access. So we can get at not just Vector formats; we have an optimized data format, which is the fastest you'd find in analytic databases, but what's so important is you can hit ORC, Parquet and other data file formats through Spark without any transformation, be it to ingest or analyze information. The third one, and certainly not the least, is something that I think you're going to be talking a lot more about, which is native Spark data frame support. Data frames. >> What's the impact of that? >> Well data frames will allow you to talk to Spark SQL and Spark R based applications. So now you're not just going to the data, you're going to other applications. And that means you're able to interface directly to the system of record applications that are running, using this lingua franca of data frames, which has now hit a maturity point where you're seeing pretty broad adoption. And by doing native integration with that, we've just simplified the ability to connect directly to dozens of enterprise applications and get the information you need. >> Jeff, would you describe what you're offering now as a form of, sort of a data virtualization layer that sits in front of all these back end databases, but uses data frames from Spark, or am I misconstruing? >> Well it's a little less a virtualization layer, maybe more of a super highway. You know, in the old days it was one of two things. Either you had to do a formal traditional integration and transform that data, right? You had to go from French to German; once it was in German you could read it. Or what you had to do was query and bring in that information, but you had to accept slower performance because that transformation had not occurred. Now what we're able to do is use this Spark native connector. So you can have the best of both worlds, and if you will, it is creating an abstraction layer, but it's really for connectivity as opposed to an overall one. What we're not doing is virtualizing the data. That's the key point; there are some people that are pushing data cataloging and cleansing products and abstracting the entire data layer from you. You're still aware of where the native format is, you're still able to write to it with zero penalty. And that's critical for performance. When you start to build lots of abstraction layers, truly traditional ones, you simplify some things but usually you pay a performance penalty. And just to make a point, in the benchmarks we're running compared to Hive and Polor for example, use cases that may take nearly two hours there, we can do in less than two minutes with Vector H. And we've been able to uphold that for over a year. That is because Vector, in its core technology, has columnar capabilities and, this is a mouthful, multi level in memory capability. And what does that mean? You ask. >> I was going to ask but keep going. >> I can imagine the performance latency is probably great. I mean you have in memory, which everyone kind of wants. >> Well a lot of in memory, where it is used, is just held at the RAM level. It's the ability to read data in RAM and take advantage of it. And we do that, and of course that's a positive, but we go down to the cache level.
We get down much, much lower because we would rather that data be in the CPU if at all possible. And with these high performance cores it's quite possible. So we have some tricks that are special and unique to Vector so that we actually optimize the in memory capability. The other, last thing we do is, you know, Hadoop and HDFS is not particularly smart about where it places the data. And the last thing you want is your data rolling across lots of different data nodes. That just kills performance. What we're able to do is think about the co-location of the data. We look at the jobs and look at the performance, and we're able to squeeze optimization in there. And that's how we're able to get 50, 100, sometimes in excess of 500 times faster than some of the other well known SQL on Hadoop performers. So that, combined now with this Spark integration, this native Spark integration, means people don't have to do the plumbing, they can get out of the basement and up to the first floor. They can take advantage of open source innovation, yet get what we're claiming is the fastest analytics database in Hadoop. >> So, I've got to ask you. I mean you've been, as I mentioned in the intro, an industry veteran. CMO, chief marketing officer. I mean it's challenging with Actian, 'cause there's so many things to focus on. How are you attacking the marketing of Actian? Because you have a portfolio, and hybrid data is a good position. I like how you bring that to the forefront, kind of give it a simple positioning. But as you look at Actian's value proposition and engage your customer base and potentially prospective customers, how are you iterating the marketing message, the positioning, and engaging with clients? >> Well it's a fair question, and it is daunting when you have multiple products. And you've got to have a simple compelling message; less is more to get signal above noise today. At least that's how I feel. So we're hanging our hats on hybrid data. And we're going to take it to the moon or go down with the ship on that. But we've been getting some pretty good feedback. >> What's been the number one feedback on the hybrid data? Because I'm a big fan of hybrid cloud, but I've been saying it's a methodology, it's not a product. On premise cloud is growing and so is public, so hybrid hangs together in the cloud thing. So with data, you're bridging two worlds. Consumption and creation. >> Well what's interesting is when you say hybrid data, people put their own definitions around it. In an unaided way they say, you know, with all the technology and all the trends, that's actually, at the end of the day, what nets out my situation. I do have data that's hybrid data and it's becoming increasingly more hybrid. And god knows the people that are demanding and wanting to use it aren't letting up. And the last thing I need, and I'm really convinced of this, is... a lot of people talk about platforms, we love to use the P word. Nobody buys a platform, because people are trying to address their use cases. But they don't want to do it in this siloed kind of brick wall way, where I address one use case but it won't function elsewhere. What they're looking for is a collection of best fit solutions that can cooperate together. The secret sauce for us is we have a cloud control plane. All our technologies, whether on premise or in the cloud, touch that. And it allows us to orchestrate and do things together. Sometimes it's very intimate and sometimes it's broader. >> Or what exactly is the control plane?
>> It does everything from administration, it can go down to billing, and it can also be scheduling transactional performance. Now on one extreme we use it for backup recovery for our transactional database. We have a cloud based backup recovery service, and it all gets administered through the control plane. So it knows exactly when it's appropriate to back up, because it understands that database, and it takes care of it. It was relatively simple for us to create. In the more intimate sense, we were the first company, and it was called Actian X, which I know we were talking about before. We named our product after X before our friends at Apple did. So I like to think we were pioneers. >> San Francisco had the iPhone, don't get confused there, remember. >> I've got to give credit where credit's due. >> And give it up. >> But what Actian X is, and we announced it back in April, is it takes the same Vector technology I just talked about, so it's material, and we combined it with our integrated transactional database, which has over 10,000 users around the world. And what we did is we dropped in this high performance columnar database for free. I'm going to say that again, for free, into our transactional platform. So every one of our customers, as soon as they upgraded to what is now Actian X, got a rocket ship of a columnar high performance database inside their transactional database. The data is fresh, it moves over into the columnar format, and the reporting takes off. >> Jeff, to end the segment I'll give you the last word. A lot of people look at Actian, and also the products I mentioned earlier. Is it product leadership that's winning, is it the value to the customer? Where is Actian winning, for the folks that aren't yet customers that you'd like to talk to? What is the Actian success formula? What's the differentiation, where is it, where does it jump off the page? Is it the product, is it the delivery? Where's the action. >> Is it innovation? >> Well let me tell you, I would answer with two phrases. First is our tag line, our tag line is "activate your data". And that has resonated with a lot of people. A lot of people have a lot of data, and we've been in this big data era where people talked about the size of their data. Literally, I have 5 petabytes, you have 6 petabytes. I think people realized that kind of missed the entire picture. Sometimes smaller data, god forbid 1 terabyte, can be amazingly powerful depending on the use case. So it's obviously about more than size; what it is about is activating it. Are you actually using that data so it's making a meaningful difference, and not putting it in a data pond, puddle or lake to be used someday, like you're storing it in an attic? There's a lot of data getting dusty in attics today because it is not being activated. And that brings me to, not the tag line, but what I think is driving us and why customers are considering us. They see we are about the technology of the future, but we're very much about innovation that actually works. Because of our heritage, because we have companies that understand, for over 20 years, how to run on data. We get what ACID compliance is, we get what transactional systems are. We get that you need to be able to not just read but write data. And we bring that methodology to our innovation, for people, companies, animals, any form of life that's interested. >> So it's the product platform that activates, and then the result is how you guys roll with customers.
In the real world today, where you can have real concurrency and real enterprise-grade performance, along with the innovation. >> And the hybrid gives them some flexibility. That's the new tag line, and that's the main thing I understand: to you, hybrid data basically means flexibility for the customer. >> Yeah, it's use the data you need, for what you use it for, and have the systems work for you, rather than you work for the systems. >> Okay, check out Actian. Jeff Veis, friend of the Cube, alumni now, the CMO at Actian. We're following your progress, so congratulations on the new opportunity. More Cube coverage after this short break. I'm John Furrier with James Kobielus here inside the Cube in New York City for our Big Data NYC event all week, in conjunction with Strata Data right next door. We'll be right back. (tech music)
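A hedged sketch of the data frame workflow Jeff describes: because the transcript doesn't spell out the connector's actual API, the code below uses stock PySpark, reading Parquet in place (the "direct query" idea) and handing the result to the analytic database over generic JDBC. The URL, table name, and credentials are placeholders, not Actian's documented connector options.

```python
# Hedged sketch of the Spark data-frame workflow discussed above, using stock
# PySpark APIs rather than the vendor's native connector. Connection details
# are placeholders for illustration.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("vectorh-sketch").getOrCreate()

# Hit Parquet files in place, with no transformation step before analysis.
clicks = spark.read.parquet("hdfs:///data/clicks/2017/09/")
daily = clicks.groupBy("ad_id").count().withColumnRenamed("count", "hits")

# Push the aggregate into the MPP analytic database over JDBC so BI users
# can query it at interactive speed.
daily.write.format("jdbc").options(
    url="jdbc:ingres://vectorh-master:VW7/analytics",  # placeholder URL
    dbtable="daily_ad_hits",
    user="analyst",
    password="...",
).mode("append").save()
```

A native connector would cut out the JDBC hop and keep the write path zero-penalty, which is the claim being made in the interview; the JDBC route is just the lowest-common-denominator way to show the shape of the flow.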
Jacque Istok, Pivotal | BigData NYC 2017
>> Announcer: Live from midtown Manhattan, it's the Cube, covering big data New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. >> Welcome back everyone, we're here live in New York City for the week, three days of wall to wall coverage of Big Data NYC. It's big data week here, in conjunction with Strata Hadoop, now Strata Data, which is an event running right around the corner. This is the Cube, I'm John Furrier with my cohost Peter Burris. Our next guest is Jacque Istok, who's the head of data at Pivotal. Welcome to the Cube, good to see you again. >> Likewise. >> You guys had big news we covered at VMware; obviously the Kubernetes craze is fantastic, and you're starting to see cloud native platforms front and center even in some of these operational worlds like cloud. On data, you guys have been here a while with Greenplum, and Pivotal's been adding more to the data suite, so you guys are a player in this ecosystem. >> Correct. >> As it grows to be much more developer-centric and enterprise-centric and AI-centric, what's the update? >> I'd like to talk about a couple of things, just three quick things here, one focused primarily on simplicity. First and foremost, as you said, there's a lot going on on the Cloud Foundry side, a lot of things that we're doing with Kubernetes, etc., super exciting. I will say Tony Baer has written a nice piece about Greenplum on ZDNet, essentially calling Greenplum the best kept secret in the analytic database world. Why I think that's important is, what isn't really well known is that over the period of Pivotal's history, the last four and a half years, we focused really heavily on the Cloud Foundry side, on DevOps, on getting users to actually be able to publish code. What we haven't talked about as much is what we're doing on the data side, and I find it very interesting to repeatedly tell analysts and customers that the Greenplum business has been and continues to be a profitable business unit within Pivotal. So as we're growing on the Cloud Foundry side, we're continuing to grow a business that many of the organizations I see here at Strata are still looking to get to: that ever forgotten profitability zone. >> There's a legacy around Greenplum, and I'm not going to say they pivoted, pun intended, Pivotal. There's been stuff added around Greenplum; Greenplum might get lost in the messaging because it's now one of many ingredients, right? >> It's true, and when we formed Pivotal, I think there were 34 some different SKUs that we have now focused in on over the last two years or so. What's super exciting is, again, over that time period, one of the things that we took to heart on the Greenplum side is this idea of extreme agile. As you guys know, Pivotal Labs, being the core part of the Pivotal mission, helps our customers figure out how to actually build software. We finally are drinking our own champagne, and over the last year and a half of Greenplum R&D, we're shipping code, a complete data platform, on a cadence of about four to five weeks, which again is a little bit unheard of in the industry, being able to move at that pace. We work through the backlog, and what is also super exciting, and I'm glad that you guys are able to help me tell the world, is we released version five last week.
Version five is actually the only parallel open source data platform that has native ANSI compliant SQL, and I feel a little bit like I've rewound the clock 15 years in that I have to actually bring up ANSI compliance. But I think that in a lot of ways, there are SQL alternatives out there in the world, and they are very much not ANSI compliant, and that hurts. >> It's a nuance, but it's table stakes in the enterprise. ANSI compliance is just, >> There's a reason you want to be ANSI compliant, because there's a whole swath of analytic applications, mainly in the data warehouse world, that were built using ANSI compliant SQL. So why do this with version five? I presume it's got to have something to do with you wanting to start capturing some of those applications and helping customers modernize. >> That is correct. I think the SQL piece is one part of the data platform, of really a modern data platform. The other parts are, again, becoming table stakes. Being able to do text analytics, and we've baked Apache Solr into Greenplum, being able to do graph analytics or spatial analytics, anything from classifications to regressions, all of that actually becomes table stakes, and we feel that enterprises have suffered a little bit over the last five or six years. They've had this promise of a new platform that they can leverage for doing interesting new things, machine learning, AI, etc., but the existing stuff that they were trying to do has been super, super hard. What we're trying to do is bridge those together and provide both in the same platform, out of the gate, so that customers can actually use it immediately. And I think one of the things we've seen is that there are about 1,000 to one SQL experienced individuals within the enterprise versus, say, Hadoop experienced individuals. The other thing that is actually super important, and almost bigger than everything else I talked about, is that a lot of the old school Postgres derivatives, the MPP databases, forked their databases at some point in Postgres's history, for a variety of reasons, from licensing to when they started. Greenplum's no different. We forked right around eight dot two. With this last release of version five, we've actually up-leveled the Postgres base within Greenplum to 8.3. Now in and of itself, it doesn't sound, >> What does that mean? >> We are now making a 100% commitment both to open source and to the Postgres community. I think if you look at Postgres today, in its latest versions, it is a full fledged, mission critical database that can be used anywhere. What we feel is that if we can bring our core engineering developments, around parallelism, around analytics, and combine that with Postgres itself, then we don't have to implement all of the low level database things that a lot of our competitors have to do. What's unique about it is, one, Greenplum continues to be open source, which again most of our competitors are not. Two, if you look at primarily what they're doing, nobody's got that level of commitment to the Postgres community, which means all of their resources are going to be stuck building core database technology, even building that ANSI SQL compliance in, which we'll get "for free," which will let us focus on things like machine learning and artificial intelligence. >> Just take a quick second and tell us about the relevance of Postgres, because of the success; first of all it's massive, it's everywhere, and it's not going anywhere.
For the audience watching, what's the relevance of it? >> Sure, like you said, it is everywhere. It is the most full featured, actual database in the open source community. Arguably MySQL has "more" market share, but the MySQL projects that generally leverage it are not used for mission critical enterprise applications. Being able to have parity allows us not only to have that database technology baked into Greenplum, but it also gives us all of the community stuff with it. Everything from being able to leverage the most recent ODBC and JDBC libraries, to integrations into everything from the PostGIS extension for geospatial, to being able to connect to other types of data sources, etc. >> It's a big community, shows that it's successful, but again, >> And it doesn't come in a red box. >> It does not come in a red box, that is correct. >> Which is not a bad thing. Look, Postgres as a technology was developed a long time ago, largely in response to thinking about how analytics and transactional, or analytics and operational, applications might actually come together, and we're now living in a world where we can actually see the hardware, and a lot of practices, etc., beginning to find ways where this may start to happen. With Greenplum's MPP base on top of Postgres, by going to this you're able to stay more modern, more up to date on all the new technology that's coming together to support these richer, more complex classes of applications. >> You're spot on. I suppose I would argue that Postgres, I feel, came up as a response to Oracle in the past, as in, we need an open source alternative to Oracle, but other than that, 100% correct. >> There was always a difference between Postgres and MySQL. MySQL always was, okay, that's that, let's do that open source. Postgres, coming out of Berkeley and coming out of some other places, always had a slightly different notion of the types of problems it was going to take on. >> 100% correct, 100%. But to your question before, what does this all mean to customers? I think the one thing that version five really gives us the confidence to say is, and a lot of times I hate lobbing the ball out like this, but we welcome and embrace with open arms any Teradata customers out there that are looking to save millions, if not tens of millions of dollars, on a modern platform that can actually run not only on premise, not only on bare metal, but virtually and off premise. We're truly the only open source MPP data platform that allows you to build analytics and move those analytics from Amazon to Azure to back on prem. >> Talk about this, the Teradata thing, for a second, I want to get down and double click on that. Customers don't want to change code, so what specifically are you guys offering Teradata customers? >> With the release of version five, with a lot of the development that we've done and some of the partnering that we've done, we are now able to take your Teradata applications without changing a line of code. You load the data into the Greenplum platform, you can point those applications directly at Greenplum and run them unchanged. I think in the past, the reticence to move to any other platform was really the amount of time it would take to redevelop all of the stuff that you had. We offer an ability to go from an immediate ROI to a platform that, again, bridges that gap and allows you to really be modern.
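The repointing claim above rests on something that is true of Greenplum generally: it speaks the PostgreSQL wire protocol, so standard Postgres drivers connect to it. Below is a minimal sketch of what the cutover looks like at the connection level; the host, database, credentials, and report query are placeholders, and a real Teradata migration would still involve loading the data and validating SQL behavior.

```python
# Minimal sketch: an ANSI SQL report repointed at Greenplum through a standard
# PostgreSQL driver. Only the DSN changed; the query text did not.
import psycopg2

conn = psycopg2.connect(host="greenplum-master", port=5432,  # placeholder DSN
                        dbname="edw", user="report_user", password="...")

ANSI_REPORT = """
    SELECT region,
           SUM(revenue) AS total_revenue,
           RANK() OVER (ORDER BY SUM(revenue) DESC) AS revenue_rank
    FROM sales
    GROUP BY region
"""

with conn:  # commits on success, rolls back on error
    with conn.cursor() as cur:
        cur.execute(ANSI_REPORT)
        for region, total, rank in cur.fetchall():
            print(rank, region, total)
```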
>> Peter, I want to talk to you about that importance, what we just said, because you've been studying the true private cloud report. True private cloud, which is on premises, coming from a cloud operating model, automating away undifferentiated labor and shifting that spend to differentiated labor. But this brings up what customers want in hybrid cloud, ultimately having public cloud and private cloud, so hybrid sits there. They don't want to change their code bases; this is a huge deal. >> Obviously, a couple of things to go along with what Jacque said. The first thing is that you're right, people want the data to run where the data naturally needs to run, or should run. That's the big argument about public versus hybrid versus what we call true private cloud. The idea that the workload needs to be located where the data naturally should be located, because of the physical, legal, regulatory, and intellectual property attributes of the data; being able to do that is really, really important. The other thing that Jacque said, which goes right into this question, John, is that in too many domains in this analytics world, which is fundamentally predicated on the idea of breaking data out of applications so that you can use it in new and novel and more value creating ways, the data gets locked up in a data warehouse. What's valuable in a data warehouse is not the hardware. It's the data. By providing the facility for being able to point an application at a couple of different data sources, including one that's more modern, or which takes advantage of more modern technology, and that can be considerably cheaper, it means the shop can elevate the story about the asset, and the asset here is the data and the applications that run against it, not the hardware and the system where the data's stored and located. One of the biggest challenges, and we talked earlier, just to go on for a second, with a couple of other guests about this, is that the industry still, your average person still doesn't understand how to value data, how to establish a data asset, and one of the reasons is because it's so constantly co-mingled with the underlying hardware. >> And actually I'd go even further: I think the advent of some of these cloud data warehouses forgets that notion of being able to run it in different places, and provides one of the things that customers are really looking for, which is simplicity. The ability to spin up a quick MPP SQL system within, say, Amazon, for example. Almost without a doubt, a lot of the business users that I speak to are willing to sacrifice capabilities within the platform, which they are doing, for the simplicity of getting up and going. One of the things that we really focused on in V5 is being able to give that same turnkey feel, and so Greenplum exists within the Amazon marketplace, within the Azure marketplace, and Google later this quarter. And then in addition to the simplicity, it has all of the functionality that is missing in those platforms; again, all the analytics, all the ability to reach out and federate queries against different types of data. I think it's exciting, as we continue to progress in our releases: Greenplum has, for a number of years, had this ability to seamlessly query HDFS, like a lot of the competitors, but HDFS isn't going away, and neither is a generic object store like S3.
But we continue to extend that to things like Spark, for example. So now there's the ability to actually house your data within a data platform and seamlessly integrate with Spark back and forth. If you want to use Spark, use Spark, but somewhere that data needs to be materialized so that other applications can leverage it as well. >> But even then people have been saying, well, if you want to put it on this disk, then put it on this disk. The question about Spark versus another database manager is a higher level conversation than many of the shops who are investing millions and millions and millions of dollars in their analytic application portfolio are having, and all you're trying to do, as I interpret it, is to say: look, the value in the portfolio is the applications and the data. It's not the underlying elements. There's a whole bunch of new elements we can use; you can put it in the cloud, you can put it on premise if that's where the data belongs. Use some of these new and evolving technologies, but stay focused on how the data and the applications continue to remain valuable to the business over time, and not the traditional hardware assets. >> Correct, and I'll again leverage a notion that we get from Labs, which is this idea of user centric design. Everything that we've been putting into the Greenplum database is around, ideally, the four primary users of our system: not just the analysts and not just the data scientists, but also the operators and the IT folks. That is where I'd say the last tenet of where we're going really is this idea of coopetition. As the Pivotal Greenplum guy that's been around for 10 plus years, I would tell you very straight up that we are, again, an open source MPP data platform that can rival any other platform out there; whether it's Teradata, whether it's Hadoop, we can beat that platform. >> Why should customers call you up? Why should they call you? There's all this other stuff out there: you've got legacy, you've got Teradata, they might have other things, people are knocking at their door, they're getting pounded with sales messages, buy me, I'm better than the other guy. Why Pivotal data? >> The first thing I would say is, the latest reviews from Gartner, for example... well, actually, let me rewind. I will easily argue that Teradata has been the data warehouse platform for the last 30 years that everyone has tried to emulate. I'd even argue that when Hadoop came on the scene eight years ago, what they did was change the dynamics, and what they're doing now is actually trying to emulate the Teradata success through things like SQL on top of Hadoop. What that basically gets us to is, people are looking for a Teradata replacement at Hadoop-like prices, and that's what Greenplum has to offer in spades. Now, if you extend that just a little bit, I still recognize that not everybody's going to call us; there are still 200 other vendors out there that are selling a similar product or similar kinds of stories. What I would tell you in response to those folks is that Greenplum has been around in production for the last 10 plus years; we're a proven technology for solving problems, and many of those are not. We work very well in this cooperative spirit of, Greenplum can be the end all be all, but I recognize it's not going to be the end all be all, so this is why we have to work within the ecosystem.
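One way to picture the federation Jacque describes, querying HDFS or S3 data without loading it, is Greenplum's external table mechanism. The sketch below follows the PXF extension's conventions as I understand them, so treat the location and profile syntax as an assumption; the paths, table names, and credentials are placeholders.

```python
# Hedged sketch: define a Greenplum external table over files in HDFS, then
# query it with plain SQL alongside local tables. The PXF syntax is assumed.
import psycopg2

DDL = """
    CREATE EXTERNAL TABLE weblogs_ext (ts timestamp, url text, latency_ms int)
    LOCATION ('pxf://data/weblogs/2017?PROFILE=HdfsTextSimple')
    FORMAT 'TEXT' (DELIMITER ',')
"""

QUERY = """
    SELECT date_trunc('day', ts) AS day, count(*) AS hits
    FROM weblogs_ext          -- rows stay in HDFS; nothing is loaded
    GROUP BY 1
    ORDER BY 1
"""

with psycopg2.connect(host="greenplum-master", dbname="analytics",  # placeholders
                      user="gpadmin", password="...") as conn:
    with conn.cursor() as cur:
        cur.execute(DDL)
        cur.execute(QUERY)
        for day, hits in cur.fetchall():
            print(day, hits)
```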
At the Linux event, we just covered Open Source Summit: 90% of software written will be open source libraries, and the 10% on top is where the value's being added. >> For sure. If you were to start a new startup right now, would you go with a commercial product? >> No, just a Postgres database is good. All right, final question to end the segment. This big data space is now just being called data; certainly Strata Hadoop is now Strata Data, just trying to keep that show going longer. But you've got Microsoft Azure making a lot of waves right now with Microsoft Ignite, so cloud is in the play here, data's changed, so the question is: how has this industry changed over the past eight years? You go back to 2010, I saw Greenplum coming prior to even getting bought out, and they were kicking ass; the same product has evolved. Where has the space gone? What's happened, and how would you summarize it to someone who's walking in for the first year? Like, hey, back in the old days, we used to walk to school in the snow with no shoes on, both ways. Now it's like, get off my lawn, you young developers. Seriously, what is the evolution of that, how would you explain it? >> Again, I would start with Teradata, which started the industry, by far, and then folks like Netezza and Greenplum came around to give a lower-cost alternative. Hadoop came on the scene eight or so years ago, and, what I pride myself on in being at Greenplum for this long, Greenplum implemented the MapReduce paradigm as Hadoop was starting to build, and as it continued to build, we focused on building our own distribution and SQL on Hadoop. I think what we're getting down to is the brass tacks: the business is tired of technological science experiments and they just want to get stuff done.
Stick to your knitting, stick to what you know, or was it more of... >> For us it was twofold: one was continuing to service our customers and make them successful, so that was how we built a profitable data platform business, and the other was to double down on the strategies that seemed to be interesting to organizations, which were cloud, open source, and analytics. And like you said, I talked to one of the folks over at the Air Force, and he was mentioning how, to him, data's actually more important than fuel: being able to understand where the airplanes are, where the fuel is, where the people are, where the missiles are, et cetera, that's actually more important than the fuel itself. Data is the thing that powers everything. >> Data's the currency of everything now. Great, Jacque, thanks so much for coming on theCUBE. Pivotal Data Platform, Data Suite, Greenplum now with all these other add-ons, that's great, congratulations. Stay on the path helping customers, you can't lose. >> Exactly. >> theCUBE here, helping you figure out the big data noise. We're obviously at the Big Data New York City event, our annual CUBE Wikibon event, in conjunction with Strata Data across the street. More live coverage for three days here in New York City. I'm John Furrier, with Peter Burris; we'll be back after this short break. (electronic music)
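Istok's earlier point about Spark interoperability, doing the work in Spark but materializing the result back in the database so other applications can leverage it, might look roughly like this in PySpark. The "greenplum" data source name and its options are assumptions modeled loosely on the Greenplum-Spark connector; a plain JDBC read and write follows the same shape, and write support may require a newer connector version.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("greenplum-spark-roundtrip").getOrCreate()

# Pull a Greenplum table into Spark. Connection details are hypothetical.
clicks = (spark.read.format("greenplum")
          .option("url", "jdbc:postgresql://gpmaster:5432/warehouse")
          .option("dbtable", "clicks")
          .option("user", "gpadmin")
          .option("password", "changeme")
          .option("partitionColumn", "user_id")
          .load())

# Do the heavy lifting in Spark...
session_counts = clicks.groupBy("user_id").count()

# ...then materialize the result back into Greenplum, where BI tools and
# other applications can query it with plain SQL.
(session_counts.write.format("greenplum")
 .option("url", "jdbc:postgresql://gpmaster:5432/warehouse")
 .option("dbtable", "user_session_counts")
 .option("user", "gpadmin")
 .option("password", "changeme")
 .mode("overwrite")
 .save())
```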
Basil Faruqui, BMC Software | BigData NYC 2017
>> Announcer: Live from Midtown Manhattan, it's theCUBE. Covering BigData New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. >> His name is Jim Kobielus. >> Jim: That's right, and John Furrier is actually how I pronounce his name, for the record. But he is Basil Faruqui. >> Basil Faruqui, who's the solutions marketing manager at BMC, welcome to theCUBE. >> Basil: Thank you, good to be back on theCUBE. >> So, first of all, I heard you guys had a tough time in Houston, so I hope everything's getting better, and best wishes. >> Basil: Definitely in recovery mode now. >> Hopefully that can get straightened out. What's going on at BMC? Give us a quick update in context to BigData NYC: what's happening, what is BMC doing in the big data space now, the AI space now, the IoT space now, the cloud space? >> Like you said, you know, in the data space, the IoT space, the AI space, there are four components of this entire picture that literally haven't changed since the beginning of computing. If you look at those four components of a data pipeline, it's ingestion, storage, processing, and analytics. What keeps changing around it is the infrastructure, the types of data, the volume of data, and the applications that surround it. The rate of change has picked up immensely over the last few years, with Hadoop coming into the picture and public cloud providers pushing it. It's obviously created a number of challenges, but one of the biggest challenges that we are seeing in the market, and that we're helping customers address, is the challenge of automating this. And obviously the benefit of automation is in scalability as well as reliability. So when you look at this rather simple data pipeline, which is now becoming more and more complex, how do you automate all of this from a single point of control? How do you continue to absorb new technologies and not re-architect your automation strategy every time, whether it's Hadoop, whether it's bringing in machine learning from a cloud provider? And that is the issue we've been solving for customers. >> All right, let me jump into it. So first of all, you mention some things that never change: ingestion, storage, and what was the third one? >> Ingestion, storage, processing, and eventually analytics. >> So okay, that's cool, totally buy that. Now if you move and say, hey, okay, so you believe that's standard, but now in the modern era that we live in, which is complex, you want breadth of data, and you also want the specialization when you get down to machine learning. That's highly bounded; that's where the automation is right now. We see the trend essentially making that automation broader as it goes into the customer environments. >> Basil: Correct. >> How do you architect that? If I'm a CXO or a CDO, what's in it for me? How do I architect this? Because that's really the number one thing: I know what the building blocks are, but they've changed in their dynamics to the marketplace. >> So the way I look at it is that what defines success and failure, particularly in big data projects, is your ability to scale. If you start a pilot and you spend, you know, three months on it and you deliver some results, but you cannot roll it out worldwide, nationwide, whatever it is, essentially the project has failed. The analogy I often give is that Walmart has been testing the pick-up tower, I don't know if you've seen it; this is basically a giant ATM for you to go pick up an order that you placed online.
They're testing this at about a hundred stores today. Now, if that's a success and Walmart wants to roll this out nationwide, how much time do you think their IT department is going to have? Is this a five-year project, a ten-year project? No, management is going to want this done in six months, ten months. So essentially, this is where automation becomes extremely crucial, because it is now allowing you to deliver speed to market, and without automation you are not going to be able to get to an operational stage in a repeatable and reliable manner. >> You're describing a very complex automation scenario. How can you automate in a hurry without sacrificing, you know, the details of what needs to be done? In other words, you seem to call for repurposing or reusing prior automation scripts and rules and so forth. How can the Walmarts of the world do that fast, but also do it well? >> So we go about it in two ways. One is that, out of the box, we provide a lot of pre-built integrations to some of the most commonly used systems in an enterprise, all the way from the mainframes, Oracles, SAPs, Hadoops, Tableaus of the world. They're all available out of the box for you to quickly reuse these objects and build an automated data pipeline. The other challenge we saw, particularly when we entered the big data space four years ago, was that automation was something that was considered close to when the project became operational. And that's where a lot of rework happened, because developers had been writing their own scripts, using point solutions. So we said, all right, it's time to shift automation left and allow companies to build automation as an artifact very early in the development lifecycle. About a month ago we released what we call Control-M Workbench, which is essentially a Community Edition of Control-M targeted towards developers, so that instead of writing their own scripts, they can use Control-M in a completely offline manner, without having to connect to an enterprise system. As they build and test and iterate, they're using Control-M to do that, so as the application progresses through the development lifecycle, all of that work can then translate easily into an Enterprise Edition of Control-M. >> So quickly, just explain what shift-left means for the folks that might not know software methodologies; not political left or alt-right, this is software development, so please take a minute and explain what shift-left means, and the importance of it. >> Correct. So if you think of software development as a straight-line continuum, you start with building some code, you do some testing, then unit testing, then user acceptance testing. As it moves along this chain, there was a point right before production where all of the automation used to happen. You know, developers would come in and deliver the application to ops, and ops would say, well, hang on a second, all this crontab and all these other point solutions you've been using for automation, that's not what we use in production, and we need you to now go rework it. >> So test early and often. >> Test early and often. The challenge was that the tools the developers used were not the tools being used on the production end of the cycle. And there was good reason for it, because developers don't need something really heavy and with all the bells and whistles early in the development lifecycle.
Control-M Workbench is a very light version which is targeted at developers and focuses on the needs that they have when they're building and developing as the application progresses through its lifecycle. >> How much are you seeing Waterfall, and then people shifting left, becoming more prominent now? What percentage of your customers have moved to Agile and shifting left, percentage-wise? >> So we survey our customers on a regular basis, and the last survey showed that 80% of the customers have either implemented a more continuous integration and delivery type of framework, or are in the process of doing it. And that's the other... >> And cutting upfront costs as much as possible; a tipping point has been reached. >> What is driving all of that is the need from the business; you know, the days of the five-year implementation timelines are gone. This is something that you need to deliver every week, every two weeks, in iterations. And we have also innovated in that space with an approach we call Jobs-as-Code, where you can build entire, complex data pipelines in code format, so that you can enable the automation in a continuous integration and delivery framework. >> I have one quick question, Jim, and then I'll let you take the floor so you can get a word in. But I have one final question on this BMC methodology thing. You guys have a history; obviously BMC goes way back. Remember Max Watson as CEO; back in Palm Beach in '97 we used to chat with him. You dominated that landscape, but we're kind of going back to a systems mindset, so the question for you is: how do you view the issue of this holy grail, the promised land of AI and machine learning, where, you know, end-to-end visibility is really the goal, right? At the same time, you want bounded experiences at the root level so automation can kick in to enable more activity. So it's a trade-off between going for the end-to-end visibility out of the gate, versus having bounded visibility and data to automate. How do you guys look at that market? Because customers want the end-to-end promise, but they don't want to try to get there too fast, as there are potentially diseconomies of scale. How do you talk about that? >> And that's exactly the approach we've taken with Control-M Workbench, the Community Edition. Because early on you don't need capabilities like SLA management and forecasting and automated promotion between environments. Developers want to be able to quickly build and test and show value, okay, and they don't need something with all the bells and whistles. We're allowing you to handle that piece in that manner, through Control-M Workbench. As things progress and the application progresses, the needs change as well: now I'm closer to delivering this to the business, I need to be able to manage this within an SLA, I need to be able to manage this end-to-end and connect to other systems of record and streaming data and clickstream data, all of that. So we believe that there doesn't have to be a trade-off, that you don't have to compromise speed and quality and visibility and enterprise-grade automation. >> You mention trade-offs, so with the Control-M Workbench that the developer can use offline, what amount of testing can they possibly do on a complex data pipeline automation when the tool is offline? I mean, it simply seems like the more development they do offline, the greater the risk that it simply won't work when they go into production. Give us a sense for how they mitigate that risk.
>> Sure. We spent a lot of time observing how developers work, and very early in the development stage, all they're doing is working off of their Mac or their laptop, and they're not really connecting to anything. And that is where they end up writing a lot of scripts, because whatever code or business logic they've written, the way they're going to make it run is by writing scripts. And that essentially becomes a problem, because then you have scripts managing more scripts, and as the application progresses, you have this complex web of scripts and crontabs and maybe some open source solutions trying to, simply, make all of this run. And doing this in an offline manner doesn't mean that they're losing all of the other Control-M capabilities. Simply, as the application progresses, whatever automation they've built in Control-M can seamlessly now flow into the next stage. So when you are ready to take an application into production, there is essentially no rework required from an automation perspective. All of what was built can now be translated into the enterprise-grade Control-M, and that's where operations can then go in and add the other artifacts, such as SLA management and forecasting, that are important from an operational perspective. >> I'd like to get both your perspectives, because you're like an analyst here. So Jim, I want you guys to comment; my question to both of you would be, you know, looking at this time in history: obviously on the BMC side, you mentioned some of the history; you guys are transforming on a new journey and extending that capability into this world. Jim, you're covering state-of-the-art AI and machine learning. What's your take on the space now? Strata Data, which used to be Hadoop World; Cloudera went public, Hortonworks is now public; the big Hadoop guys kind of grew up, but the world has changed around them. It's not just about Hadoop anymore. So I want to get your thoughts on this kind of perspective: we're seeing a much broader picture in BigData NYC versus Strata Hadoop, which seems to be losing steam, but, I mean, in terms of the focus, the bigger focus is much broader, horizontally scalable. Your thoughts on the ecosystem right now? >> Let Basil answer first, unless Basil wants me to go first. >> I think the reason the focus is changing is because of where the projects are in their lifecycle. You know, now what we're seeing is that most companies are grappling with, how do I take this to the next level? How do I scale? How do I go from just proving out one or two use cases to making the entire organization data-driven, and really inject data-driven decision making into all facets of decision making? So that is, I believe, what's driving the change that we're seeing: you've now gone from Strata Hadoop to Strata Data, and the focus on that element. Like I said earlier, the difference between success and failure is your ability to scale and operationalize. Take machine learning, for example. >> And really it's not a hype market. Show me the meat on the bone, show me scale; I've got operational concerns of security and whatnot. >> And machine learning, you know, that's one of the hottest topics. A recent survey I read, which polled a number of data scientists, revealed that they spend less than 3% of their time training the data models, and about 80% of their time on data manipulation, data transformation, and enrichment.
That is obviously not the best use of a data scientist's time, and that is exactly one of the problems we're solving for our customers around the world. >> And it needs to be automated to the hilt to help them be more productive, delivering fast results. >> Ecosystem perspective, Jim, what are your thoughts? >> Yes, everything that Basil said, and I'll just point out that many of the core use cases for AI are automation of the data pipeline. You know, it's driving machine-learning-driven predictions, classifications, abstractions, and so forth into the data pipeline, into the application pipeline, to drive results in a way that is contextually and environmentally aware of what's going on: the past, the historical data, what's going on in terms of current streaming data, to drive optimal outcomes, using predictive models and so forth, inline in applications. So really, fundamentally then, what's going on is that automation is an artifact that needs to be driven into your application architecture as a repurposable resource for a variety of jobs. >> How would you even know what to automate? I mean, that's the question. >> You're automating human judgment, you're automating effort, like the judgments that a working data engineer makes to prepare data for modeling and whatever. More and more, that need can be automated, because those are patterned, structured activities that have been mastered by smart people over many years. >> I mean, we just had a customer on here from GSK, at that scale, and his attitude is, we see the results from the users, then we double down and pay for it and automate it. So the automation question, it's a rhetorical question, but it begs the question: who's writing the algorithms as machines get smarter and start throwing off their own real-time data? What are you looking at, how do you determine you're going to need machine learning for machine learning? You're going to need AI for AI? Who writes the algorithms for the algorithms? >> Automated machine learning is a hot research focus, and not only research: more and more solution providers, like Microsoft and Google and others, are going deep and doubling down on investments in exactly that area. That's a productivity play for data scientists. >> I think the data market's going to change radically, in my opinion; you're starting to see some things with blockchain, some other things that are interesting. Data sovereignty and data governance are huge issues. Basil, just give your final thoughts for this segment as we wrap this up. Final thoughts on data and BMC: what should people know about BMC right now? Because people might have a historical view of BMC. What's the latest, what should they know, what's the new Instagram picture of BMC? What should they know about you guys? >> I think what people should know about BMC is that all the work that we've done over the last 25 years, in virtually every platform that came before Hadoop, we have now innovated to take into things like big data and cloud platforms. So when you are choosing Control-M as a platform for automation, you are choosing a very, very mature solution, an example of which is Navistar, whose CIO is actually speaking at the keynote tomorrow. They've had Control-M for 15, 20 years, and have automated virtually every business function through Control-M.
And when they started their predictive maintenance project, where they're ingesting data from about 300,000 vehicles today to figure out when a vehicle might break down and do predictive maintenance on it, they said that they always knew they were going to use Control-M for it, because that was the enterprise standard, and they knew that they could simply extend that capability into this area. When they started, about three or four years ago, they were ingesting data from about a hundred thousand vehicles; that has now scaled to over 325,000 vehicles, and they have not had to re-architect their strategy as they grow and scale. So I would say that is one of the key messages that we are taking to market: we are bringing innovation that has spanned over 25 years and evolving it. >> Modernizing it. >> Modernizing it and bringing it to newer platforms. >> Congratulations. I wouldn't call that a pivot, I'd call it an extensibility issue, kind of modernizing the core things. >> Absolutely. >> Thanks for coming on and sharing the BMC perspective inside theCUBE here at BigData NYC. This is theCUBE, I'm John Furrier, with Jim Kobielus here in New York City. More live coverage over the three days we will be here, today, tomorrow, and Thursday, at BigData NYC. More coverage after this short break.
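An aside on the Jobs-as-Code approach Faruqui described: the idea is that a pipeline becomes a versionable artifact that moves through the same CI flow as application code. A minimal sketch follows; the JSON field names and endpoint paths are modeled loosely on BMC's Control-M Automation API documentation and should be treated as assumptions, not an exact reproduction of the product's interface.

```python
import json
import requests

# A two-step data pipeline declared as code. The folder/job/flow shape is
# an assumption modeled on Control-M Automation API JSON.
pipeline = {
    "ClickstreamFolder": {
        "Type": "Folder",
        "IngestClicks": {
            "Type": "Job:Command",
            "RunAs": "etluser",
            "Command": "spark-submit ingest_clicks.py",
        },
        "LoadWarehouse": {
            "Type": "Job:Command",
            "RunAs": "etluser",
            "Command": "python load_warehouse.py",
        },
        "flow": {"Type": "Flow", "Sequence": ["IngestClicks", "LoadWarehouse"]},
    }
}

# A CI step would validate and deploy this definition; the host, paths,
# and credentials here are hypothetical.
api = "https://ctm.example.com:8443/automation-api"
login = requests.post(f"{api}/session/login",
                      json={"username": "dev", "password": "changeme"})
token = login.json()["token"]

resp = requests.post(
    f"{api}/deploy",
    headers={"Authorization": f"Bearer {token}"},
    files={"definitionsFile": ("pipeline.json", json.dumps(pipeline))},
)
print(resp.status_code, resp.text)
```

Because the definition is plain JSON in source control, the same artifact the developer iterates on offline in Workbench is what eventually lands in the enterprise environment.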
Rob Bearden, Hortonworks & Rob Thomas, IBM | BigData NYC 2017
>> Announcer: Live from Midtown Manhattan, it's theCUBE. Covering Big Data New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. >> Okay, welcome back, everyone. We're here live in New York City for BigData NYC, our annual event with SiliconANGLE Media, theCUBE, and Wikibon, in conjunction with Strata Hadoop, which is now called Strata Data as that show evolves. I'm John Furrier, cohost of theCUBE, with Peter Burris, head of research for SiliconANGLE Media and General Manager of Wikibon. Our next two guests are two legends in the big data industry: Rob Bearden, the CEO of Hortonworks, really one of the founders of the big data movement; you know, you've got Cloudera and Hortonworks, which really kind of built that out; and Rob Thomas, General Manager of IBM Analytics. Both of them have made big-time investments. Congratulations on your success, guys. Welcome back to theCUBE, great to see you guys! >> Great to see you. >> Great, yeah. >> And we've got an exciting partnership to talk about as well. >> So, but let's do a little history, you guys. Obviously, I want to get to that, and get clarified on the news in a second, but you guys have been there from the beginning, kind of looking at the market, developing it, almost from the embryonic state to now. I mean, what a changeover. Give a quick comparison of where we've come from and what the current landscape is now, because it has evolved into so much more. You've got IoT, you've got AI, you have a lot of things in the enterprise. You've got cloud computing. A lot of tailwinds for this industry. It's gotten bigger. It's become big, and now it's huge. What are your thoughts, guys? >> You know, so you look at arcs, and really all this started with Hadoop, and Rob and I met early in the days of that. You've gone from the early few years being about optimizing operations; Hadoop is a great way for a company to become more efficient, take out costs in their data infrastructure, and so that put huge momentum into this area. And now we've fast-forwarded to the point where it's about, so how am I actually going to extract insight? So instead of just getting operational advantages, how am I going to get competitive advantage? And that's about bringing the world of data science and machine learning to run natively on Hadoop. That's the next chapter, and that's what Rob and I are working closely together on. >> Rob, your thoughts, too? You know, we've been talking about data in motion. You guys were early on in that, seeing that trend. Real time is still hot. Data is still the core asset people are trying to figure out, moving from wrangling it to actually enabling it. >> Right. Well, you know, in the early days of big data, it was, to Rob's point, very much about bringing operational leverage and efficiency, and being able to aggregate very siloed data sets, unlocking that data and bringing it into a central platform. In the early days, the resources went to making Hadoop an enterprise-viable data platform, with security, governance, operations, and management capability that mirrored any of the proprietary transactional or EDW platforms. And the lesson learned in that was that by bringing all that data together in a central data set, we can now understand what's happening with our customers, and with our other assets, pre-transaction, and so customers can become very prescriptive in engaging in new business models. What we've learned now is that the further upstream we can get, in the world of IoT, and bring that data under management from the point of origination, and be able to manage it all the way through its life cycle, we can create new business models with a higher velocity of engagement and a lot more rapid value creation. It does, though, create a number of new challenges in all the areas of how you secure that data and how you bring governance across that entire life cycle from a common stream set.
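Bearden's point about engaging data from the point of origination is the classic data-in-motion pattern: subscribe to the event stream and act before the record ever lands at rest. A minimal sketch with the kafka-python client; the topic name, message fields, and threshold are illustrative assumptions, not details from the interview.

```python
import json
from kafka import KafkaConsumer  # kafka-python; any Kafka client works

# Subscribe to device telemetry as it is produced, upstream of the lake.
consumer = KafkaConsumer(
    "vehicle-telemetry",                      # hypothetical topic
    bootstrap_servers=["broker1:9092"],       # hypothetical broker
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)

for msg in consumer:
    reading = msg.value
    # Act on the occurrence in-stream: flag an overheating engine before a
    # scheduled batch job over data at rest would ever have seen it.
    if reading.get("engine_temp_c", 0) > 110:
        print(f"dispatch maintenance for vehicle {reading['vehicle_id']}")
```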
In the early days in resources, and Hadoop went to making Hadoop an enterprise-viable data platform, with security, governance, operations, management capability, that mirrored any of the proprietary transactional or EDW platforms, and what the lessons learned in that were, is that by bringing all that data together in a central data set, we now can understand what's happening with our customers, and with our other assets pre-transaction, and so they can become very prescriptive in engaging in new business models, and so what we've learned now is the further upstream we can get in the world of IOT and bring that data under management from the point of origination and be able to manage that all the way through its life cycle, we can create new business models with higher velocity of engagement and a lot more rapid value that gets created. It, though, creates a number of new challenges in all the areas of how you secure that data, how you bring governance across that entire life cycle from a common stream set. >> Well, let's talk about the news you guys have. Obviously, the partnership. Partnerships become the new normal in an open source era that we're living in. We're seeing open source software grow really exponentially in the forecast coming in the next five years and ten years and exponential growth in new code. Just new people coming on board, new developers, dev ops is mainstream. Partnerships are key for communities. 90% of the code is going to be open source, 10%, as they say, the Code Sandwich as Jim Zemlin, the executive director of Linux Foundation, wants to, and you're seeing that work. You guys have worked together with Apache Atlas. What's the news, what's the relationship with Hortonworks and IBM? Share the news. >> So, a lot of great work's been happening there, and generally in the open source community, around Apache Atlas, and making sure that we're bringing missing critical governance capabilities across the big data sets and environments. As we then get into the complexity of now multiple data lakes, multiple tiers of data coming from multiple sources, that brings a higher level of requirement in both the security and governance aspects, and that's where the partnership with IBM is continuing to drive Apache Atlas into mission critical enterprise viability, but then when we get into the distributed models and enterprise requirements, the IBM platforms leveraging Atlas and what we're doing together then take that into the mission critical enterprise capability. >> You got the open source, and now you got the enterprise. Rob, we've talked many times about the enterprise as a hard, hard environment to crack for say, a start up, but even now, they're becoming reliant on open source, but yet, they have a lot of operational challenges. How does this relate to the challenge of, you know, CIO and his staff, now new personas coming in, you seeing the data science role, you see it expanding from analytics to dev ops. A day of challenges. >> Look, enterprises are getting better at this. Clearly we've seen progress the last five years on that, but to kind of go back and link the points, there's a phrase I heard I like. It says, "There's no AI without IA," meaning information architecture. Fundamentally, what our partnership is about is delivering the right information architecture. So it's Hadoop federated with whatever you have in terms of warehouses and databases. We partner around IBM common sequel for that. 
It's metadata for your core governance, because without governance you don't have compliance, and you can't offer self-service analytics. So we are forming what I would call the fluid data layer for an enterprise that enables them to get to this future of AI, and my view is there's a stop in between, which is data science and machine learning: applications that are ready today, that clients can put into production and improve the outcomes they're getting. That's what we're focused on right now: how do we take the information architecture we've been able to establish and then help clients on this journey? That's what enterprises want, because that's how they're going to build differentiation into their businesses. >> But the definition of an information architecture is closest to applications, and maybe this informs your perspective; it's close to the applications that the business is running on. It goes back to your observation about how we used to be focused on optimizing operations. As you move away from those applications, your information architecture becomes increasingly diffuse. It's not as crystal clear. How do you drive that clarity as the data moves out to derive new applications? >> Rob and I have talked about this. I think we're at the dawn of probably a new era in application development: much more agile, flexible applications that are taking advantage of data wherever it resides. We are really early in that. Right now we are in the "let's actually put machine learning and data science into practice, let's extract value from the data we've got" phase; that will then inform a new set of applications, which is related to the announcements that Hortonworks made this week around DataPlane, which is looking at multi-cloud environments and how you would manage applications and data across them. Rob, you can speak to that better than I can, I think. >> Well, on the DataPlane thing, and this information architecture, I think you're 100% right on. What we're hearing from customers in the enterprise is, they see the IoT buzz; of course they're going to connect with IoT devices down the road, but when they see the security challenges, when they see the operational challenges around hiring people to actually run the dev ops, they have to then re-architect. So there's certainly a conversation we see on what the architecture for the data is, but also something a little bit bigger than that: the holistic architecture of, say, cloud. A lot of people are trying to clean up their house, if you will, to be ready for this new era, and I think the Wikibon true private cloud report you guys put out really amplified that by saying, yeah, they see these trends, but they've got to kind of get their act together. They've got to look at who the staff is, what the data architecture is going to be, what apps are being developed; so, doing a lot more retrenching. Given that, if we agree, what does that mean for the data plane, and for your vision of having a data architecture so that this will be a solid foundational transition? >> I think we all hit on the same point, which is that it is about enabling a next-generation IT architecture. Generally, what big data, and Hadoop specifically, has been able to do over the last five years is enable the existing application architecture, and I like the term that's been coined by you: they were known processes with known technology, and that's how applications in the last 20 years have been enabled.
Big data and Hadoop have generally unlocked the ability to now move all the way out to the edge and incorporate IoT: data at rest, data in motion, on-prem and cloud hybrid architectures. What that's done is say, now we know how to build an application that takes advantage of an event or an occurrence and then can drive an outcome in a variety of ways. We don't have to wait for a static programming model to automate a function. >> And in fact, if we wait, we're going to fail. That's one of the biggest challenges. I mean, I will tell you guys that one of the craziest days I've ever spent was when I flew from Japan to New York City for the IBM Information Architecture announcement back in, like, 1994, and it was the most painful two days I've ever experienced in my entire life. That's a long time ago; it's ancient history. We can't use information architecture as a way of slowing things down. What we need to be able to do is introduce technology that, again, allows the clarity of information architecture close to these core applications to move, and that may involve things like machine learning itself being embedded directly into how we envision data being moved, how we envision optimization, how we envision the data plane working. So, as you guys think about this data plane, everybody ends up asking themselves: is there a natural place for data to be? What's going to be centralized, what's going to be decentralized? And I'm asking you: is the data increasingly going to be decentralized, while the governance and security and policies that we put in place are going to be centralized, and is that what's going to inform the operation of the data plane? What do you guys think? >> It's our view, very specifically from Hortonworks' perspective, that we want to give the ability for the data to exist and reside wherever the physics dictate, whether that be on-prem or in the cloud, and we want to give the ability to process, take action on an event or an occurrence, or drive an outcome as early in the cycle as possible. >> Describe what you mean by "early in the cycle." >> So, as we see conditions emerge: a machine part breaking down, a customer taking an action, a supply chain inventory outage. >> So as close as possible to the event that's generating the data. >> As it's being generated, or as the processes are leading up to the natural outcome, where we can maybe disintermediate for a better outcome. That means we have to be able to engage with the data irrespective of where it is in its cycle, and that's where we've enabled, with DataPlane, the ability to abstract out the requirement of where that data is, and to have a common plane, pun intended, for the operations, management, and provisioning of the environment, and for being able to govern and secure it, which are increasingly becoming intertwined, because you have to deal with it from point of origin through point of rest.
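Since the partnership centers on Apache Atlas, one concrete reading of "governance from point of origin through point of rest" is that datasets carry classifications that policy engines enforce wherever the data moves. A hedged sketch against Atlas's v2 REST API follows; the host, credentials, entity GUID, and classification name are all assumptions for illustration.

```python
import requests

ATLAS = "http://atlas.example.com:21000/api/atlas/v2"  # hypothetical host

# GUID of the dataset entity (for example, a Hive table registered in
# Atlas); this value is a placeholder, not a real identifier.
entity_guid = "hypothetical-guid"

# Attach a classification so downstream tools (e.g. Ranger-style policy
# enforcement) can key access rules off the tag rather than the platform
# where the data happens to sit.
resp = requests.post(
    f"{ATLAS}/entity/guid/{entity_guid}/classifications",
    json=[{"typeName": "PII", "attributes": {}}],
    auth=("admin", "admin"),  # Atlas's default basic auth; an assumption
)
resp.raise_for_status()
print("classification applied")
```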
"I need data consistency because things are happening in "real time; whatever events are going on with data, we know "more data's going to be coming out from the edge and "everywhere else, faster and more volume, so I need "consistency of my data, and I don't want "to have multiple data silos," and then they got to integrate the data, so on the application developer side, a dev ops-like ethos is emerging where, "Hey, if there's data being done, I need to integrate that "into my app in real time," so those are two challenges. Does the data plane address that concern for customers? That's the question. >> Today it enables the ops world. >> So I can integrate my apps into the data plane. >> My apps and my other data assets, irrespective of where they reside, on-prem, cloud, or out to the edge, and all points in between. >> Rob, for enterprise, is this going to be the single pane of glass for data governance? Is that how the vision that you guys see this, because that's a benefit. If that could happen, that's essentially one step towards the promised land, if you will, for more data flowing through apps and app developers. >> So let me reshape a little bit. There's two main problems that collectively we have to address for enterprises: one is they want to apply machine learning and data science at scale, and they're struggling with that, and two is they want to get the cloud, and it's not talked about nearly enough, but most clients are really struggling with that. Then you fast forward on that one, we are moving to a multi-cloud world, absolutely. I don't think any enterprise is going to standardize on a single cloud, that's pretty clear. So you need things like data plane that acknowledge it's a multi-cloud world, and even as you move to multi clouds, you want a single focus for your data governance, a single strategy for your data governance, and then what we're doing together with IBM Data Science Experience with Hortonworks, let's say, whatever data you have in there, you can now do your machine learning right where that data is. You don't need to move it around. You can if you want, but you don't have to move it around, 'cause it's built in, and it's integrated right into the Hadoop ecosystem. That solves the two main enterprise pain points, which is help me get the cloud, help me apply data science and machine learning. >> Well we'll have to follow up and we'll have to do just a segment just on that. I think multi-cloud is clearly the direction, but what the hell does that mean? If I run 365 on Azure, that's one app. If I run something else on Amazon, that's multiple clouds, not necessarily moving workloads across. So the question I want to ask here is, it's clear from customers they want single code bases that run on all clouds seamlessly so I don't have to scale up on things on Amazon, Azure, and Google. Not all clouds are created equal in how they do things. Storage, through ever, inside the data factories of how they process. That's a challenge. How do you guys see that playing out of, you have on-premise activities that have been bootstrapped. Now you have multiple clouds with different ways of doing things, from pipelining, ingestion and processing, and learning. How do you see that playing out? Clouds just kind of standardizing around data plane? >> There's also the complexity of even within the multi-clouds, you're going to have multiple tiers within the clouds, if you're running in one data center in Asia, versus one in Latin America, maybe a couple across the Americas. 
>> But as a customer, do I need to know the cloud internals of Amazon, Azure, and Google? >> You do. In a stand-alone world, yes you do. That's where we have to bring and abstract the complexity of that out, and that's the goal with data plane, is to be able to extract, whether it's, which tier it's in, on-prem, or whether it's on, irrespective of which cloud platform. >> But Rob Thomas, I really like the way you put it. There may be some other issues that users have to worry about, certainly there are some that we think, but the two questions of, "Where am I going to run the machine learning," and "How am I going to get that to the cloud appropriately," I really like the way you put that. At the end of the day, what users need to focus on is less where the application code is, and more where the data is, so that they can move the application code or they can move the work to the data. That's fundamentally the perspective. We think that businesses don't take their business to the cloud, they bring the cloud to their business. So, when you think about this notion of increasingly looking at a set of work that needs to be performed, where the data exists, and what acts you're going to take in that data, it does suggest that data is going to become more of a centerpiece asset within the business. How does some of the things that you guys are doing lead customers to start to acknowledge data as an asset so they're making the appropriate investments in their data as their business evolves, and partly in response to data as an asset? What do you think? >> We have to do our job to build to common denominators, and that's what we're doing to make this easy for clients. So today we announced the IBM integrated analytics system. Same code base on private cloud as on a hardware system as on public cloud, all of it federates to Hortonworks through common sequel. That's what clients need, 'cause it solves their problem. Click of a button, they can get the cloud, and by the way, on private cloud it's based on Kubernetes, which is aligned with what we have on public cloud. We're working with Hortonworks to optimize Yarn and Kubernetes working together. These are the meaty issues that if we don't solve it, then clients have to deal with the bag of bolts, and so that's the kind of stuff we're solving together. So think about it: one single code base for managing your data, federates to Hadoop, machine learning is built into the system, and it's based on Kubernetes, that's what clients want. >> And the containers is just great, too. Great cloud-native trend. You guys been great, active in there. Congratulations to both of you guys. Final question, get you guys the last word: How does the relationship between Hortonworks and IBM evolve? How do you guys see this playing out? More of the same? Keep integrating in code? Is there any new thing you see on the horizon that you're going to be knocking down in the future? >> I'll take the first shot. The goal is to continue to make it simple and easy for the customer to get to the cloud, bring those machine learning and data science models to the data, and make it easy for the consumption of the new next generation of applications, and continue to make our customer successful and drive value, but to do it through transparently enabling the technology platforms together, and I think we've acknowledged the things that IBM is extraordinarily good at, the things that Hortworks is good at, and bring those two together with virtually no overlap. 
>> Rob, you've been very partner-centric. Your thoughts on this partnership? >> Look, it's what clients want. Since we announced this, the results and the response has been fantastic, and I think it's for one simple reason. So, Hortonworks' mission, we all know, is open source, and delivering in the community. They do a fantastic job of that. We also know that sometimes, clients need a little bit more, and so, when you bring those two things together, that's what clients want. That's very different than what other people in the industry do that say, "We're going to create a proprietary wrapper "around your Hadoop environment and lock your data in." That's the opposite of what we're doing. We're saying we're giving you full freedom of open source, but we're enabling you to augment that with machine learning, data science capabilities. This is what clients want. That's why the partnership's working. I think that's why we've gotten the response that we have. >> And you guys have been multiple years into the new operating model of being much more aggressive within the Big Data community, which has now morphed into much larger landscape. You pleased with some of the results you're seeing on the IBM side and more coding, more involvement in these projects on your end? >> Yeah, I mean, look, we were certainly early on Spark, created a lot of momentum there. I think it actually ended up helping both of our interests in the market. We built a huge community of developers at IBM, which is not something IBM had even a few years ago, but it's great to have a relationship like this where we can continue to augment our skills. We make each other better, and I think what you'll see in the future is more on the governance side; I think that's the piece that's still not quite been figured out by most enterprises yet. The need is understood. The implementation is slow, so you'll see more from us collectively there. >> Well, congratulations in the community work you guys have done. I think the community's model's evolving mainstream as well. Open source will continue to grow. Congratulations. Rob Bearden and Rob Thomas here inside theCUBE, more coverage here in Big Data NYC with theCUBE, after this short break.
Day One Wrap | BigData NYC 2017
>> Announcer: Live from midtown Manhattan, it's theCUBE, covering BigData New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. >> Hello everyone, welcome back to day one of our three days of wall-to-wall coverage at Big Data NYC. This is theCUBE. I'm John Furrier, with my co-hosts Jim Kobielus and Peter Burris. We do this event every year; this is theCUBE's BigData NYC, the event we run in New York City. We have a lot of great content, we have theCUBE going live; we don't go to Strata anymore, we do our own event in conjunction, and they have their own event. You can go pay over there and get the booth space, but we do our media event and attract all the influencers, the VIPs, the executives, the entrepreneurs. We've been doing it for five years, we're super excited, and we thank our sponsors for allowing us to get here, and really appreciate the community for continuing to support theCUBE. We're here to wrap up day one, what's going on in New York. Certainly we've had a chance to check out the Strata situation: Strata Data, which is Cloudera and O'Reilly, mainly O'Reilly Media; they run that, kind of an old-school event, guys. Let's discuss the impact of the event in context to the massive growth that's going on outside of their event. Their event is a walled garden: you've got to pay to get in, they're very strict, they don't really let a lot of people in, but okay. Outside of that, the event is going global, and the activity around big data is going global. It's more than Hadoop; we've certainly talked about that, that's old news. But what's the big trend this year, as the horizontally scalable cloud enters the equation? >> I think the big trend, John, and we've talked about it in our research, is that we have finally moved away from big data being associated with a new type of infrastructure. The emergence of AI, deep learning, machine learning, cognitive, all these different names for relatively common things, is an indication that we're starting to move up into people thinking about applications, thinking about services they can use to get access to data, or that they can use to build their applications. There are not enough skills. So I think that's probably the biggest thing: the days of failure being measured by whether or not you can scale your cluster up are finally behind us. We're using the cloud and other resources, we have enough expertise, and the technologies are becoming simpler and more straightforward. And now we're thinking about how we're going to create value out of all of this, which is how we're going to use the data to learn something new about what we're doing in the organization, and combine it with advanced software technologies that dramatically reduce the amount of work necessary to make a decision. >> And the other trend I would say, on top of that, just to put a little cherry on top, is the business focus, which is, again, not the speeds and feeds, although under the hood there's a lot of great innovation going on, from deep learning on down; there's a ton of stuff. However, the conversation is the business value, and how it's transforming work. But the one thing that nobody's talking about, and this is why I'm not bullish on these one-show-meets-all kinds of things like O'Reilly Media does, is that there are now multiple personas in a company, across the ecosystem. There are now a variety of buyers of some products. At least in the old days, you'd go talk to the IT CIO and you're in. Not anymore. You have an analytics person, a Chief Data Officer, you might have an IT person, you might have a cloud person. So you're seeing a completely broader set of potential buyers that are driving the change. We heard Paxata talk about that. And this is a dynamic.
Not anymore. You have an analytics person, a Chief Data Officer, you might have an IT person, you might have a cloud person. So you're seeing a completely broader set of potential buyers that are driving the change. We heard Paxata talk about that. And this is a dynamic. >> Yeah, definitely. What I'm sensing about Strata, and how these big-top shows around data are evolving, is that they're evolving to address a broader, what we call maker culture. It's more than software developers. It's business analysts, it's the people who build the hardware for the internet of things into which AI and machine learning models are being containerized and embedded. You know, one of the takeaways from today so far, and the keynotes are tomorrow at Strata, but I've been walking the atrium at the Javits Center having some interesting conversations, in addition, of course, to the ones we've been having here at theCUBE. And what I'm notic-- >> John: What are those hallway conversations that you're having? >> Yeah. >> What's going on over there? >> Yeah, the conversations I've had today have been focused on the chief trend that I'm starting to sense here, which is that the productionization of the machine learning development process, or pipeline, is super hot. It spans multiple data platforms, of course. You've got a bit of Hadoop in the refinery layer, you've got a bit of in-memory columnar databases, like Actian discussed at their booth, but just as important is that what users are looking at is how they can build these DevOps pipelines for continuous management of releases of machine learning models for productionization, but also for ongoing evaluation and scoring and iteration and redeployment into business applications. I had conversations with MapR, I had conversations with IBM, I mean, these were atrium conversations about things that they are doing. IBM had an announcement today on the wires and so forth with some relevance to that. And so I'm sensing a fair amount of 'it's the apps,' it's more than just Hadoop. But it's very much the flow of these core pieces, like AI, the core pieces of intellectual property in the most disruptive applications being developed these days in all manner of business, industry, and the consumer space. >> So I did not go over to the show floor yet, I've not been over to the atrium. But I'll bet you dollars to donuts this is indicative of something that always happens in a complex technology environment. And again, this is something we've thought about and particularly talked about here on theCUBE; in fact we talked to Paxata about it a little bit as well. And that is, as an organization gains experience, it starts to specialize. But there are always moments, there are always inflection points in the process of gaining that experience. One of the indications of that is that you end up with some people starting to specialize, but not quite sure what they're specializing in yet. And I think that's one of the things that's happening right now: the skills gap is significant. At the same time that the skills gap is significant, we're seeing people start to declare specializations that they don't necessarily have the skills to perform yet. And the tools aren't catching up. So there's still this tension: open source not necessarily focusing on the core problem,
skills looking for tools, an explosion in the number of tools out there, none of it focused on how you simplify, streamline, and put into operation how all these things work together. It's going to be an interesting couple of years, but the good news, ultimately, is that we are starting to see for the first time, even on theCUBE interviews today, the emergence of a common language about how we think about the characteristics of the problem. And I think that heralds a new round of experience and a new round of thinking about what the business analyst, the data scientist, the developer, the infrastructure person, and the business person each do. >> You know, you bring up that comment, those comments, about the specialists and the skills. Jim and I talked on the segment this morning about the tool shed. We're talking about how there are so many tools out there, and everyone loves a good tool, a hammer. The old expression is, if you have a hammer, everything looks like a nail; that's the cliche. But what's happened is there are a plethora of tools, right, and tools are good. Platforms are better. As people start to replatform everything, they can end up with too many tools. So we asked the Chief Data Officer, and he goes, yeah, I try to manage the tool tsunami, but his biggest issue was he buys a hammer, and it turns into a lawnmower. That's a vendor mentality of-- >> Or a truck. Well, but that's a classic example of what I'm talking about. >> Or someone's trying to use a hammer to mow the lawn, right? Again, so this is what you're getting at. >> Yeah! >> The companies out there are groping for relevance, and that's how you can see the pretenders from the winners. >> Well, a tool, fundamentally, is pedagogical. A tool describes the way work is going to be performed, and that's been a lot of what's been happening over the course of the past few years. Now, businesses that get more experience, they're describing their own way of thinking through a problem. And they're still not clear on how to bring the tools together, because the tools are being generated, put into the marketplace, by an expanding array of folks and companies, and they're now starting to shuffle for position. But I think ultimately what we're going to see happen over the next year, and I think this is an inflection point, going back to this big tent notion, is the idea that ultimately we are going to see greater specialization over the next few years. My guess is that this year's show probably should get better, or should get bigger, but I'm not certain it will, because it's focused on the problems that we already solved and not moving into the problems that we need to focus on. >> Yeah, I mean, a lot of the problems I have with the O'Reilly show is that they try to throw thought leadership out there, and there's some smart people that go to that, but the problem is that it's too much about monetization; they try to make too much money from the event while all this action's happening. And this is where the tool becomes, the hammer becomes, a lawnmower, because what's happening is that the vendor's trying to stay alive. And you mentioned this earlier, to your point: the customers that are buyers of the technology don't want to have something that's not going to be a fit, that's not going to stay agile for them. They don't want the hammer that they bought to turn into something that they didn't buy it for. And sometimes, teams can't make that leap, skillset-wise, to literally pivot overnight. Especially as a startup.
So this is where the selection of the companies makes a big difference. And a lot of the clients, a lot of customers that we're serving on the end user side, are reaching the conclusion that the tools themselves, while important, are clearly not where the value is. The value is in how they put them together for their business. And that's something that's going to have to, again, that's a maturation process: roles, responsibilities, the chief data officer, they're going to have a role in that or not, but ultimately, they're going to have to start defining their pipelines, their process from ingestion out to analysis. >> Let me get your reaction, you guys, your reactions to this take. Because one of the things that I heard today, and I think this validates a bigger trend, as we talk about the landscape of the market, from the event to how people are behaving and promoting and building products and companies. The pattern that I'm hearing, and we said it multiple times on theCUBE today, including from the guy who was basically reading from the script in his interview, it was so factual, I asked him the straight-up question: how do you deal with suppliers? What's happening is the trend is, don't show me sizzle, I want to see the steak. Don't sell me hype, I got too many business things to work on right now, I need to nail down some core things. I got application development, I got security to build out big time, and then I got all those data channels that I need. I don't have time for you to sell me a hammer that might not be a hammer in the future! So I need real results, I need real performance that's going to have a business impact. That is the theme, and that trumps the hype. I see that becoming a huge thing right now. Your thoughts, reactions, guys-- >> Well I'll start-- >> What's your reaction then? True or false on the trend? Be-- >> Peter: True! >> Get down to business. >> I'll say that much, true, but go ahead. >> I'll say true as well, but let me just add some context. I think a show like O'Reilly Strata is good up to a point, especially to catalyze a growing industry like big data's own understanding of itself, of the value that all these piece parts, Hadoop and Spark and so forth, can provide when deployed as a unit according to some emerging patterns. But at a certain point, when a space like this becomes well-established, it just becomes a pure marketing event. And customers, at a certain point, say, you know, I come here for ideas about things that I can do in my environment, my business, that could actually in many ways help me do new things. You can't get that at a marketing-oriented show; you can get that, as a user, more at a research-oriented show. When it's an emerging market, like, let's say, Spark was, like the Spark Summit in the beginning, those are the sort of research-focused shows industries go through in the beginning, where the people who are doing the development of this new architecture talk ideas. Now, in 2017, where we're at now, the ideas everybody's trying to get their heads around are all around AI and what the heck that is. For a show like an O'Reilly show to have relevance in a market that's in this much ferment of real innovation around AI and deep learning, there needs to be a core research focus that you don't get at this point in the lifecycle of Strata, for example. So that's my take on what's going on. >> So, my take is this.
And first of all, I agree with everything you said, so it's not in opposition to anything. Many years ago I had this thought that I think is still very true. And that is that the value of infrastructure is inversely correlated with the degree to which anybody knows anything about it. So if I know a lot about my infrastructure, it's not creating a lot of business value. In fact, more often than not, it's not working, which is why people end up knowing more about it. But the problem is, the way that technology has always been sold is as a differentiated, some sort of value-add thing. So you end up with this tension. And this is in an application domain, a very, very complex application domain, like big data. The tension is: my tool is so great, it's differentiated from all that other stuff, yeah, but it becomes valuable to me if and only if nobody has to know it exists. So I think, and one of the reasons why I bring this up, John, is many of the companies that are in the big data space today that are most successful are companies that are positioning themselves as a service. There's a lot of interesting SaaS applications for big data analysis, pipeline management, all the other things you can talk about, that are actually being rendered as a service, and not as a product. So that all you need to know is what the tool does. You don't need to know the tool. And I don't know that that's necessarily going to last, but I think it's very, very interesting that a lot of the more successful companies that we're talking to are themselves essentially infrastructure SaaS companies. >> Because-- >> AtScale is interesting, though. They came in as a service. But their service has an interesting value proposition. They can allow you to essentially virtualize the data to play with it, so people can actually sandbox data. And if it gets traction, they can then double down on it. So to me that's a freebie. To me, I'm a customer, I got to love that kind of environment, because you're essentially giving almost a developer-like environment-- >> Peter: Value without necessarily-- >> Yeah, the cost, and the guy gets the signal from the marketplace, his customer, of what data resonates. To me that's a very cool scene. I don't know, are you saying that's bad, or? >> No, no, I think it's interesting. I think it's-- >> So you're saying service is-- >> So what I'm saying is that the value of infrastructure is inversely proportional to the degree to which anybody knows anything about it. But you've got a bunch of companies who are selling, effectively, infrastructure software, so it's a value-add thing, and that creates a problem. And a lot of other companies not only have the ability to sell something as a service as opposed to a product, they can put the service forward, and people are using the service and getting what they need out of it without knowing anything about the tool. >> I like that. Let me just maybe possibly restate what you just said. When a market goes toward a SaaS go-to-market delivery model for solutions, the user's, the buyer's focus is shifted away from, I mean, how the solution works under the covers. >> Peter: Quote, value-add-- >> To what it can do potentially for you. >> The business, that's right. >> But you're not going to, don't get distracted by the implementation details. You then, as a user, become laser-focused on, wow, there's a bunch of things that this can do for me. I don't care how it works, really. You, SaaS provider, you worry about that stuff.
I can worry now about somehow extracting the value. I'm not distracted. >> This show, or this domain, is one of the domains where SaaS has moved; just as we're thinking about moving up the stack, the SaaS business model is moving down the stack in the big data world. >> All right, so, in summary, the stack is changing. Predictions for the next few days. What are we going to see come out of Strata Data, and our BigData NYC? 'Cause remember, this show was always a big hit, but it's very clear from the data on our dashboards, we're seeing all the social data. Microsoft Ignite is going on, and Microsoft Azure, just in the past few years, has burst on the scene. Cloud is sucking the oxygen out of the big data event. Or is it? >> I doubt it's sucking it out of the event, but you know, theCUBE is not at Ignite. Where's theCUBE right now? >> John: BigData NYC. >> No, it's here, but it's also at the Splunk show. >> John: That's true. >> And isn't it interesting-- >> John: We're sucking the data out of two events. >> We did have a lot of people coming in, exactly. A lot of people coming-- >> We're live streaming in a streaming data kind of-- >> John just said we suck, there's the record saying that. >> We're sucking all the data. >> So we are-- >> We're sharing data. These videos are data-driven. >> Yeah, absolutely, but the point, ultimately, is that Splunk is an example of a company that's putting forward a service about how you do this, and not necessarily a product focus. And a lot of the folks that are coming on theCUBE here are also going on theCUBE down in Washington D.C., which is where the Splunk show's at. And so one of the predictions I'll make is that we're going to hear, over the next couple of days, more companies talk about their SaaS strategies. >> Yeah, I mean, I agree with you, but I also agree with the comments about the technology coming together. And here's one thing I want to throw on the table. I've gotten this sense a few times connecting the dots on it, and we'll put it out publicly for comment right now. The role that communities will play, outside of developers, is going to be astronomical. I think we're seeing signals; certainly open-source communities have been around for a long time. They continue to grow on the shoulders of the giants before them. Even with events like O'Reilly, the small community they rely on is now not the only game in town. We're seeing the notion of a community strategy in things like blockchain, you're seeing it in business, you're seeing people rolling out community-driven recruitment for, say, data scientists. You're seeing a community model developing in business, yes or no? >> Yes, but I would put it this way, John. It's always been there. The difference is that we're now getting enough experience with things that have occurred, for example, communal collaboration in open-source software, and they've developed a bunch of social networking techniques where they can actually analyze how those communities work together, so now they're saying, hmm, I've figured out how to do an assessment and analysis to understand that community. I'm going to see if I can take that same concept and apply it over here to how sales works, or how B-to-B engagement works, or how marketing gets conducted, or how sales and marketing work together. And they're discovering that the same way of thinking is actually very fruitful over there.
So I totally agree, 100%. >> So they don't rely on other people's version of a community, they can essentially construct their own. >> They are, they are-- >> John: Or enabling their own. >> That's right, they are bringing that approach to thinking about a community-driven business, and they're applying it in a lot of new ways, and that's very exciting. >> As the world gets connected with mobile and the internet of things, as we're seeing, it's one big online community. I'm writing a post right now on what B-to-B markets should learn from the fake news problem. And that is: content and infrastructure are now contextually tied together. >> Peter: Totally. >> And related. The payload of the fake news is also related to the gamification of the network effect, hence the targeting, hence the weaponization. >> Hey, we wrote a piece on the three Cs of strategy a year and a half ago: content, community, context. And at the end of the day, the most important thing about what you're saying is that right now, when people talk about social networking, social media, you think Facebook. Facebook is a community with a single context: stay in touch with your friends. >> Connections. >> Connections. But what you're really saying is that for the first time we're now going to see an enormous amount of technology being applied to the fullness of all the communities. We're going to see a lot more communities being created with the software, each driven by what content creates value, against the context of how it works, where the community's defined in terms of what we do. >> Let me focus on using community as a framework for understanding how the software world is evolving. The software world is evolving towards, and I've said this many times in my research, data scientists, data people with data science skills, being the core developers in this new era. Now, what is data science all about at its heart? Machine learning: building and training machine learning models. And training machine learning models is all about making sure that they are fit for their intended purpose, say, classification. Training data: where do you get all the training data to train all these models? Where do you get all the human resources to do the labeling of the data sets? You need communities, crowdsourcing and whatnot, and you need sustainable communities that can supply the data and the labeling services to be able to sustain the AI and machine learning revolution. So content, creating data and so forth, really rules in this new era, like-- >> The interest in machine learning is at an all-time high, I guess. >> Jim: Yeah, oh yeah, very much so. >> Got it, I agree. I think the social grab, interest grab, value grab is emerging. I think communities, content, context are relevant. I think a lot of things are going to change, and the scuttlebutt that I'm hearing in this area now is that it's not about the big event anymore. It's about the digital component. I think you're seeing people recognize that, but they still want to do the face-to-face. >> You know what, that's right. Let's put it this way: the whole point of community is we do things together. And there are some things that are still easier to do together if we get together.
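(Editor's note: Jim's point above, that sustaining machine learning depends on communities supplying labeled training data, is easier to see with a small example. What follows is a minimal, hypothetical Python sketch of majority-vote label aggregation, the simplest common way crowd-sourced labels get reconciled; the function and data names are invented for illustration and are not from any product discussed here.)

```python
from collections import Counter

def aggregate_labels(annotations):
    """Majority-vote aggregation of crowd-sourced labels.

    annotations maps item_id -> list of labels supplied by different
    community annotators. Returns item_id -> (label, agreement), where
    agreement is the fraction of annotators who chose the winning label.
    """
    consensus = {}
    for item_id, labels in annotations.items():
        label, votes = Counter(labels).most_common(1)[0]
        consensus[item_id] = (label, votes / len(labels))
    return consensus

# Toy example: three annotators label two images.
votes = {
    "img-001": ["cat", "cat", "dog"],
    "img-002": ["dog", "dog", "dog"],
}
for item, (label, agreement) in aggregate_labels(votes).items():
    print(f"{item}: {label} (agreement {agreement:.0%})")
```

Low agreement scores are the signal to route an item back to the community for more labels, which is exactly the sustainability loop Jim is describing.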
>> But B-to-B marketing, you just can't say we're not going to do events when there's a whole machinery behind events. Lead-gen batch marketing, we call it. There's a lot of stuff that goes on in that funnel. You can't just say, hey, we're going to do a blog post. >> People still need to connect. >> So it's good, but there are some online tools emerging, of course. You wanted to say something? >> Yeah, I just want to say one thing. Face to face validates the source of expertise. I don't really fully trust an expert, I can't in my heart engage with them, 'til I actually meet them and figure out in person whether they really do have the goods, or whether they're repurposing some thinking that they got from elsewhere and gussied it up. So there's no substitute for face-to-face to validate the expertise, the expertise that you value enough to want to engage for your solution, or whatever it might be. >> Awesome, I agree. Online activities, the content, we're streaming the data, theCUBE, this is our annual event in New York City. We've got three days of coverage, Tuesday, Wednesday, Thursday, here, theCUBE in Manhattan, right around the corner from Strata Hadoop at the Javits Center, with the influencers. We're here with the VIPs, with the entrepreneurs, with the CEOs, and all the top analysts from Wikibon and around the community. Be there tomorrow all day; the day one wrap-up is done. Thanks for watching, see you tomorrow. (rippling music)
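(Editor's note: the 'productionization of the machine learning pipeline' Jim describes earlier in this segment, continuous release, evaluation, scoring, and redeployment of models, reduces to a small champion/challenger skeleton. The sketch below is a hypothetical illustration using scikit-learn with toy data; it is not any vendor's pipeline, and the promotion gate shown is an invented simplification.)

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Toy data standing in for a curated training set in the lake.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_holdout, y_train, y_holdout = train_test_split(
    X, y, test_size=0.25, random_state=0)

champion = None        # the model currently "deployed"
champion_score = 0.0   # its score on the shared holdout set

def release_candidate(C):
    """Train a challenger, score it, and redeploy (promote) it only
    if it beats the current champion on the same holdout data."""
    global champion, champion_score
    challenger = LogisticRegression(C=C, max_iter=1000).fit(X_train, y_train)
    score = accuracy_score(y_holdout, challenger.predict(X_holdout))
    if score > champion_score:
        champion, champion_score = challenger, score
        print(f"promoted challenger (C={C}), score={score:.3f}")
    else:
        print(f"kept champion, challenger (C={C}) scored {score:.3f}")

# Each pipeline iteration re-trains, re-evaluates, and maybe redeploys.
for C in (0.01, 0.1, 1.0):
    release_candidate(C)
```

In a real DevOps setup the gate would also cover drift, fairness, and latency checks, but the release/evaluate/redeploy loop is the core pattern Jim describes.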
Prakash Nanduri, Paxata | BigData NYC 2017
>> Announcer: Live from midtown Manhattan, it's theCUBE covering Big Data New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. (upbeat techno music) >> Hey, welcome back, everyone. Here live in New York City, this is theCUBE from SiliconANGLE Media. Exclusive coverage of the big data world at NYC. We call it Big Data NYC, in conjunction also with Strata Hadoop, Strata Data, Hadoop World, all going on kind of around the corner from our event here on 37th Street in Manhattan. I'm John Furrier, the co-host of theCUBE, with Peter Burris, Head of Research at SiliconANGLE Media, and General Manager of Wikibon Research. And our next guest is one of our famous CUBE alumni, Prakash Nanduri, co-founder and CEO of Paxata, who launched his company here on theCUBE at our inaugural Big Data NYC event in 2013. Great to see you. >> Great to see you, John. >> John: Great to have you back. You've been on every year since, and it's been the lucky charm. You guys have been doing great. If it's not broke, don't fix it, right? And so theCUBE is working with you guys. We love having you on. It's been a pleasure, you as an entrepreneur, launching your company. Really, the entrepreneurial mojo. It's really what it's all about. Getting access to the market, you guys got in there, and you got a position. Give us the update on Paxata. What's happening? >> Awesome, John and Peter. Great to be here again. Every time I come here to New York for Strata I always look forward to our conversations, and every year we have something exciting and new to share with you. So, if you recall, in 2013 it was a tiny little show, and it was a tiny little company, and we came in with big plans. And in 2013, I said, "You know, John, we're going to completely disrupt the way business consumers and business analysts turn raw data into information and do self-service data preparation." That's what we brought to the market in 2013. Ever since, we have gone on to do something really exciting and new for our customers every year. In '14, we came in with the first Apache Spark-based platform that allowed business analysts to do data preparation at scale, interactively. Last year we did enterprise grade, and we talked about how Paxata is going to deliver our self-service data preparation solution in a highly scalable, enterprise-grade deployment world. This year, what's super exciting is that, in addition to the recent announcements we made on Paxata running natively on the Microsoft Azure HDI Spark system, we are truly now the only information platform that allows business consumers to turn data into information in a multi-cloud, hybrid world for our enterprise customers. In the last few years, I came and talked to you about the work we're doing and what great things are happening. But this year, in addition to the super-exciting announcements with Microsoft and other exciting announcements that you'll be hearing, you are going to hear directly from one of our key anchor customers, Standard Chartered Bank. A 150-year-old institution operating in over 46 countries, one of the most storied banks in the world, with 87,500 employees. >> John: That's not a startup. >> That's not a startup. (John laughs) >> They probably have a high bar, a high bar. They got a lot of data. >> They have lots of data. And they have chosen Paxata as their information fabric. We announced our strategic partnership with them recently, and you know that they are going to be speaking on theCUBE this week.
And what started as a little experiment, just like our experiment in 2013, has actually mushroomed now into Michael Gorriz, and Shameek Kundu, and the entire leadership of Standard Chartered choosing Paxata as the platform that will democratize information in the bank across their 87,500 employees. We are going in a very exciting way, a very fast way, and now delivering real value to the bank. And you can hear all about it on our website-- >> Well, he's coming on theCUBE so we'll drill down on that, but banks are changing. You talk about a transformation. What is a teller? An Internet of Things device. The watch potentially could be a terminal. So, the Internet of Things of people changes the game. Are the ATMs going to go away and become like broadcast points? >> Prakash: And you're absolutely right. And really what it is about is, it doesn't matter if you're a Standard Chartered Bank or if you're a pharma company or if you're the leading healthcare company, what it is is that every one of our customers is really becoming an information-inspired business. And what we are driving our customers to is moving from a world where they're data-driven. I think being data-driven is fine. But what you need to be is information-inspired. And what does that mean? It means that you need to be able to consume data, regardless of format, regardless of source, regardless of where it's coming from, and turn it into information that actually allows you to get insights and make decisions. And that's what Paxata does for you. So, this whole notion of being information-inspired, I don't care if you're a bank, if you're a car company, or if you're a healthcare company today, you need to have-- >> Prakash, for the folks watching that might not know our history, you launched on theCUBE in 2013 and have been successful every year since. You guys have really deployed the classic entrepreneurial success formula: be fast, walk the talk, listen to customers, add value. Take a minute quickly just to talk about what you guys do, just for the folks that don't know you. >> Absolutely, let's actually give it in a real example, you know, with a customer like Standard Chartered. Standard Chartered operates in multiple countries. They have a significant number of lines of business. And whether it's in risk and compliance, whether it is in their marketing department, whether it's in their corporate banking business, what they have to do is, a simple example could be, I want to create a customer list to be able to go and run a marketing campaign. And the customer list in a particular region is not something easy for a bank like Standard Chartered to come up with. They need to be able to pull from multiple sources. They need to be able to clean the data. They need to be able to shape the data to get that list. And if you look at what is really important, the people who understand the data are actually not the folks in IT but the folks in business. So, they need to have a tool and a platform that allows them to pull data from multiple sources, to be able to massage it, to be able to clean it-- >> John: So, you sell to the business person? >> We sell to the business consumer. The business analyst is our consumer. And the people who support them are the chief data officer and the person who runs the Paxata platform on their data lake infrastructure. >> So, IT sets up the data lake and you guys just let the business guys go to town on the data. >> Prakash: Bingo. >> Okay, what's the problem that you solve?
If you can summarize the problem that you solve for the customers, what is it? >> We take data and turn it into information that is clean, that's complete, that's consumable, and that's contextual. The hardest problem in every analytical exercise is actually taking data and cleaning it up and getting it ready for analytics. That's what we do. >> It's the prep work. >> It's the prep work. >> As companies gain experience with big data, John, what they need to start doing increasingly is move more of the prep work, or have more of the prep work flow, closer to the analyst. And the reason's actually pretty simple. It's because of that context. The analyst knows more about what they're looking for and is a better evaluator of whether or not they get what they need. Otherwise, you end up in this strange cycle-time problem between people in the back end who are trying to generate the data that they think the analysts want. And so, by making the whole concept of data preparation simpler, more straightforward, you're able to have the people who actually consume the data and need it do a better job of articulating what they need, how they need it, and making it presentable to the work that they're performing. >> Exactly, Peter. What does that say about how roles are starting to merge together? 'Cause you've got to be at the vanguard of seeing how some of these mature organizations are working. What do you think? Are we seeing roles start to become more aligned? >> Yes, I do. So, first and foremost, I think what's happening is there is no such thing as having just one group that's doing data science and another group consuming. I think what you're going into is a world where data and information are all-consuming and everybody has a role in that. And everybody's going to consume. Look at a business analyst that was spending 80% of their time living in Excel or working with self-service BI tools like our partners' Tableau and Power BI from Microsoft, and others. What you find is these people today are living in a world where either they have to live in coding-and-scripting hell or they have to rely on IT to get them the real data. So, for the role of a business analyst or a subject matter expert, first and foremost, the fact that they work with data and they need information, that's a given. There is no business role today that doesn't deal with data.
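(Editor's note: the 'customer list' workflow Prakash walks through above, pull from multiple sources, clean, shape, publish, is concrete enough to sketch. The following is a minimal, hypothetical pandas example of that flow; the column names and records are invented, and Paxata itself is a visual, non-coding product, so treat this as the scripting work such a tool replaces rather than the product itself.)

```python
import pandas as pd

# Two toy "sources" standing in for systems in different regions.
crm = pd.DataFrame({
    "customer_id": [1, 2, 2, 3],
    "name": [" Ana Lim ", "Raj Shah", "Raj Shah", "Mei Chen"],
    "region": ["SG", "SG", "SG", "HK"],
})
accounts = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "balance": [12000.0, 55000.0, 8100.0],
})

# Clean: trim stray whitespace and drop duplicate records.
crm["name"] = crm["name"].str.strip()
crm = crm.drop_duplicates(subset="customer_id")

# Shape: join the sources and filter to the campaign's target segment.
campaign = (
    crm.merge(accounts, on="customer_id")
       .query("region == 'SG' and balance > 10000")
       .loc[:, ["customer_id", "name", "balance"]]
)

print(campaign)  # a clean, complete, consumable, contextual list
```

Each step here, deduplication, standardization, joining, and filtering, lines up with the 'clean, complete, consumable, contextual' properties Prakash lists.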
>> So, John, at the beginning of the session you and Jim were talking about what is going to be one of the themes here at the show. And we observed that it used to be that people were talking about setting up the hardware, setting up the clutters, getting Hadoop to work, and Jim talked about going up the stack. Well, this is one of the indicators that, in fact, people were starting to go up the stack because they're starting to worry more about the data, what it can do, the value of how it's going to be used, and how we distribute more of that work so that we get more people using data that's actually good and useful to the business. >> John: And drives value. >> And drives value. >> Absolutely. And if I may, just put a chronological aspect to this. When we launched the company we said the business analyst needs to be in charge of the data and turning the data into something useful. Then right at that time, the world of create data lakes came in thanks to our partners like Cloudera and Hortonworks, and others, and MapR and others. In the recent past, the world of moving from on premise data lakes to hybrid, multicloud data lakes is becoming reality. Our partners at Microsoft, at AWS, and others are having customers come in and build cloud-based data lakes. So, today what you're seeing is on one hand this complete democratization within the business, like at Standard Chartered, where all these business analysts are getting access to data. And on the other hand, from the data infrastructure moving into a hybrid multicloud world. And what you need is a 21st Century information management platform that serves the need of the business and to make that data relevant and information and ready for their consumption. While at the same time we should not forget that enterprises need governance. They need lineage. They need scale. They need to be able to move things around depending on what their business needs are. And that's what Paxata is driving. That's why we're so excited about our partnership with Microsoft, with AWS, with our customer partnerships such as Standard Chartered Bank, rolling this out in an enterprise-- >> This is a democratization that you were referring to with your customers. We see this-- >> Everywhere. >> When you free the data up, good things happen but you don't want to have IT be the constraint, you want to let them enable-- >> Peter: And IT doesn't want to be the constraint. >> They don't. >> This is one of the biggest problems that they have on a daily basis. >> They're happy to let it go free as long as it's in they're mind DevOps-like related, this is cool for them. >> Well, they're happy to let it go with policy and security in place. >> Our customers, our most strategic customers, the folks who are running the data lakes, the folks who are managing the data lakes, they are the first ones that say that we want business to be able to access this data, and to be able to go and make use out of this data in the right way for the bank. And not have us be the impediment, not have us be the roadblock. While at the same time we still need governance. We still need security. We still need all those things that are important for a bank or a large enterprise. That's what Paxata is delivering to the customers. >> John: So, what's next? >> Peter: Oh, I'm sorry. >> So, really quickly. An interesting observation. People talk about data being the new fuel of business. 
That really doesn't work, because, as Bill Schmarzo says, it's not the new fuel of business, it's the new sunlight of business. And the reason why is because fuel can only be used once. >> Prakash: That's right. >> The whole point of data is that it can be used a lot, in a lot of different ways, and in a lot of different contexts. And so, in many respects, for someone who runs a data lake, when someone in the business asks them, "Well, how do you create value for the business?" The more people, the more users, the more contexts that they're serving out of that common data, the more valuable the resource that they're administering. So, they want to see more utilization, more contexts, more data being moved out. But again, governance and security have to be in place. >> You bet, you bet. And using that analogy of data, I've heard this term about data being the new oil, etc. Well, if data is the oil, information is really the refined fuel, or sunlight as we like to call it. >> Peter: Yeah. >> John: Well, you're riffing on semantics, but the point is it's not a one-trick pony. Data is part of the development. I wrote a blog post in 1997, I mean 2007, that said data's the new development kit. And it was kind of riffing on this notion of the old days. >> Prakash: You bet. >> Here's your development kit, SDK, or whatever, which was how people did things back then. Enter the cloud, >> Prakash: That's right. >> And boom, there it is. The data now is part of the refinery process that developers want. The developers want the data libraries, whatever that means. That's where I see it. And that is the democratization, where data is available to be integrated into apps, into feeds, into ... >> Exactly, and that brings me to the exciting new product innovation announcement we made today about Intelligent Ingest. You want to be able to access data in the enterprise regardless of where it is, regardless of the cloud where it's sitting, regardless of whether it's on-premise or in the cloud. You don't need, as a business, to worry about whether that is a JSON file or whether that's an XML file or a relational file. That's irrelevant. What you want is: do I have access to the right data? Can I take that data, can I turn it into something valuable, and then can I make a decision out of it? I need to do that fast. At the same time, I need to have the governance and security, all of that. That's, at the end of the day, the objective that our customers are driving towards. >> Prakash, thanks so much for coming on and being a great member of our community. >> Fantastic. >> You're part of our smart network of great people out there, and the entrepreneurial journey continues. >> Yes. >> Final question. Just an observation. As you pinch yourself and go down the journey, you guys are walking the talk, adding new products. It's a global landscape. You're seeing a lot of new stuff happening. Customers are trying to stay focused. A lot of distractions, whether security or data or app development. What's your state of the industry? How do you view the current market, from your perspective and also how the customer might see it from their standpoint? >> Well, the first thing is that I think in the last four years we have seen significant maturity, both among the providers of software technology and solutions and also amongst the customers. I do think that going forward, what is really going to make a difference is really driving towards business outcomes by leveraging data.
We've talked about a lot of this over the last few years. What real business outcomes are you delivering? What we are super excited about is that when our customers subscribe to Paxata, and we're a SaaS company, each one of them subscribes not because they're doing a science experiment but because they're trying to deliver real business value. What is that? Whether that is a risk and compliance solution that's going to drive real cost savings, or whether that's a top-line benefit because they know what their customer 360 is and how they can go and serve their customers better, or how they can improve supply chains, or how they can optimize their entire efficiency in the company. I think if you take it from that lens, what is going to be important right now is that there are lots of new technologies coming in, and what's important is how they're going to drive towards those top three business drivers that I have today for the next 18 months. >> John: So, that's foundational. >> That's foundational. Those are the building blocks-- >> That's what is happening. Don't jump... If you're a customer, it's great to look at new technologies, etc. There's always innovation projects-- >> R&D, POCs, whatever. Kick the tires. >> But now, if you are really going to talk the talk about saying I'm going to be, call it what you want, data-driven, information-driven, whatever it is, if you're going to talk the talk, then you better walk the walk by delivering the real kind of tools and capabilities that your business consumers can adopt. And they better adopt that fast. If they're not up and running in 24 hours, something is wrong. >> Peter: Let me ask one question before you close, John. So, your argument, which I agree with, suggests that one of the big changes in the next 18 months to three years, as this whole thing matures and gets more consistent in its application of the value that it generates, is that we're going to see an explosion in the number of users of these types of tools. >> Prakash: Yes, yes. >> Correct? >> Prakash: Absolutely. >> 2X, 3X, 5X? What do you think? >> I think we're just at the cusp. I think it's going to grow at least 10X and beyond. >> Peter: In the next two years? >> I would give that the next three to five years. >> Peter: Three to five years? >> Yes. And we're on the journey. We're just at the tip of the curve taking off. That's what I feel. >> Yeah, and there's going to be a lot more consolidation. You're going to start to see people who are winning. It's becoming clear as the fog lifts. It's a cloud game, a scale game. It's democratization, community-driven. It's open-source software. Just solve problems, outcomes. I think outcomes are going to come much faster. I think outcomes as a service will be a model that we'll probably be talking about in the future. You know, real-time outcomes, not eight-month projects or year projects. >> Certainly, we started writing research about outcome-based management. >> Right. >> Wikibon Research... Prakash, one more thing? >> I also just want to say that in addition to this business outcome thing, in the last five years I've seen a lot of shift in our customers' world, where there was initial excitement about analytics, predictive, AI, machine learning to get to outcomes. They've all come to the realization that none of that is possible if you're not able to first get a grip on your data, and then turn that data into something meaningful that can be analyzed. So, that is also a major shift.
That's why you're seeing the growth we're seeing-- >> John: 'Cause it's really hard. >> Prakash: It's really hard. >> I mean, it's a cultural mindset. You have the personnel. It's an operational model. I mean, this is not like, throw some pixie dust on it and it magically happens. >> That's why I say, before you go into any kind of BI, analytics, or AI initiative, stop, think about your information management strategy. Think about how you're going to democratize information. Think about how you're going to get governance. Think about how you're going to enable your business to turn data into information. >> Remember, you can't do AI without IA? You can't do AI without information architecture. >> There you go. That's a great point. >> And I think this all points to why Wikibon's research analysts got it right with true private cloud, because people have got to take care of their business here to have a foundation for the future. And you can't just jump to the future; there's too much to just come in and use at scale, too many cracks in the foundation. You've got to take your medicine now, do the homework, and lay down a solid foundation. >> You bet. >> All right, Prakash. Great to have you on theCUBE. Again, congratulations. And again, it's great for us. I totally have a great vibe when I see you, thinking about how you launched on theCUBE in 2013, and how far you continue to climb. Congratulations. >> Thank you so much, John. Thanks, Peter. That was fantastic. >> All right, live coverage continuing, day one of three days. It's going to be a great week here in New York City. Weather's perfect and all the players are in town for Big Data NYC. I'm John Furrier with Peter Burris. Be back with more after this short break. (upbeat techno music)
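(Editor's note: the Intelligent Ingest idea Prakash mentions, that the business shouldn't care whether a source arrives as JSON, XML, or relational/CSV, comes down to normalizing formats at the edge. The sketch below is a hypothetical, standard-library-only Python dispatcher that folds three formats into one list-of-records shape; it illustrates the concept only and is not Paxata's implementation.)

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

def ingest(raw, fmt):
    """Normalize JSON, CSV, or XML text into a list of records,
    so downstream prep code never cares about the source format."""
    if fmt == "json":
        data = json.loads(raw)
        return data if isinstance(data, list) else [data]
    if fmt == "csv":
        return list(csv.DictReader(io.StringIO(raw)))
    if fmt == "xml":
        root = ET.fromstring(raw)
        return [{field.tag: field.text for field in record} for record in root]
    raise ValueError(f"unsupported format: {fmt}")

# The same record arriving three different ways.
print(ingest('[{"id": "1", "name": "Ana"}]', "json"))
print(ingest("id,name\n1,Ana\n", "csv"))
print(ingest("<rows><row><id>1</id><name>Ana</name></row></rows>", "xml"))
```

All three calls yield the same `[{'id': '1', 'name': 'Ana'}]` shape, which is the point: once ingestion normalizes the format, the 'turn data into information' steps can be written once.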
Itamar Ankorion, Attunity | BigData NYC 2017
>> Announcer: Live from Midtown Manhattan, it's theCUBE, covering Big Data New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. >> Okay, welcome back, everyone, to our live special CUBE coverage in New York City in Manhattan. We're here in Hell's Kitchen for theCUBE's exclusive coverage of our Big Data NYC event and Strata Data, which used to be called Strata Hadoop, which used to be Hadoop World. Our event, Big Data NYC, is in its fifth year, where we gather every year to see what's going on in the big data world and also produce all of our great research. I'm John Furrier, the co-host of theCUBE, with Peter Burris, head of research. Our next guest, Itamar Ankorion, who's the Chief Marketing Officer at Attunity. Welcome back to theCUBE, good to see you. >> Thank you very much. It's good to be back. >> We've been covering Attunity for many, many years. We've had many conversations, you guys have had great success in big data, so congratulations on that. But the world is changing, and we're seeing data integration, we've been calling this for multiple years, that's not going away, people need to integrate more. But with cloud, there's been a real focus on accelerating the scale component, with an emphasis on ease of use, data sovereignty, data governance, so all these things are coming together, and the cloud has amplified them. In the big data world, it's like, listen, get movin' or you're out of business has pretty much been the mandate we've been seeing. A lot of people have been reacting. What's your response at Attunity these days, because you have successful piece parts with your product offering? What's the big update for you guys with respect to this big growth area? >> Thank you. First of all, the cloud data lakes have been a major force changing the data landscape and the data management landscape for enterprises. For the past few years, I've been working closely with some of the world's leading organizations across different industries as they deploy the first and then the second and third iterations of the data lake and big data architectures. And one of the things, of course, we're all seeing is the move to cloud, whether we're seeing enterprises move completely to the cloud, kind of move the data lakes there, that's where they build them, or actually have a hybrid environment where part of the data lake and analytics environment is on prem and part of it is in the cloud. The other thing we're seeing is that the enterprises are starting to mix the traditional data lake, the cloud as the platform, and streaming technologies as the way to enable all the modern data analytics that they need, and that's what we have been focusing on: enabling them to use data across all these different technologies where and when they need it. >> So, the sum of the parts is worth more if it's integrated together seems to be the positioning, which is great, it's what customers want, make it easier. What is the hard news that you guys have, 'cause you have some big news? Let's get to the news real quick. >> Thank you very much. We did. Today we announced, and we're very excited about it, a big new release of our data integration platform. Our modern platform brings together Attunity Replicate, Attunity Compose for Hive, and Attunity Enterprise Manager, or AEM.
These are products that we've evolved significantly, invested a lot in over the last few years, to enable organizations to use data, make data available in real time across all these different platforms, and then turn this data into something that's ready for analytics, especially in Hive and Hadoop environments, on prem and now also in the cloud. Today, we've announced a major release with a lot of enhancements across the entire product line. >> Some people might know you guys for the Replicate piece. I know that this announcement was 6.0, but you guys have the other piece parts to this, and really it's about modernization of kind of old-school techniques. That's really been the driver of your success. What specifically in this announcement makes it, you know, really work well for people who move in real time and want to have good data access? What's the big aha for the customers out there with Attunity on this announcement? >> That's a great question, thank you. First of all, we're bringing it all together. As you mentioned, over the past few years, Attunity Replicate has emerged as the choice of many Fortune 100 and other companies who are building modern architectures and moving data across different platforms, to the cloud, to their lakes, and they're doing it in a very efficient way. One of the things we've seen is that they needed the flexibility to adapt as they go through their journey, to adopt different platforms, and what we gave them with Replicate was the flexibility to do so. We give them the flexibility, we give them the performance to get the data, and the efficiency to move only the changes to the data as they happen, and to do that in a real-time fashion. Now, that's all great, but once the data gets to the data lake, how do you then turn it into valuable information? That's when we introduced Compose for Hive, which we talked about in our last session a few months ago, and which basically takes the next stage in the pipeline, picking up the incremental, continuous data that is fed into the data lake and turning it into operational data stores, historical data stores, data stores that are basically ready for analytics. What we've done with this release, and we're really excited about it, is putting all of these together in a more integrated fashion, putting Attunity Enterprise Manager on top of it to help manage larger-scale environments, so customers can move faster in deploying these solutions. >> As you think about the role that Attunity's going to play over time, though, it's going to end up being part of a broader solution for how you handle your data. Imagine for a second the patterns that your customers are deploying. What is Attunity typically being deployed with? >> That's a great question. First of all, we're definitely part of a large ecosystem for building the new data architecture, the new data management, with data integration being more than ever a key part of that bigger ecosystem, because what they actually have today is more islands, more places where the data needs to go, and, to your point, more patterns in which the data moves. One of those patterns that we've seen significantly increase in demand and deployment is streaming. Where data used to be batch, now we're all talking about streaming. Kafka has emerged as a very common platform, but not only Kafka. If you're on Amazon Web Services, you're using Kinesis. If you're in Azure, you're using Azure Event Hubs. You have different streaming technologies. That's part of how this has evolved.
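(Editor's note: the Replicate-plus-Compose pattern Itamar describes, a continuous feed of database changes landing in the lake and being turned into both an operational store and a historical store, rests on one small idea. The sketch below is a hypothetical, standard-library-only Python illustration of that idea, invented for this write-up and not Attunity's code: every change event updates a current-state table and is also appended to an immutable history.)

```python
from datetime import datetime, timezone

operational = {}  # current state per key (an "operational data store")
history = []      # append-only log of every change (a "historical store")

def apply_change(op, key, row):
    """Apply one CDC event (insert/update/delete) from the feed."""
    if op in ("insert", "update"):
        operational[key] = row
    elif op == "delete":
        operational.pop(key, None)
    history.append({
        "ts": datetime.now(timezone.utc).isoformat(),
        "op": op, "key": key, "row": row,
    })

# A few change events as they might stream off a source database.
apply_change("insert", "42", {"name": "Ana", "balance": 100})
apply_change("update", "42", {"name": "Ana", "balance": 250})
apply_change("delete", "42", None)

print(operational)   # {} -- current state reflects the delete
print(len(history))  # 3 -- the full change history kept for analytics
```

In the Hive setting the same split shows up as a merged, query-ready table on one side and an append-only change table on the other; the sketch just makes the bookkeeping explicit.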
'Cause you just bring up a good point. I mean, the big trend is that customers want either the same code base on prem and in the hybrid setup, which means the gateway, if you will, to the public cloud, or to move workloads between different clouds; multi-cloud seems to be the Holy Grail, we've identified it. We are taking the position that we think multi-cloud will be the preferred architecture going forward. Not necessarily this year, but it's going to get there. But as a customer, I don't want to have to retrain employees and invest in separate skill development for Amazon, Azure, Google. I mean, each one has its own different path, you mentioned it. How do you talk to customers about that, because they might be like, whoa, I want it, but how do I work in that environment? You guys have a solution for that? >> We do, and in fact, one of the things we've seen, to your point, we've seen the adoption of multiple clouds, and even if that adoption is staged, what we're seeing is more and more customers actually referring to the term lock-in with respect to the cloud. Do we put all the eggs in one cloud, or do we allow ourselves the flexibility to move around and use different clouds, and also mitigate our risk in that respect? What we've done from that perspective is, first of all, when you use the Attunity platform, we take away all the development complexity. On the Attunity platform, it is very easy to set up your data flows, your data pipelines, and it's all common and consistent. Whether you're working on prem, whether you work on Amazon Web Services, on Azure, or on Google or other platforms, it all looks and feels the same. First of all, you solve the issue of the diversity, but also the complexity, because one of the big things Attunity has focused on is reducing the complexity, allowing customers to configure these data pipelines without development effort and resources. >> One of the challenges, or one of the things you typically do to take complexity out, is do a better job of design up front. And I know that Attunity's got a tool set that starts to address some of these things. Take us a little bit through how your customers are starting to think in terms of designing flows, as opposed to just cobbling together things in a bespoke way. How is that starting to change as customers gain experience with large data sets, the ability, the need to aggregate them, the ability to present them to developers in different ways? >> That's a great point, and again, one of the things we've focused on is to make the process of developing or configuring these different data flows easy and modular. First, in Attunity you can set up different flows in different patterns, and you can then make them available to others for consumption. Some create just the data ingestion; others create the data ingestion and then a data transformation with Compose for Hive. And with Attunity Enterprise Manager, we've now also introduced APIs that allow you to create your own microservices, consuming and using the services enabled by the platform, so we provide more flexibility to put all these different solutions together. >> What's the biggest thing that you see from a customer standpoint, from a problem that you solve? If you had to kind of lay it out, you know the classic, hey, what problem do you solve?
'Cause there are many, so take us through the key problem, and then any secondary issues that you guys can address for customers; that seems to be the way the conversation starts. What are the key problems that you solve? >> I think one of the major problems that we solve is scale. Our customers that are deploying data lakes are trying to deploy and use data that is coming not from five or 10 or even 50 data sources; we work at hundreds, going on thousands, of data sources now. That in itself represents a major challenge to our customers, and we're addressing it by dramatically simplifying and making the process of setting those up very repeatable, very easy, and then providing the management facility, because when you have hundreds or thousands, management becomes a bigger issue to operationalize it. We invested a lot in a management facility for those: monitoring, control, security. How do you secure it? The data lake is used by many different groups, so how do we allow each group to see and work only on what belongs to that group? That's part of it, too. So again, scale is the major thing there. The other one is real timeliness. We talked about the move to streaming, and a lot of it is in order to enable streaming analytics, real-time analytics. That's only as good as your data, so you need to capture data in real time. And that of course has been our claim to fame for a long time, being the leading independent provider of CDC, change data capture technology. What we've done now, and also expanded significantly with the new release, version six, is creating universal database streaming. >> What is that? >> We take databases, all the enterprise databases, and we turn them into live streams. By the way, the most common way that customers have used to bring data into the lake from a database was Sqoop. And Sqoop is great, easy software to use from an open-source perspective, but it's scripting and batch. So you're building your new modern architecture with what are effectively scripting and batch. What we do with CDC is enable you to take a database, and instead of the database being something you come to periodically to read, we actually turn it into a live feed, so as the data changes in the database, we stream it, we make it available across all these different platforms. >> Changes the definition of what live streaming is. We're live streaming theCUBE, we're data. We're data streaming, and you get great data. So, here's the question for you. This is a good topic, I love this topic. Pete and I talk about this all the time, and it's been addressed in the big data world, but you can see the pattern going mainstream in society globally, geopolitically and also in society. There's batch processing, and there's data in motion, real time. Streaming brings up this use case to the end customer: batch is the way they've done it before, and certainly storing things in data lakes is not going to go away, you're going to store stuff, but the real gain is in motion. >> Itamar: Correct. >> How do you describe that to a customer when you go out and say, hey, you know, you've been living in a batch world, but wake up to the real world called real time. How do you get them to align with it? Some people get it right away, I see that, some people don't. How do you talk about that, because that seems to be a real cultural thing going on right now, or operational readiness from the customer standpoint?
Can you just talk through your feeling on that? >> First of all, this often gets lost in translation, and we see quite a few companies, and even IT departments, where when they refer to real time, or their business tells them we need real time, what they understand from it is that when you ask for the data, the response will be immediate. You get real-time access to the data, but the data is from last week. So, we get real-time access, but to last week's data. And what we try to do is basically say, wait a second, when you say real time, what does real time mean? And we start to understand what the meaning of using last week's data, or yesterday's data, versus the real-time data is, and that makes a big difference. We actually see that today the access, the availability, the ability to act on the real-time data, that's the frontier of competitive differentiation. That's what makes a customer experience better, that's what makes the business more operationally efficient than the competition. >> It's the data, not so much the process of what they used to do. Their version of real time is, I responded to you pretty quickly. >> Exactly. The other thing that's interesting is, we see change data capture becoming a critical component of the modern data architecture. Traditionally, we used to talk about different types of tools and technology; now CDC itself is becoming a critical part of it, and the reason is that it serves and answers a lot of fundamental needs that are now becoming critical. One is the need for real-time data. The other one is efficiency. If you're moving to the cloud, and we talked about this earlier, if your data lake is going to be in the cloud, there's no way you're going to reload all your data, because the bandwidth is going to get in the way. So, you have to move only the delta. You need the ability to capture and move only the delta, so CDC becomes fundamental both in enabling the real time as well as the efficient, low-impact data integration. >> You guys have a lot of partners, technology partners, global SIs, resellers, a bunch of different partnership levels. The question I have for you, and I'd love to get your reaction and insight: for the customer who has the problem, what's in it for me? I want to move my business forward, I want to do digital business, I need to get at my real-time data as it's happening. Whether it's near real time or real time, that's evolution, but ultimately, they have to move their developers down a certain path. They'll usually hire a partner. The relationship between partners and you, the supplier to the customer, has changed recently. >> That's correct. >> How is that evolving? >> First of all, it's evolving in several ways. We've invested on our part to make sure that we're building Attunity as a leading vendor in the ecosystem of the system integration consulting companies. We work with pretty much all the major global system integrators as well as regional ones, boutique ones, that focus on the emerging technologies and the modern analytics-type platforms. We work a lot with plenty of them on major corporate data center-level migrations to the cloud. So again, the motivations are different, but we invest-- >> More specialized, are you seeing more specialty, what's the trend? >> We've been a technology partner of choice to both Amazon and Microsoft for enabling, facilitating the data migration to the cloud.
They of course have their select or preferred group of partners they work with, so we all come together to create these solutions. >> Itamar, what are the goals for Attunity as we wrap up here? I give you the last word, as you guys have this big announcement, you're bringing it all together. Integration is key; it's always been the ethos of the company. Where is this next level, what's the next milestone for you guys? What do you guys see going forward? >> First of all, we're going to continue to modernize. We're really excited about the new announcements we did today, Replicate six, AEM six, a new version of Compose for Hive that now also supports small data lakes, Aldermore, Cloudera, EMR, and a key point for us was expanding AEM to also enable analytics on the data we generate as data flows through it. The whole point is modernizing data integration, providing more intelligence in the process, reducing the complexity, and facilitating the automation end-to-end. We're going to continue to solve, >> Automation big, big time. >> Automation is a big thing for us, and the point is, you need to scale. In order to scale, we want to generate things for you so you don't have to develop every piece. We automate the automation, okay. The whole point is to deliver the solution faster, and the way we're going to do it is to continue to enhance each one of the products in its own space, whether it's replication across systems, Compose for Hive for transformations and pipeline automation, or AEM for management, but also to create integration between them. Again, for us it's about creating a platform where our customers get more than the sum of the parts; they get the unique capabilities that we bring together in this platform. >> Itamar, thanks for coming onto theCUBE, appreciate it, congratulations to Attunity. And you guys bringing it all together, congratulations. >> Thank you very much. >> This is theCUBE live coverage, bringing it down here to New York City, Manhattan. I'm John Furrier, with Peter Burris. Be right back with more after this short break. (upbeat electronic music)
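(To make the CDC pattern Ankorion describes concrete: instead of periodically re-reading a table with a batch tool like Sqoop, a change data capture pipeline tails the database's change log and publishes each committed change, in order, to whatever stream the target platform uses. The sketch below is purely illustrative and is not Attunity Replicate's implementation; `read_change_log` and the producer stubs are hypothetical stand-ins.)

```python
import json

class KafkaProducerStub:
    """Stand-in for a Kafka producer (on prem)."""
    def send(self, topic, value):
        print(f"[kafka] {topic}: {value.decode()}")

class KinesisProducerStub:
    """Stand-in for a Kinesis producer (AWS); Azure Event Hubs would be analogous."""
    def send(self, topic, value):
        print(f"[kinesis] {topic}: {value.decode()}")

def read_change_log(db):
    """Hypothetical: yield committed changes from the database log, in commit order."""
    yield {"op": "insert", "table": "orders", "row": {"id": 1, "total": 42}}
    yield {"op": "update", "table": "orders", "row": {"id": 1, "total": 45}}

def stream_database(db, producer, topic_prefix="cdc"):
    # Only the deltas move, which matters when bandwidth to the cloud is the
    # bottleneck, and consumers see a live feed rather than a stale snapshot.
    for change in read_change_log(db):
        topic = f"{topic_prefix}.{change['table']}"
        producer.send(topic, json.dumps(change).encode())

# The same pipeline definition can target Kafka on prem or Kinesis in AWS:
stream_database(db=None, producer=KafkaProducerStub())
stream_database(db=None, producer=KinesisProducerStub())
```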
Tim Smith, AppNexus | BigData NYC 2017
>> Announcer: Live, from Midtown Manhattan, it's theCUBE. Covering Big Data, New York City, 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. >> Okay welcome back, everyone. Live in Manhattan, New York City, in Hell's Kitchen, this is theCUBE's special event, our annual CUBE-Wikibon Research Big Data event in Manhattan. Alongside Strata Data, formerly Strata Hadoop, and before that Hadoop World, as the world continues. This is our annual event; it's our fifth year here, sixth overall, wanted to kind of move from uptown. I'm John Furrier, the co-host of theCUBE, with Peter Burris, Head of Research at SiliconANGLE and GM of Wikibon Research. Our next guest is Tim Smith, who's the SVP of technical operations at AppNexus; to call it technical operations for large scale is an understatement. But before we get going, Tim, just talk about what AppNexus is as a company, what you guys do, what's the core business? >> Sure, AppNexus is the second largest digital advertising marketplace after Google. We're an internet technology company that harnesses data and machine learning to power the companies that comprise the open internet. We began by building a powerful technology platform, in which we embedded core capabilities, tools and features. With me so far? >> Yeah, we got it. >> Okay, on top of that platform, we built a core suite of cloud-based enterprise products that enable the buying and selling of digital advertising, and a scaled, transparent and low-cost marketplace where other companies can transact, either using our enterprise products or those offered by other companies. If you want to hear a little about the daily peaks, peak feeds and speeds, it is Strata, so we should probably talk about that. We do about 11.8 billion impressions transacted on a daily basis. Each of those is a real-time auction conducted in a fraction of a second, well under half a second. We see about 225 billion impressions per day, and we handle about 5 million queries per second at peak load. We produce about 150 terabytes of data each day, and we move about 400 gigabits into and out of the internet at peak; all those numbers are daily peaks. Makes sense? >> Yep. >> Okay, so by way of comparison, which might be useful for people, I believe the NYSE currently does roughly 2 million trades per day. So if we round that up to 3 million trades a day and assume the NYSE were to conduct that volume every single day of the year, 7 days a week, 365 days a year, that'd be about a billion trades a year. Similarly, I believe Visa did about 28-and-a-half billion transactions in their fiscal third quarter. I'll round that up to 30 billion, average it out to about 333 million transactions per day, and annualize it to about 120 billion transactions per year. Little bit of math, but as I mentioned, AppNexus does in excess of 10 billion transactions per day. And so it seems reasonable to say that AppNexus does roughly 10 times the transaction volume in one day that the NYSE does in a year. And similarly, it seems reasonable to say that AppNexus does roughly 30 times the transaction volume each day that Visa does each day. Obviously, these are all just very rough numbers based on publicly available information about the NYSE and Visa, and both the NYSE and Visa do far, far more volume than AppNexus when measured in terms of dollars.
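(As a quick sanity check on those comparisons, here is the arithmetic in Python, using only the rounded figures quoted above, not official NYSE or Visa statistics.)

```python
# Back-of-envelope check of the volume comparisons, using the rounded
# figures quoted above (not official exchange or network statistics).
NYSE_TRADES_PER_DAY = 3_000_000          # ~2M/day, generously rounded up
VISA_TXNS_PER_QUARTER = 30_000_000_000   # ~28.5B per fiscal quarter, rounded up
APPNEXUS_TXNS_PER_DAY = 10_000_000_000   # "in excess of 10 billion" auctions/day

nyse_per_year = NYSE_TRADES_PER_DAY * 365         # ~1.1 billion trades/year
visa_per_day = VISA_TXNS_PER_QUARTER / 90         # ~333 million transactions/day
visa_per_year = visa_per_day * 365                # ~122 billion transactions/year

# AppNexus does in one day roughly 10x what the NYSE does in a year...
print(APPNEXUS_TXNS_PER_DAY / nyse_per_year)      # ~9.1
# ...and roughly 30x Visa's daily transaction volume.
print(APPNEXUS_TXNS_PER_DAY / visa_per_day)       # ~30.0
print(f"{visa_per_year:,.0f}")                    # ~121,666,666,667 per year
```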
So given our volumes, it's imperative that AppNexus does each transaction with the maximum efficiency and lowest reasonably possible cost, and that is one of the most challenging aspects of my job. >> So thanks for spending the time to give the overview. There's a lot of data; I mean 10 billion a day is massive volume. I mean the internet, and you see the scale, is insane. We're in a new era right now of web-scale. We've seen it in Facebook, and it's enormous. It's only going to get bigger, right? So in online ad tech, you guys are essentially doing Google-like volume; that's not everything Google does, but it's still huge numbers. Then you include Microsoft and everybody else. Really heavy lifting, IT-like situation. What's the environment like? And just talk about, you know, what's it like for you guys. Because you've got a lot of ops, I mean in terms of DevOps. You can't break anything, because at 10 billion transactions, or near it, any break has a significant impact. So you have to have everything buttoned-up super tight, yet you've got to innovate and grow with the future growth. What's the IT environment like? >> It's interesting. We have about 8,000 servers spread across about seven data centers on three continents, and we run, as you mentioned, around the clock. There's no closing bell; downtime is not acceptable. So when you look at our environment, you're talking about four major categories of server complexes. We have real-time processing, which is the actual ad serving. We have a data pipeline, which is what we call our big data environment. We also have a client-facing environment and an infrastructure environment. So we use a lot of different tools and applications, but I think the most relevant ones to this discussion are Hadoop and its friends HDFS, Hive and Spark, and then we use the Vertica Analytics Platform. Together, Hadoop and its friends, and Vertica, comprise our entire data pipeline. They're both very disk-intensive. They're cluster-based applications, and it's a real challenge to keep them up and running. >> So what are some of those challenges? Just explain a little bit, because you also have a lot of opportunity. I mean, it's money flowing through the air, basically; digital air, if you will. I mean, there's a lot of stuff happening. Take us through the challenges. >> You know, our biggest apps are all clustered. And all of our clusters are built with commodity servers, just like a lot of other environments. The big data app clusters traditionally have had internal disks, while almost all of our other servers are very light on disk. One of the biggest challenges is, since the server is the fundamental building block of a cluster, then regardless of whether you need more compute or more storage, you always have to add more servers to get it. That really limits flexibility and creates a lot of inefficiencies, and I really, really am obsessive about reducing and eliminating inefficiencies. So, with me so far? >> Yep. >> Great. The inefficiencies result from two major factors. First, not all workloads require the same ratio of compute to storage. Some workloads are more compute-intensive and less dependent on storage, while other workloads require a lot more storage. So we have to use standard server configurations, and as a result, we wind up with underutilized compute and storage. This is undesirable, it's inefficient, yet given our scale, we have to use standardized configurations. So that's the first big challenge.
The second is the compute-to-disk ratio. It's generally fixed when you buy the servers. Yes, we can certainly add more disks in the field, but that's labor intensive, it's complicated from a logistics and asset management standpoint, and you're fundamentally limited by the number of disk slots in the server. So now you're right back into the trap of more storage requires more servers, regardless of whether you need more compute or not. And then you compound the inefficiencies. >> Couldn't you just move the resources, unused resources, from one cluster to the other? >> I've been asked that a lot; and no, it's just not that simple. Each application cluster becomes a silo due to its configuration of storage and compute. This means you just can't move servers between clusters, because the clusters are optimized for the workloads, and the fact that you can't move resources from one cluster to another means more inefficiency. And it's compounded over time, since workloads change, and the ideal ratio of compute to storage changes. The end result is unused resources trapped in silos, and configurations that are no longer optimized for your workload. And there's only really one solution that we've been able to find. And to paraphrase an orator far, far more talented than I am, namely Ronald Reagan: we need to open this gate, tear down these silos. The silos just have to go away. They fundamentally limit flexibility and efficiency. >> What were some of the other issues caused by using servers with internal drives? >> You have more maintenance, you've got to deal with the logistics. But the biggest problem is that servers and storage have significantly different life cycles. Servers typically have a three-year life cycle before they're obsolete. Storage is typically four to six years; you can sometimes stretch that a little further. With the storage inside servers that are replaced every three years, we end up replacing storage before the end of its effective lifetime; that's inefficient. Further, since the storage is inside the servers, we have to do massive data migrations when we replace servers. Migrations are time consuming, logistically difficult, and high risk. >> So how did DriveScale help you guys? Because you guys certainly have a challenging environment, you laid out the story, and we appreciate that. How did DriveScale help you with the challenges? >> Well, what we really wanted to do was disaggregate storage from servers, and DriveScale enables us to do that. Disaggregating resources is a new term in the industry, but I think a lot of people are focusing on it. I can explain it if you think that would make sense. >> What do you mean by disaggregating resources? Can you explain that, and how it works? >> Sure, so instead of buying servers with internal drives, we now buy diskless servers with JBODs. And DriveScale lets us easily compose servers with whatever amount of disk storage we need, from the server resource pool and the disk resource pool; and they're separate pools. This means we have the right balance of compute and storage for each workload, and we can easily adjust it over time. And all of this is done via software, so it's easy to do with a GUI or, in our case, at our scale, scripting. And it's done on demand, and it's much more efficient. >> How does it help you with the underutilized resource challenge you mentioned earlier?
>> Well, since we can add and remove resources from each cluster, we can manage exactly how much compute power and storage is deployed for each workload. Since this is all done via software, it can be done quickly and easily. We don't have to send a technician into a data center to physically swap drives, add drives, move drives. It's all done via software, and it's very, very efficient. >> Can you move resources between silos? >> Well, yes and no. First off, our goal is no more silos. That said, we still have clusters, and once we completely migrate to DriveScale, all of our compute and storage resources will be consolidated into just a few common pools. And disk storage will no longer differentiate pools; thus, we have fewer pools. What's more, with fewer pools we can use the resources in each pool for more workloads. And when our needs change, and they always do, we can reallocate resources as needed. >> What about the life cycle management challenge? How do you guys address that? >> Well, that's addressed with DriveScale. The compute and the storage are now disaggregated, or separated, into diskless servers and JBODs, so we can upgrade one without touching the other. If we want to upgrade servers to take advantage of new processors or new memory architectures, we just replace the servers, re-combine the disks with the new servers, and we're back up and operating. It saves the cost of buying new disks when we don't need to, and it also simplifies logistics and reduces risk, as we no longer have to run the old plant and the new plant concurrently and do a complicated data migration. >> What about qualifying server and storage vendors? Do you still do that? Or how's that impact -- >> We actually don't have to do it. We're still using the same server vendor. We've used Dell for many, many years, and we continue to use them. We are using them for storage, and there was no real work; we just had to add DriveScale into the mix. >> What's it like working with DriveScale? >> They're really wonderful to work with. They have a really seasoned team. They were at Sun Microsystems and Cisco; they built some of the really foundational products that the internet was built on. They're really talented, they're really bright, and they're really focused on customer success. >> Great story, thanks for sharing that. My final question for you is, you guys have a very big, awesome environment, you've got a lot of scale there. It's great for a startup to get into an environment like this, because one, they can get access to the data, and work with a good team like you have. What's it like working with a startup? >> You know, it's always challenging at first; too many things to do. >> They've got talented guys. Most of those early-day startups, they've got all their A players out there. >> They have their A players, and we've been very pleased working with them. We're dealing with some of the top talent in the industry, the people that created the industry. They have a proven track record. We really don't have any concerns; we know they're committed to our success, and they have a great team and great investors. >> A final, final question. For your friends out there watching, and other practitioners who are trying to run things at scale with a cloud: what's your advice to them? You've been operating at scale, billions of transactions, I mean huge; it's only going to get bigger. Put your IT-friendly advice hat on.
What's the mindset for operators out there, technical ops, as DevOps comes in? We're seeing a lot of that. What do people need to be thinking about to run at scale? >> There's no magic silver bullet. There are no magic answers. The public cloud is very helpful in a lot of ways, but you really have to think hard about your economics, you have to think about your scale. You just have to be sure that you're going into each decision knowing that you've looked at the costs and the benefits, the performance, the risks, and you don't expect there to be simple answers. >> Yeah, there's no magic beans as they say. You've got to make it work for the business. >> No magic beans, I wish there were. >> Tim, thanks so much for the story. Appreciate the commentary. Live coverage at Big Data NYC, it's theCUBE. Be back with more after this short break. (upbeat techno music)
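(To make the disaggregation idea from this conversation concrete, here is a minimal sketch of composing diskless servers with drives from a shared JBOD pool, so each workload gets the compute-to-storage ratio it needs. It is purely illustrative; the pool sizes, the `compose_server` helper, and the allocation policy are hypothetical, not DriveScale's actual API.)

```python
# Illustrative only: compose "servers" from separate compute and disk pools,
# so storage and compute can scale (and be reclaimed) independently.
from dataclasses import dataclass, field

@dataclass
class ComposedServer:
    node: str                                     # diskless compute node
    drives: list = field(default_factory=list)    # JBOD drives attached in software

compute_pool = [f"node-{i}" for i in range(8)]        # hypothetical diskless nodes
disk_pool = [f"jbod-drive-{i}" for i in range(64)]    # hypothetical shared JBOD drives

def compose_server(drives_needed):
    """Pair one free compute node with N free drives, all in software;
    no technician has to walk into the data center."""
    server = ComposedServer(node=compute_pool.pop())
    for _ in range(drives_needed):
        server.drives.append(disk_pool.pop())
    return server

def release(server):
    """Return resources to the common pools so another workload can reuse them."""
    compute_pool.append(server.node)
    disk_pool.extend(server.drives)
    server.drives.clear()

# A storage-heavy Hadoop worker versus a compute-heavy Spark worker:
hdfs_worker = compose_server(drives_needed=12)
spark_worker = compose_server(drives_needed=2)
print(hdfs_worker.node, len(hdfs_worker.drives))      # e.g. node-7 12
print(spark_worker.node, len(spark_worker.drives))    # e.g. node-6 2

# When workload ratios change, reallocate instead of buying new servers:
release(spark_worker)
```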
Jagane Sundar, WANdisco | BigData NYC 2017
>> Announcer: Live from midtown Manhattan, it's theCUBE, covering BigData New York City 2017, brought to you by SiliconANGLE Media and its ecosystem sponsors. >> Okay welcome back everyone, here live in New York City. This is theCUBE's special presentation of our annual event with theCUBE and Wikibon Research called BigData NYC; it's our own event that we have every year, celebrating what's going on in the big data world now. It's evolving to all data, cloud applications, AI, you name it, it's happening. In the enterprise the impact is huge; for developers, the impact is huge. I'm John Furrier, cohost of theCUBE, with Peter Burris, Head of Research, SiliconANGLE Media and General Manager of Wikibon Research. Our next guest is Jagane Sundar, who's the CTO of WANdisco, a CUBE alumni; great to see you again as usual here on theCUBE. >> Thank you John, thank you Peter, it's great to be back on theCUBE. >> So we've been talking big data for many years, certainly with you guys, and it's been a great evolution. I don't want to get into the whole backstory and history, we covered that before, but right now is a really, really important time. We've seen the hurricanes come through, we've seen the floods in Texas, we've seen Florida, and Puerto Rico is now in the main conversation. You're seeing it, you're seeing disasters happen. Disaster recovery's been the low hanging fruit for you guys, and we talked about this when New York City got flooded years and years ago. This is a huge issue for IT, because they have to have disaster recovery. But now it's moving beyond just disaster recovery. It's cloud. What's the update from WANdisco? You guys have a unique perspective on this. >> Yes, absolutely. So we have capabilities to replicate between the cloud and Hadoop multi data centers across geos, so disasters are not a problem for us. And we have some unique technologies we use. One of the things we do is we can replicate in an active-active mode between different cloud vendors, between cloud and on-prem Hadoop, and we are the only game in town. Nobody else can do that. >> So okay let me just stop right there. When you say the only game in town, I get a little skeptical here. Are you saying that nobody does active-active replication at all? >> That is exactly what I'm saying. We had some wonderful announcements from Hortonworks, they have a great product called the Dataplane. But if you dig deep, you'll find that it's actually an active-passive architecture, because to do active-active, you need this capability called the Paxos algorithm for resolving conflicts. That's a very hard algorithm to implement. We have over 10 years' experience in that. That's what gives us our ability to do this active-active replication, between clouds, between on-prem and cloud. >> All right so just to take that a step further, I know we're having a CTO conversation, but the classic cliche is skate to where the puck is going to be. So you didn't just decide one morning you're going to be the active-active for cloud. You kind of backed into this. You know, the world spun in your direction, the puck came to you guys. Is that a fair statement? >> That is a very fair statement. We've always known there's tremendous value in this technology we own, and with the global infrastructure trends, we knew that this was coming. It wasn't called the cloud when we started out, but that's exactly what it is now, and we're benefiting from it. >> And the cloud is just a data center; it's just, you don't own it.
(mumbles) Peter, what's your reaction to this? Because when he says only game in town, that implies some scarcity. >> Well, WANdisco has a patent, and it actually is very interesting technology, if I can summarize very quickly. You do continuous replication based on writes that are performed against the database, so that you can have two writers and two separate databases, and you guarantee that they will be synchronized at some point in time, because you guarantee that the writing of the logs and the messaging to both locations >> Absolutely. >> happens in order, which is a big issue. You guys put a stamp on the stuff, and it actually writes to the different locations with order guaranteed, and that's not the way most replication software works. >> Yes, that's exactly right. That's very hard to do, and that's the only way for you to allow your clients in different data centers to write to the same data store, whether it's a database, a Hadoop folder, whether it's a bucket in a cloud object store, it doesn't matter. The core fact remains, the Paxos algorithm is the only way for you to do active-active replication, and ours is the only Paxos implementation that can work over the >> John: And that's patented by you guys? >> Yes, it's patented. >> And so for someone to replicate that, they'd have to essentially reverse engineer it and have a little twist on it to get around the patents. Are you licensing the technology, or are you guys hoarding it for yourselves? >> We have different ways of engaging with partners. We are very reasonable with that, and we work with several powerful partners >> So you partner with the technology. >> Yes. >> But the key thing, John, in answer to your question, is that it's unassailable. I mean there's no argument: as companies move more towards a digital way of doing things, largely driven by what customers want, your data becomes more of an asset. As your data becomes more of an asset, you make money by using that data in more places, more applications and more times. That is possible with data, but you end up with consistency issues, and for certain applications it's not an issue; if you're basically reading data, it's not an issue. But the minute that you're trying to write on behalf of a particular business event or a particular value proposition, now you have a challenge; you are limited in how you can do it unless you have this kind of technology. And so this notion of continuous replication, in a world that's going to become increasingly dependent upon data, data that is increasingly distributed, data that you want to ensure has common governance and policy in place, technologies like the ones WANdisco provides are going to be increasingly important to the overall way that a business organizes itself, institutes its work, and makes sure it takes care of its data assets. >> Okay, so my next question then, thanks for the clarification, it's good input there, and thanks for summarizing it like that, 'cause I couldn't have done that. But when we last talked, I was always enamored by the fact that you guys have the data center replication thing down. I always saw that as a great thing for you guys. Okay, I get that, that's an on-premise situation, you have active-active, good for disaster recovery, lots of use cases, people should be beating down your door 'cause you have a better mousetrap, I get that. Now how does that translate to the cloud? So take me through why the cloud now fits nicely with that same paradigm.
>> So, I mean, these are industry trends, right. What we've found is that the cloud object stores are very, very cost effective and efficient, so customers are moving towards that. They're using their Hadoop applications, but on cloud object stores. Now it's trivial for us to add plugins that enable us to replicate between a cloud object store on one side, and Hadoop on the other side. It could also be another cloud object store from a different cloud provider on the other side. Once you have that capability, now customers are freed from lock-in from either a cloud vendor or a Hadoop vendor, and they love that; they're looking at it as another way to leverage their data assets. And we enable them to do that without fear of lock-in from any of these vendors. >> So on the cloud side, the regions have always been a big thing. We've heard Amazon has a region down here. We saw VMworld push their VMware solution to only one western region. What does the geo landscape look like in the cloud? Does that relate to anything in your tech? >> So yes, it does relate, and one of the things that people forget is that when you create an Amazon S3 bucket, for example, you specify a region. Well, but this is the cloud, isn't it worldwide? Turns out that object store actually resides in one region, and you can use some shaky technologies like cross-region replication to eventually get the data to the other region. >> Peter: Which just boosts the prices you pay. >> Yes, not just boosts the price. >> Well, they're trying to save on price but then they're exposed on reliability. >> Reliability, exactly. You don't know when the data's going to be there; there are no guarantees. What we offer is: take your cloud storage, and we'll guarantee that we can replicate it in a synchronous fashion to another region. Could be the same provider, could be another provider. That gives tremendous benefits to the customers. >> So you actually have a guarantee when you go to customers, say with an SLA guarantee? Do you back it up with like money back, what's the guarantee? >> So the guarantees are, you know, we are willing to back it up with contracts and suchlike, and our customers put us through rigorous testing procedures, naturally. But we stand up to every one of those. We can scale and maintain the consistency guarantees that they need for modern businesses. >> Okay, so take me through the benefits. Who wants this? Because you can almost get kind of sucked into the complexities of it, and the nuances of cloud and everything as Peter laid out; it's pretty complex even as he simplified it. Who buys this? (laughs) I mean, who's the guy, is it the IT department, is it the ops guy, is it the facilities, who... >> So we sell to the IT departments, and they absolutely love the technology. But to go back to your initial statement, we have all these disasters happening, you know, hopefully people are all doing reasonably okay at the end of these horrible disasters, but if you're an enterprise of any size, and it doesn't have to be a big enterprise, you cannot go back to your users or customers and say that because of a hurricane they cannot have access to their data. That's sometimes legally not allowed, and other times it's just suicide for a business >> And HPE in Houston, it's a huge plant down there. >> Jagane: Indeed. >> They got hit hard.
Yep, in those sorts of circumstances, you want to make sure that your data is available in multiple data centers spread throughout the world, and we give you that capability. >> Okay, what are some of the successes? Let's talk through now, obviously you've got the technology, I get that. Where are the stakes in the ground? Who's adopting it? I know you do a lot of biz dev deals. I don't know if they're actually OEM-type deals, or they're just licensing deals. Take us through where your successes are with this technology. >> So, biz dev wise, we have a mix of OEM deals and licenses and co-selling agreements. The strong ones are all OEMs, of course. We have great partnerships with IBM, Amazon, Microsoft, just wonderful partnerships. As for the actual end customers, we started off selling mostly to the financial industry, because they have a legal mandate, so they were the first to look into this sort of thing. But now we've expanded into automobile companies. A lot of the auto companies are generating vast amounts of data from their cars, and you can't push all that data into a single data center; that's just not reasonable. You want to push that data into a single data store that's distributed across the world, ingested wherever the car is closest. We offer a capability that nobody else can, so we've got big auto manufacturers signed up, and we've got big retailers signed up for exactly the same capability. You cannot imagine ingesting all that data into a single location. You want it replicated across regions, you want it available no matter what happens to any single region or data center. So we've got tremendous success in retail, banking, and a lot of this is through partnerships again. >> Well congratulations. I got to ask, you know, what's new with you guys? Obviously you have success with the active-active. We'll dig into the Hortonworks thing to check your comment around them not having it; we'll certainly look at the Dataplane, which we like. We interviewed Rob Bearden. Love the announcement, but they don't have the active-active; we're going to document that, and get that on the record. But you guys are doing well. What's new here, what's in New York, what are some of your wins? Can you just give a quick update on what's going on at WANdisco? >> Okay, so quick recap: we love the Hortonworks Dataplane as well. We think that we can build value into that ecosystem by building a plugin for them. And we love the whole technology. I have wonderful friends there as well. As for our own company, we see a lot of our business coming from cloud and hybrid environments. It's just the reality of the situation. You had, you know, 20 years ago, NFS, which was the great upender of all storage, but turned out to be very expensive, and then 10, seven years ago, you had HDFS come along, and that upended the cost model of NFS and SANs, which those industries were still working their way through. And now we have cloud object stores, which have upended the HDFS model; it's much more cost-efficient to operate using cloud object stores. So we will be there; we have replication products for that. >> John: And you're in the major clouds, you in Azure? >> Yes, we are in Azure. >> Google? >> Jagane: Yes, absolutely. >> AWS? >> AWS, of course. >> Oracle? >> Oracle, of course. >> So you got all the top four companies. >> We're in all of them. >> All right, so here's the next question... >> And you're also in IBM stuff too.
Yes, we're built tightly into IBM >> So you've got a pretty strong legacy >> And a monopoly. >> On the mainframe. >> Like the fiber channel of replication. (John and Jagane laugh) That was a bad analogy. I mean it's like... Well, I mean fiber channel has only limited suppliers 'cause they have unique technology; it was highly important. >> But the basic proposition is, look, any customer that wants to ensure that a particular data source is going to be available in a distributed way, with some degree of consistency, is going to look at this as an option. >> Yes. >> Well, you guys certainly have a great team under your leadership, and great tech. The final question I have for you here is, you know, we've had many conversations about the industry, we like to pontificate, I certainly like to speculate, but we now have eight years of history in the big data world. We look back, you know, we're doing our own event in New York City, thanks to great support from you guys and other great friends in the community. Appreciate everyone out there supporting theCUBE, that's awesome. But the world's changed. So I got to ask you, you're a student of the industry, I know that from knowing you personally. What's been the success formula that keeps the winners around today, and what do people need to do going forward? 'Cause we've seen the train wreck, we've seen the dead bodies in the industry, we've kind of seen what's happened; there've been some survivors. Why did the current list of characters and companies survive, and what's the winning formula, in your opinion, to stay relevant as big data grows in a huge way, from IoT to AI cloud and everything in between? >> I'll quote Stephen Hawking on this. Intelligence is the capability to adapt to changes. That's what keeps industries, that's what keeps companies, that's what keeps executives around. If you can adapt to change, if you can see things coming and adapt your core values, your core technology to that, you can offer customers a value proposition that's going to last a long time. >> And in the big data space, what is that adaptive key focus, what should they be focused on? >> I think at this point, it's extracting information from this volume of data, whether you use machine learning in the modern day, or whether it was simple Hive queries; that's the value proposition, and making sure the data's available everywhere so you can do that processing on it. That remains the strength. >> So the whole concept of digital business suggests that increasingly we're going to see our assets rendered in some form as data. >> Yes. >> And we want to be able to ensure that that data is able to be where it needs to be when it needs to be there, for any number of reasons. It's a very, very interesting world we're entering into. >> Peter, I think you have a good grasp on this, and I love the narrative of programming the world in real time. What's the phrase you use? It's real time, but it's programming the world... Programming the real world. >> Yeah, programming the real world.
But then you have to be able to turn that around and have it perform work back in the real world. There's a lot of new development, a lot of new technology that's coming on to help us do that. But any way you look at it, we're going to have to move data with some degree of consistency, we're still going to have to worry about making sure that if our policy says that that action needs to take place there, and that action needs to take place there, that it actually happens the way we want it to, and that's going to require a whole raft of new technologies. We're just at the very beginning of this. >> And active-active, things like active-active in what you're talking about really is about value creation. >> Well the thing that makes active-active interesting is, again, borrowing from your terms, it's a new term to both of us, I think, today. I like it actually. But the thing that makes it interesting is the idea that you can have a source here that is writing things, and you can have a source over there that are writing things, and as a consequence, you can nonetheless look at a distributed database and keep it consistent. >> Consistent, yeah. >> And that is a major, major challenge that's going to become increasingly a fundamental feature of our digital business as well. >> It's an enabling technology for the value creation and you call it work. >> Yeah, that's right. >> Transformation of work. Jagane, congratulations on the active-active, and WANdiscos's technology and all your deals you're doing, got all the cloud locked up. What's next? Well you going to lock up the edge? You're going to lock up the edge too, the cloud. >> We do like this notion of the edge cloud and all the intermediate steps. We think that replicating data between those systems or running consistent compute across those systems is an interesting problem for us to solve. We've got all the ingredients to solve that problem. We will be on that. >> Jagane Sundar, CTO of WANdisco, back on theCUBE, bringing it down. New tech, whole new generation of modern apps and infrastructure happening in distributed and decentralized networks. Of course theCUBE's got it covered for you, and more live coverage here in New York City for BigData NYC, our annual event, theCUBE and Wikibon here in Hell's Kitchen in Manhattan, more live coverage after this short break.
Tendü Yogurtçu, Syncsort | BigData NYC 2017
>> Announcer: Live from midtown Manhattan, it's theCUBE, covering BigData New York City 2017, brought to you by SiliconANGLE Media and its ecosystem sponsors. >> Hello everyone, welcome back to theCUBE's special BigData NYC coverage, here in Manhattan in New York City; we're in Hell's Kitchen. I'm John Furrier, with my cohost Jim Kobielus, who's the Wikibon analyst for big data. In conjunction with Strata Data going on right around the corner, this is our annual event where we break down the big data, the AI, the cloud, all the goodness of what's going on in big data. Our next guest is Tendü Yogurtçu, who's the Chief Technology Officer at Syncsort. Great to see you again, CUBE alumni, been on multiple times. Always great to have you on, to get the perspective, a CTO perspective, and the Syncsort update. So good to see you. >> Good seeing you, John and Jim. It's a pleasure being here too. Again, the pulse of big data is in New York, and it's a great week with a lot happening. >> I always borrow the quote from Pat Gelsinger, who's the CEO of VMware. He said on theCUBE, in I think 2011, before he joined VMware as CEO, when he was at EMC: if you're not out in front of that next wave, you're driftwood. And the key to being successful is to ride the waves, and the big waves are coming in now with AI. Certainly big data has been a rising tide for its own bubble, but now the aperture of the scale of data is larger. Syncsort has been riding the wave with us; we've been having you guys on multiple times. It was important to the mainframe in the early days, but now Syncsort just keeps on adding more and more capabilities, and you're riding the wave, the big wave, the big data wave. What's the update now with you guys? Where are you guys now in the context of today's emerging data landscape? >> Absolutely. As organizations progress with their modern data architectures and build the next generation of analytics platforms, leveraging machine learning, leveraging cloud elasticity, we have observed that data quality and data governance have become more critical than ever. For a couple of years we have been seeing this trend: "I would like to create a data lake, data as a service," and enable bigger insights from the data. And this year, really every enterprise is trying to have that trusted data set created, because data lakes are turning into data swamps, as Dave Vellante refers to them often (John laughs). The collection of these diverse data sets, whether it's mainframe, whether it's messaging queues, whether it's relational data warehouse environments, is challenging the customers. We can take one simple use case like Customer 360, which we have been talking about for decades now, right? Yet it's still a complex problem. Everybody is trying to get that trusted single view of their customers so that they can serve the customer needs in a better way, offer better solutions and products to customers, and get better insights about customer behavior, whether leveraging deep learning, machine learning, et cetera. However, in order to do that, the data has to be in a clean, trusted, valid format, and every business is going global. You have data sets coming from Asia, from Europe, from Latin America, and many different places, in different formats, and it's becoming a challenge. We acquired Trillium Software in December 2016, and our vision was really to bring that world-leading, enterprise-grade data quality into the big data environments. So last week we announced our Trillium Quality for Big Data product.
This product brings unmatched capabilities of data validation, cleansing, enrichment, and matching, fuzzy matching, to the data lake. We are also leveraging the Intelligent eXecution engine that we developed for our data integration product, the MX8. So we are enabling organizations to take this data quality offering whether it's in Hadoop, MapReduce, or Apache Spark, whichever compute framework it's going to be in the future. So we are very excited about that now. >> Congratulations. You mentioned the data lake being a swamp, that Dave Vellante referred to. It's interesting, because how does it become a swamp if it's a silo, right? We've seen data silos being the antithesis of governance; it challenges, certainly, IoT. Then you've got the complication of geopolitical borders, you mentioned that earlier. So you still have to integrate the data, and you need data quality, which has been around for a while but is now more complex. What specifically about the cleansing and the quality of the data is more important now in the landscape? Are those factors the drivers of the challenges today, and what's the opportunity for customers, how do they figure this out? >> The complexity comes from many different factors. Some of it comes from being global. Every business is trying to have a global presence, and the data is originating from the web, from mobile, from many different data sets. If we just take a simple address, the address formats are different in every single country. With Trillium Quality for Big Data, we support postal data from over 150 countries, and data enrichment with this data. So it becomes really complex, because you have to deal with different types of data from different countries, and the matching also becomes very difficult, whether it's John Furrier, J Furrier, John Currier, you have to be-- >> All my handles on Twitter, you know what that's about. (Tendü laughs) >> All of the handles you have. Every business is trying to have better targeting in terms of offering products, and understanding the single and one and only John Furrier as a customer. That creates complexity, and in any data management and data processing challenge, the variety of data and the speed at which data is being generated are higher than we have ever observed. >> Hold on, Jim, I want to get Jim involved in this conversation, 'cause I want to just make sure those guys can get settled in; adjust your microphone there. Jim, she's bringing up a good point, and I want you to weigh in, just to kind of add to the conversation and take it in the direction of where the automation's happening. If you look at what Tendü's saying, the complexity is going to create an opportunity in software. Machine learning, root-level cleanliness can be automated, because Facebook and others have shown that you can apply machine learning techniques to the volume of data. No human can get at all the nuances. How is that impacting the data platforms and some of the tooling out there, in your opinion? >> Yeah, well, one of the core issues is where you place the data matching and data cleansing logic, or execution, in this distributed infrastructure: at the source, in the cloud, or at the consumer level in terms of rolling up the disparate versions of data into a common view. So by acquiring a very strong, well-established, reputable brand in data cleansing, Trillium, as Syncsort has done, you've done a great service to your portfolio and to your customers.
You know, Trillium is well known for offering lots of options in terms of where to configure the logic, where to deploy it within distributed hybrid architectures. Give us a sense, going forward, of the range of options you're going to be providing for customers on where to place the cleansing and matching logic. How is Syncsort going to support flexible workflows in terms of curation of the data and so forth? Because the curation cycle for data, the stewardship, is critically important. So how do you plan to address all of that going forward in your product portfolio, Tendü? >> Thank you for asking the question, Jim, because that's exactly the challenge that we hear from our customers, especially from larger enterprises and financial services, banking, and insurance. Our plan is that our next release, coming at the end of the year, is targeting very flexible deployment. Flexible deployment in the sense that, once you understand the data, create the business rules, and decide what kind of matching and enrichment you'll be performing on the data sets, you can actually have those business rules executed at the source of the data, or in the data lake, or switch between the source and the enterprise data lake that you are creating. That flexibility is what we are targeting; that's one area. On the data curation side, we see these percentages: 80% of data stewards' time is spent on data prep, data curation, and data cleansing, and that is really a very high percentage. From our customers we see this still being a challenge. One area where we have started investing is using machine learning to understand the data, and using the data discovery capabilities we currently have to make recommendations on what those business rules can be, or what kind of data validation, cleansing, and matching might be required. So that's an area where we will be investing.
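As an aside for readers who want a feel for the "John Furrier, J Furrier, John Currier" matching problem in this exchange, the sketch below uses nothing but the Python standard library. It only shows the flavor of fuzzy matching; the normalization rules and the 0.72 threshold are invented, and commercial engines such as Trillium layer phonetic rules, country-specific reference data, and survivorship logic on top of anything this simple.

```python
from difflib import SequenceMatcher

def normalize(name: str) -> str:
    # Cheap canonicalization: lowercase, drop punctuation, collapse spaces.
    kept = "".join(c for c in name.lower() if c.isalnum() or c.isspace())
    return " ".join(kept.split())

def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def cluster_customers(names, threshold=0.72):
    """Greedy single-pass clustering: each name joins the first cluster
    whose representative it resembles closely enough. Real matching
    engines add blocking, pairwise scoring, and survivorship rules."""
    clusters = []
    for name in names:
        for cluster in clusters:
            if similarity(name, cluster[0]) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters

print(cluster_customers(
    ["John Furrier", "J. Furrier", "John Currier", "Tendü Yogurtçu"]))
# The three Furrier variants should land in one cluster; the last stays apart.
```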
We have acquisitions happening, we have organic innovation that's happening, and we really try to stay focused in terms of how do we create more value from your data, and how do we increase the business serviceability, whether it's with our Ironstream product, we made an announcement this week, Ironstream transaction tracing to create more visibility to application performance and more visibility to IT operations, for example when you make a payment with your mobile, you might be having problem and you want to be able to trace back to the back end, which is usually a legacy mainframe environment, or whether you are populating the data lake and you want to keep the data in sync and fresh with the data source, and apply the change as a CDC, or whether you are making that data from raw data set to more consumable data by creating the trusted, high quality data set. We are very much focused on creating more value and bigger insights out of the data sets. >> And Josh'll be on tomorrow, so folks watching, we're going to get the business perspective. I have some pointed questions I'm going to ask him, but I'll take one of the questions I was going to ask him but I want to get your response from a technical perspective as CTO. As Syncsort continues your journey, you keep on adding more and more things, it's been quite impressive, you guys done a great job, >> Tendu: Thank you. >> We enjoy covering the success there, watching you guys really evolve. What is the value proposition for Syncsort today, technically? If you go in, talk to a customer, and prospective new customer, why Syncsort, what's the enabling value that you're providing under the hood, technically for customers? >> We are enabling our customers to access and integrate data sets in a trusted manner. So we are ultimately liberating the data from all of the enterprise data stores, and making that data consumable in a trusted manner. And everything we provide in that data management stack, is about making data available, making data accessible and integrated the modern data architecture, bridging the gap between those legacy environments and the modern data architecture. And it becomes really a big challenge because this is a cross-platform play. It is not a single environment that enterprises are working with. Hadoop is real now, right? Hadoop is in the center of data warehouse architecture, and whether it's on-premise or in the cloud, there is also a big trend about the cloud. >> And certainly batch, they own the batch thing. >> Yeah, and as part of that, it becomes very important to be able to leverage the existing data assets in the enterprise, and that requires an understanding of the legacy data stores, and existing infrastructure, and existing data warehouse attributes. >> John: And you guys say you provide that. >> We provide that and that's our baby and provide that in enterprise grade manner. >> Hold on Jim, one second, just let her finish the thought. Okay, so given that, okay, cool you got that out there. What's the problem that you're solving for customers today? What's the big problem in the enterprise and in the data world today that you address? >> I want to have a single view of my data, and whether that data is originating on the mobile or that data is originating on the mainframe, or in the legacy data warehouse, and we provide that single view in a trusted manner. 
>> When you mentioned Ironstream, that reminded me that one of the core things that we're seeing in Wikibon in terms of, IT operations is increasingly being automated through AI, some call it AI ops and whatnot, we're going deeper on the research there. Ironstream, by bringing mainframe and transactional data, like the use case you brought in was IT operations data, into a data lake alongside machine data that you might source from the internet of things and so forth. Seem to me that that's a great enabler potentially for Syncsort if it wished to play your solutions or position them into IT operations as an enabler, leveraging your machine learning investments to build more automated anomaly detection and remediation into your capabilities. What are your thoughts? Is that where you're going or do you see it as an opportunity, AI for IT ops, for Syncsort going forward? >> Absolutely. We target use cases around IT operations and application performance. We integrate with Splunk ITSI, and we also provide this data available in the big data analytics platforms. So those are really application performance and IT operations are the main uses cases we target, and as part of the advanced analytics platform, for example, we can correlate that data set with other machine data that's originating in other platforms in the enterprise. Nobody's looking at what's happening on mainframe or what's happening in my Hadoop cluster or what's happening on my VMware environment, right. They want to correlate the data that's closed platform, and that's one of the biggest values we bring, whether it's on the machine data, or on the application data. >> Yeah, that's quite a differentiator for you. >> Tendu, thanks for coming on theCUBE, great to see you. Congratulations on your success. Thanks for sharing. >> Thank you. >> Okay, CUBE coverage here in BigData NYC, exclusive coverage of our event, BigData NYC, in conjunction with Strata Hadoop right around the corner. This is our annual event for SiliconANGLE, and theCUBE and Wikibon. I'm John Furrier, with Jim Kobielus, who's our analyst at Wikibon on big data. Peter Burris has been on theCUBE, he's here as well. Big three days of wall-to-wall coverage on what's happening in the data world. This is theCUBE, thanks for watching, be right back with more after this short break.
Dr. Mark Ramsey & Bruno Aziza | BigData NYC 2017
>> Live from midtown Manhattan, it's theCUBE, covering BigData New York City 2017, brought to you by SiliconANGLE Media and its ecosystem sponsors. >> Hey, welcome back everyone, live here in New York City for theCUBE's special presentation of BigData NYC, here all week with theCUBE in conjunction with the Strata Data event happening around the corner. I'm John Furrier, the host, with James Kobielus. Our next two guests are Dr. Mark Ramsey, chief data officer and senior vice president of R&D at GSK, the pharma company GlaxoSmithKline, and Bruno Aziza, the CMO at AtScale, both Cube alumni. Welcome back. >> Thanks for having us. >> So Bruno, I want to start with you, because I think Dr. Mark has some great use cases I want to dig into and go deep on with Jim. But AtScale, give us the update on the company. You guys are doing well, what's happening? You had the vision of this data layer we talked about a couple years ago. It's working, so give us the update. >> A lot of things have happened since we talked last. I think you might have seen some of the news in terms of growth: 10x growth since we started, mainly driven by the customer use cases. That's why I'm excited to hear from Mark and share his stories with the rest of the audience here. We have a presentation at Strata tomorrow with Vivens; it's a great IoT use case as well. So what we're seeing is that the industry is changing in terms of how it's buying BI platforms. In the past, people would buy BI platforms vertically: they'd buy the visualization, they'd buy the semantic layer, and buy the best-of-breed integration. We now live in a world where there's a multitude of BI tools, and the data platforms are not standardized either. And so what we're riding as a trend is this idea of the need for the universal semantic layer: this idea that you can have a universal set of semantics, in a dictionary or ontology, that can be shared across all types of business users and business use cases, or across any data. That's really the trend that's driving our growth, and you'll see it today at this show with the use cases and the customers, and of course some of the announcements that we're making. We're announcing a new offer with Cloudera and Tableau. So we're really excited, again, about how the space and the partner ecosystem are embracing our solution. >> And you guys really have a Switzerland kind of strategy. You're going to play neutral, play nicely with everybody, because your abstraction layer is really more on the data. >> That's right. The whole value proposition is that you don't want to move your data, and you don't want to move your users away from the tools that they already know, but you do want them to be able to take advantage of the data that you store. And this concept of a virtualized layer, your universal semantic layer that enables the use case to happen faster, is a big value proposition to all of them. >> Dr. Mark Ramsey, I want to get your quick thoughts on this. You're obviously the customer here, so, I mean, you're not biased, you're under pressure every day. Competitive noise out there is high in this area, and you're a chief data officer. You run R&D, so you've got that 20-mile stare into the future. You've got experience running data at wide scale. I mean, there are a lot of other potential solutions out there. What made it attractive for you? >> Well, it fills a need that we have around really that virtualization. We can leave the data in the format that it is on the platform.
And then, like Bruno was mentioning, allow the users to use a number of standardized tools to access that information. And it also gives us an ability to learn how folks are consuming the data. They'll use a variety of tools, they'll interact with the data, and AtScale gives us a great capability to really look under the covers and see how they're using the data, and if we need to, physicalize some of that to make access easier in the long term. It gives us that... >> It's really an agility model applied to data. You're kind of agile. >> Yeah, it's kind of a way to, you know, if you're using a dashboarding tool, it allows you to interact with the data, and then, as you see how folks are actually consuming the information, you can physicalize it and make that readily available. So it gives you those agile cycles to go through. >> In your use of the solution, what have you seen in terms of usage patterns? What are your users using AtScale for? Have you been surprised by how they're using it? And where do you plan to go in terms of the use cases you're addressing going forward with this technology? >> This technology allows us to give the users the ability to query the data. For example, we use standardized ontologies in several of the areas, and standardized ontologies are great because the data is in one format. However, that's not necessarily how the business would like to look at the data, so it gives us an ability to make the data appear the way the users would like to consume the information. Then we understand which parts of the model they're actually flexing, and we can make the decision to physicalize that. Because again, it's a great technology, but with virtualization there is a cost, because the machines have to create the illusion of the data being a certain way. If you know it's something that's going to be used day in and day out, then you can move it to a physicalized version. >> Is there a specific threshold, when you're looking at the metrics of usage, where you know that particular data, particular views, need to be physicalized? What is that threshold, or what are those criteria? >> I think it's normally a combination of the number of connections that you have, so the joins of the data across the number of repositories of data, balanced with the volume of data. If you're dealing with thousands of rows versus billions of rows, that can lead you to make that decision faster. There isn't a defined metric that says, well, we have this number of rows and this many columns and this size, that will really lead you down that path. But the nice thing is you can experiment, so it does give you that ability to sort of prototype and see, are folks consuming the data, before you expend the energy to make it physical. >> You know, federated, I use the word federated, but semantic virtualization layers clearly have been around for quite some time. A lot of solution providers offer them; a lot of customers have used them for disparate use cases. One of the raps traditionally against semantic virtualization is that it's simply sort of a stopgap between chaos on the one end, you know, where you have dozens upon dozens of databases with no unified roll-up, and full centralization or migration to a big data hub. Do you see semantic virtualization as being sort of your target architecture for your operational BI and so forth?
Or, on some level, is it simply, like I said, a stopgap or transitional approach on the way to some more centralized environment? >> I think you're talking about two different scenarios here. With federation I would agree: when folks attempted to use it to bring disparate data sources together to make them look consolidated, and they happened to be on different platforms, that was definitely a stopgap on a journey to really addressing the problem. The thing that's a little different here is that we're talking about this running on a standardized platform. So it's not disparate platforms; the data is being accessed on the platform. It really gives us the flexibility to allow the consumer of the data to have a variety of views of the data without actually physicalizing each of them. So I don't know that it's on a journey, because we're never going to get to the point of physicalizing the data in so many different ways. But it's very different than, you know, ten, 15 years ago, when folks were trying to solve disparate data sources using federation. >> Would it be fair to characterize what you do as agile virtualization of the data on a data lake platform? Is that what it's essentially about? >> Yeah, it certainly enables that. In our particular case we use the data lake as the foundation, then we actually curate the data into standardized ontologies, and then really the consumer access layer is where we're applying virtualization. In the creation of the environment that we have, we've integrated about a dozen different technologies. So one of the things we're focused on is trying to create an ecosystem, and AtScale is one of the components of that. It gives us flexibility so that we don't have to physicalize. >> Well, you don't have to stand up any costs. So you have the flexibility with AtScale, do I get this right? You get the data and people can play with it without actually provisioning. It's like, okay, save some cash, but then you also double down on the winners that come in. >> Things that are a winner, you check the box, you physicalize it. You provide that access. >> You get crowdsourcing benefits, like what's going on in your environment. >> Yeah, exactly. >> The curation you mentioned, so does the curation go on inside AtScale? Are you using a different tool, or something you hand-wrote in house to do that? Essentially it's data governance and data cleansing. >> For that, we use a technology called Tamr. That is a machine-learning-based data curation tool; it's one of our fundamental tools for curation. One of the things in the life sciences industry is that you tend to have several data sources that are slightly aligned, but they're actually different, and so machine learning is an excellent application for that. >> Let's get into the portfolio. Obviously as a CTO you've got to build a holistic view. You have a tool chest of tools and a platform. How do you look at the big picture? AtScale obviously fits beautifully and makes a lot of sense, so good for those guys. But you know, big picture, you've got to have a variety of things in your arsenal. How do you architect that tool shed, or your platform? Is everything a hammer, is everything a nail? You've got all of them, though, all the things to build. >> You bring up a great point, because unfortunately, a lot of times, we'll use your analogy, it's like a tool shed: you don't want 12 lawnmowers in your tool shed, right? So one of the challenges is that a lot of the folks in this ecosystem.
They start with one area of focus and then they try to grow into other areas of focus, which means that suddenly everybody starts to be a lawnmower, because they think that's... >> They start as a hammer and turn into a lawnmower. >> Right. >> How did that happen? That's called pivoting. >> You can mow your lawn with a hammer, but... So it's really that portfolio of tools that all together get the job done. Certainly there's a data acquisition component, there's the curation component, there's visualization, machine learning, there's the foundational layer of the environment. So for all of those things, our approach has been to select the best-in-class tools around each and then work together, and... Bruno and the team at AtScale have been part of this. We've actually had partner summits on how we bring that ecosystem together. >> Is your stuff mostly on-prem? Obviously a lot of pharma IP there. You guys have the whole patent thing, which is well documented. You don't want to open up the kimono before things are released, so you've obviously got to keep things confidential. Mix of cloud, on-prem, is it 100 percent on-prem? Is there some bursting to the cloud? Is it a private cloud? How do you guys look at the cloud piece? >> Yeah, the majority of what we're doing is on-prem. The profile for us is that we persist the data. In some cases, when we're doing some of the more advanced analytics, we burst to the cloud for additional processors, but the model of persisting the data means that it's much more economical to have an on-prem instance of what we're doing. It is a combination, but the majority of what we're doing is on-prem. >> So, hold on Jim, one more question. I mean, obviously everyone's knocking on your door. They know how to get in that account; they spend a lot of money. But you're pretty disciplined; it sounds like you've got a good view, you don't want people to come in and turn into someone that you don't want them to be. But you also run R&D, so you have to understand the headroom. How do you look at the headroom of what you need down the road, in terms of how you interface with the suppliers that knock on your door? Whether it's AtScale, currently working with you now, or people just trying to get in there and sell you a hammer or a lawnmower. Whatever they have, they're going to try; you know, you're dealing with the vendor pressure. >> Right, well, a lot of that is around what problem we're trying to solve, and we drive all of that based on the use cases and the value to the business. And so we identify gaps that we need to address. Some of those are more specific to life-sciences types of challenges, where there are very specific types of tools and the population of partners is quite small, among other things. We're building an actual production, operational environment. We're not building a proof of concept, so security is extremely important. We're Kerberos-enabled end to end, encrypted at rest and in flight, which means it breaks some of the tools, and so there are criteria of things that need to be in place in order to... >> So you're thinking about scale big time, not just putting a beachhead together, but foundationally building out the platform, having the tools that fit general purpose and also specialty. But scale's a big thing, right? >> And it's also that we're addressing what we see as three different cohorts of consumers of the data. One is more in guided analytics, the more traditional dashboards and reports.
One is more in computational notebooks, more of the scientific side, using R, Python, other languages. The third is more kind of almost at the bare-metal level: machine learning, TensorFlow, a number of tools that people interact with directly. People don't necessarily fit nicely into those three cohorts, so we're also seeing that there's a blend. And that's something that we're also... >> There's a fourth cohort. >> Yeah, well, you know, someone's using a computational notebook but they want to draw upon a dashboard graphic, and then they want to run a predefined TensorFlow model and pull all that together, so. >> And what you just said tied up the question I was going to ask, so it's perfect. One of my core focuses as a Wikibon analyst is on deep learning, on AI. So with semantic data virtualization in a life sciences, pharma context, you undoubtedly have a lot of image data, visual data. In terms of curating that and enabling, you know, virtualized access, to what extent are you using deep learning, TensorFlow, convolutional neural networks, to be able to surface the visual patterns that can conceivably be searched using a variety of techniques? Is that a part of your overall implementation of AtScale for your particular use cases currently? Or do you plan to go there, in terms of, like, TensorFlow? >> No, I mean, we're active, very active, in deep learning, artificial intelligence, machine learning. Again, it depends on which problem you're trying to solve, and so again, there are a number of components that come together when you're looking at image analytics versus using data to drive out certain decisions. But we're active in all of those areas. Our ultimate goal is to transform the way that R&D is done within a pharmaceutical company. Right now it takes somewhere between five and 15 years to develop a new medicine; the goal is really to do a lot more analytics to shorten that time significantly. It helps the patients, gets the medicines to market faster. >> That's your end game: you've got to create an architecture that enables the data to add value. >> Right. >> To the business. Dr. Mark Ramsey, thanks so much for sharing the insight from your environment. Bruno, you've got something there to show us. What do you got there? He always brings a prop on.
Enough the frustration business users and business units. When at scale's done is we built this, this is the straw you want. So I would kind of help CTOs contemplate this idea of the Slurpee and the cocktail straw. How much money are you spending here and how much money are you spending there. Because the speed at which you can get the insights to the business user. >> You got to get that straw you got to break it down so it's available everywhere. So I think that's a great innovation and it makes me thirsty. >> You know what, you can have it. >> Bruno thanks for coming from at scale. Doctor Mark Ramsey good to see you again great to have you come back. Again anytime love to have chief data officers on. Really a pioneering position, is the critical position in all organizations. It will be in the future and will continue being. Thanks for sharing your insights. It's the Cube, more live coverage after this short break. (tech music)
Tendü Yogurtçu | BigData SV 2017
>> Announcer: Live from San Jose, California, it's The Cube, covering Big Data Silicon Valley 2017. (upbeat electronic music) >> California, Silicon Valley, at the heart of the big data world: this is The Cube's coverage of Big Data Silicon Valley, in conjunction with Strata Hadoop. Of course we've been here for multiple years, covering Hadoop World for now our eighth year, now it's Strata Hadoop, but we do our own events, Big Data SV and Big Data NYC, in Silicon Valley and New York City. I'm John Furrier, with my cohost George Gilbert, analyst at Wikibon. Our next guest is Tendü Yogurtçu with Syncsort, general manager of big data, did I get that right? >> Yes, you got it right. It's always a pleasure to be at The Cube. >> (laughs) I love your name. That's so hard for me to get, but I think I was close enough there. Welcome back. >> Thank you. >> Great to see you. You know, one of the things I'm excited about with Syncsort is that we've been following you guys, we talk to you guys every year, and it just seems that every year, more and more announcements happen. You guys are unstoppable. You're like what Amazon does: just more and more announcements. But the theme seems to be integration. Give us the latest update. You had an update, you bought Trillium, you've got a big deal with Hortonworks, you got integrated with Spark, you've got big news here. What's the news this year? >> Sure. Thank you for having me. Yes, it's very exciting times at Syncsort, and I probably say that every time I appear, because every time it's more exciting than the previous one, which is great. We bought Trillium Software, and Trillium Software has been leading data quality for over a decade in many of the enterprises. It's very complementary to our data integration and data management portfolio, because we are helping our customers access all of their enterprise data, not just the new emerging sources in connected devices, mobile, and streaming, but also leveraging reference data, the mainframe legacy systems, and the legacy enterprise data warehouse. While we are doing that, accessing data, the data lake is now actually, in some cases, turning into a data swamp. That was a term Dave Vellante used a couple of years back in one of the crowd chats, and it's becoming real. So, data-- >> Real being the data swamps: data lakes are turning into swamps because they're not being leveraged properly? >> Exactly, exactly. Because it's also about having access to the right data, and data quality is very complementary, because Trillium has been delivering trusted, right data to enterprise customers in the traditional environments. So now we are looking forward to bringing that enterprise trust of data quality into the data lake. In terms of data integration, data integration has always been very critical to any organization. It's even more critical now that data is shifting gravity, with the amount of data organizations have. What we have been delivering in very large enterprise production environments for the last three years, we are now hearing our competitors make announcements about very recently, which is a validation, because we are already running in very large production environments.
We offer value by saying "create your applications for integrating your data," whether the data is originating in the cloud or on the mainframe, or whether it's in the legacy data warehouse, and you can deploy the same exact application, without any recompilation, without any changes, on your standalone Windows laptop, or in Hadoop MapReduce, or Spark in the cloud. This design-once, deploy-anywhere approach is becoming more and more critical, with data originating in many different places, and cloud is definitely one of them. Our data warehouse optimization solution with Hortonworks and AtScale is a special package to accelerate this adoption. It's basically helping organizations offload the workload from the existing Teradata or Netezza data warehouse and deploy it in Hadoop. We provide a single button to automatically map the metadata, create the metadata in Hive or on Hadoop, and also make the data accessible in the new environment, and AtScale provides fast BI on top of that. >> Wow, that's amazing. I want to ask you a question, because this is a theme: I just did a tweetup just now, while you were talking, saying "the theme this year is cleaning up the data swamps," AKA data lakes. The other theme is integration. Can you just lay out your premise on how enterprises should be looking at integration now? Because it's a multi-vendor world, a multi-cloud world, a multi-data-type-and-source, metadata world. How do you advise customers that have this plethora of action coming at them? IoT, you've got cloud, you've got big data, I've got Hadoop here, I've got Spark over here. What's the integration formula? >> The first thing is to identify your business use cases. What are your business challenges, what are your business goals? Because that should be the real driver. We see some organizations start with the intention "we would like to create a data lake" without having a very clear understanding of what it is they're trying to solve with that data lake. Data as a service is really becoming a theme across multiple organizations, whether it's on the enterprise side or in some of the online retail organizations, for example. As part of that data as a service, organizations really need to adopt tools that are going to enable them to take advantage of the technology stack. The technology stack is evolving very rapidly. The skill sets are rare, and skill sets are rare because you need to keep making adjustments: am I hiring Ph.D. students who can program Scala in the most optimized way, or should I hire Java developers, or should I hire Python developers? The names of the tools in the stack, and Spark 1 versus Spark 2 APIs, change. It's really evolving very rapidly. >> It's hard to find Scala developers, I mean, once you go outside Silicon Valley. >> Exactly. So as an organization, our advice is that you really need to find tools that are going to fit those business use cases and provide a single software environment. That data integration might be happening on premise now, with some of the legacy enterprise data warehouse; it might happen in a hybrid, on-premise and cloud environment in the near future, and perhaps completely in the cloud. >> So standard tools, tools that have some standard software behind them, so you don't get stuck in the personnel hiring problem, some unique domain expertise that's hard to hire.
Yes, skill set is one problem; the second problem is the fact that the applications need to be recompiled, because the stack is evolving and the APIs are not compatible with the previous version. So there's the maintenance cost of keeping up with things, being able to catch up with the new versions of the stack. That's another area where the tools really help, because you want to be able to develop the application once and deploy it anywhere, on any compute platform. >> So Tendü, if I hear you properly, what you're saying is that integration sounds great on paper, it's important, but there are some hidden costs there, and that is the skill set, and then there's the stack recompiling. Just making sure. Okay, that's awesome. >> The tools help with that. >> Take a step back and zoom out and talk about Syncsort's positioning, because you guys have been changing with the stacks as well. I mean, you guys have been doing very well with the announcements; you've just been coming to market all the time. What is the current value proposition for Syncsort today? >> The current value proposition is really that we help organizations create the next-generation modern data architecture by accessing and liberating all enterprise data, and delivering that data at the right time and with the right quality. It's liberate, integrate, with integrity. That's our value proposition. How do we do that? We provide that single software environment. You can have batch legacy data and streaming data sources integrated in the same exact environment, and it enables you to adapt to Spark 2 or Flink or whichever compute framework is going to help you. That has been our value proposition, and it is proven in many production deployments. >> What's interesting too is the way you guys have approached the market. You've locked down the legacy: we talk about the mainframe, and well beyond that now, you guys know and understand the legacy, so you kind of lock that down, protect it, make it secure, and make sure it works, because there's still data there, and legacy systems are really critical in the hybrid. >> The mainframe expertise and heritage that we have is a critical part of our offering, and we will continue to focus on innovation on the mainframe side as well as on the distributed side. One of the announcements that we made since our last conversation was our partnership with Compuware: we now bring in more data types about application failures, abend data, to Splunk for operational intelligence. We will also continue to support more delivery types: we have batch delivery, we have streaming delivery, and replication into Hadoop has been a challenge, so our focus is now replication from DB2 on the mainframe and VSAM on the mainframe to Hadoop environments. That's what we will continue to focus on, mainframe, because we have heritage there, and it's also part of the big enterprise data lake. You cannot make sense of the customer data that you are getting from mobile if you don't reference the critical data sets that are on the mainframe. With the Trillium acquisition, it's very exciting, because now we are at a kind of pivotal point in the market: we can bring the superior data validation, cleansing, and matching capabilities we have to the big data environments. One of the things-- >> So when you get into low latency, you guys do the whole low-latency thing too? You bring it in fast?
Yes, we bring it; that's our current value proposition. And as we are accessing this data and integrating it as part of the data lake, now we have capabilities with Trillium where we can profile that data, get statistics, and start using machine learning to automate the data steward's job. Data stewards are still spending 75% of their time trying to clean the data. So if we can-- >> A lot of manual labor there, and modeling too, by the way; the modeling and the cleaning kind of go hand in hand. >> Exactly. If we can automate any of these steps to derive the business rules automatically and provide the right data on the data lake, that would be very valuable. This is what we are hearing from our customers as well. >> We've heard for probably five years about the data lake as the center of gravity of big data, but we're hearing at least a bifurcation, maybe more, where now we want to take that data and apply it, operationalize it in making decisions with machine learning, predictive analytics. But at the same time we're trying to square this strange circle of the data lake, where you didn't say up front what you wanted it to look like, but now we want ever-richer metadata to make sense out of it, a layer that you're putting on it, the data prep layer, and others are trying to put different metadata on top of it. What do you see that metadata layer looking like over the next three to five years? >> Governance is a very key topic, and for organizations who are ahead of the game in big data, who have already established that data lake, data governance and even analytics governance become important. What we are delivering here with Trillium we will have generally available by the end of Q1. We are basically bringing business rules to the data. Instead of bringing data to the business rules, we are taking the business rules and deploying them where the data exists. That will be key because of the data gravity you mentioned: the data might be in the Hadoop environment, it might be in, like I said, the enterprise data warehouse, and it might be originating in the cloud, and you don't want to move the data to the business rules. You want to move the business rules to where the data exists. Cloud is an area where we see more and more of our customers moving forward. There are two main use cases around our integration: one, because the data is originating in the cloud, and the second one is archiving data to the cloud. We actually announced tighter cloud integration with Cloudera Director earlier this week, for this event. We have been in cloud deployments for a while; we have had an offering on Elastic MapReduce, and on EC2, for a couple of years now, and also on Google Cloud Storage. But this announcement is primarily about making deployments even easier by leveraging Cloudera Director's elasticity for growing and shrinking the deployment. Now our customers' integration jobs will also take advantage of that elasticity. >> Tendü, it's great to have you on The Cube, because you have an engineering mind, but you're also now general manager of the business, and your business is changing.
You're in the center of the action, so I want to get your expertise and insight into the enterprise readiness concept. We saw last week at Google Cloud 2017, you know, Google going down the path of being enterprise-ready, or taking steps; I don't think they're fully ready, but they're certainly serious about the cloud in the enterprise, and that's clear from Diane Greene, who knows the enterprise. It sparked the conversation last week around what enterprise readiness means for cloud players, because there are so many details in between the lines, if you will, of what the products are: the integration, certification, SLAs. What's your take on the notion of cloud readiness, vis-à-vis Google and others that are bringing cloud compute, a lot of resources, with an IoT market that's now booming, big data evolving very, very fast, lots of real time, lots of analytics, lots of innovation happening? What does the enterprise picture look like from a readiness standpoint? How do these guys get ready? >> From a big picture, for the enterprise there are a couple of things that cannot be afterthoughts: security, metadata lineage as part of data governance, and being able to have flexibility in the architecture, so that they will not be recreating the jobs they might have already deployed in on-premise environments, right? Being able to have the same application running from on-premise to cloud will be critical, because it gives flexibility for adaptation in the enterprise. An enterprise may have some MapReduce jobs running on premise, with Spark jobs in the cloud because they are doing some predictive analytics, graph analytics on those. They want to be able to have that flexible architecture, where we hear this concept of a hybrid environment. You don't want to be deploying a completely different product in the cloud and redoing your jobs. That flexibility of architecture, flexibility-- >> So having different code bases in the cloud versus on-prem requires two jobs to do the same thing. >> Two jobs for maintaining, two jobs for standardizing, and two different skill sets of people, potentially. So security, governance, and being able to access data easily and have applications move between environments will be very critical. >> So seamless integration between clouds and on-prem first, and then potentially multi-cloud. That's table stakes in your mind. >> They are absolutely table stakes. A lot of vendors are trying to focus on that; definitely the Hadoop vendors are also focusing on that. Also, one of the things, when people talk about governance: the requirements are changing. We have been talking about single view and Customer 360 for a while now, right? Do we have it right yet? Enrichment is becoming key. With Trillium we made the recent announcement around precise enrichment: it's not just the postal address that you want to deliver and make sure is correct, it's also the email address and the phone number. Is it a mobile number, is it a landline? It's enriched data sets that we really have to be dealing with, and there's a lot of opportunity, and we are really excited, because data quality, discovery, and integration are coming together, and we have a good-- >> Well, Tendü, thank you for joining us, and congratulations as Syncsort broadens its scope to being a modern data platform solution provider for companies. >> Thank you. >> Thanks for coming. >> Thank you for having me.
>> This is The Cube, here live in Silicon Valley in San Jose. I'm John Furrier with George Gilbert, and you're watching our coverage of Big Data Silicon Valley, in conjunction with Strata Hadoop. This is SiliconANGLE's The Cube; we'll be right back with more live coverage. We've got two days of wall-to-wall coverage with experts and pros talking about big data and the transformations happening here inside The Cube. We'll be right back. (upbeat electronic music)