Image Title

Search Results for STRATA Data:

Jack Norris | Strata Data Conference 2013


 

>>Okay. We're back here inside the cube, our flagship program about the events and extract the signal from the noise. This is strata conference. O'Reilly media is a big data event. We're talking about Hadoop analytics, data platforms, and big is come into the enterprise from the front door. As we heard them yesterday. I'm John Frey with Dave Volante, wiki.org. And we're here with Jack Norris, our cube alumni, and a favorite guest here. You're a in charge executive at map. Our, you guys are leading the charge with this use of a dupe. Welcome back to the cube. Thank you. Okay, so what's, let's chat about what's going on. What's your take on all the big news out here for the distributions. I'll the big power moose. You guys have a relationship with EMC. Okay. Exclusive relationship with those guys. Intel's got a distribution Horton versus with Microsoft, a lot of things going on. So this is your wheelhouse. So what's your take on the Hadoop action here? >>Well, I think there's an article in Forbes where I think they, they said it best. This is showing that map bars had the right strategy all along. And what we're seeing is, is basically there's a fairly low bar to taking a patchy Hadoop and providing a distribution. And so we're seeing a lot of new entrance in the market and there's, there's a lot of options. If you want to try Hadoop and experiment and get started. And then there's production class Hadoop, which includes enterprise data protection, snapshots mirrors, ability to integrate. And that's basically map R so start and test and dev with, with a lot of options and then move into production, class >>Mapbox. So break it down for the folks out there who are tipping the toe in the water and hearing all the noise. Cause it's right now, the noise level is very high, right? With the, with the recent announcements. But you guys have been doing business obviously for many years in this area. So when people say, Hey, I want to get a Hadoop distribution with enterprise. What, what should they be looking for? Okay. Because it's not that easy to kind of swing through the noise. So could you share with the folks out there, what, what to look for in like the, the table stakes, the check boxes? Cause there's a lot of claims. There's a lot of noise is this. And that is a lot of different options. Some teams have more committers or no committers than others, so that's all noise, but let's what are the key things that customers need to know? So I think there's, miling, >>There's three areas. All right. One is kind of how it integrates into your enterprise. And with Hadoop, you have the Hadoop distributed file system API. That's how you interact. Well, if you're able to also use standard tools that can use standard file and database access, it makes it much, much easier. So map ours unique and supporting NFS and making that happen. That's a, that's a big difference. The second is on dependability and there's high availability capabilities and then there's data protection. So I'll focus on snapshots as an example, you've got data replicated and Hindu. That's great. But if you have a user error, an application error, that's replicated just as quickly. So having the ability to recover and double-edged in time. Yeah. So if I can say, Hey, I made a mistake. Can I go back two minutes earlier with snapshots that makes it possible map ours, unique and snapshot support. And then finally, there's there's disaster recovery mirroring where you can go across clusters, mirror, what's going on across the land and being able to recover in the case of a disaster where you lose a whole cluster or use a whole >>Section and that's not available in >>Other, those aren't available either. That's >>NFS, >>Snapshots has been on the JIRA list for over five years. >>Yeah. Okay. So I wonder >>If I could find that and then there's third. Cause I said three and almost said two, the third is performance and scale and, but >>That'd be for >>Integration, dependability and speed. >>Okay. So dependability Jr's part of the VR snapshots. MDR. Okay. So let's talk about the performance because you guys had asked a Google's a big partner of you guys. So we should, we just had them on the cube strata. So you have to have a record setting. Do you have a record setting? EMC take that. Well, you work with DMC. So let me talk about the performance real quick. Then we'll talk about some of the EMC conversations, but performance, you have a variety of diverse performance benchmarks, Google you have within the enterprise. Can you talk about those? >>So, so what we announced this week was the minute sort world record. So minutes or runs across technologies is just, how can you, you know, how much data can you sort in 60 seconds? And if you look back at, at the previous record that was done in the labs with Microsoft with special purpose software, and they did 1.4 terabytes Hadoop hasn't been used since 2009, it's been several years because it's got features in there that work against performance. Things like checkpointing and logging because it assumes you've got long running MapReduce jobs. So we set the record with our distribution of Hadoop. So we have kind of one hand tied behind our back, given that technology. Secondly, we sent it in the cloud, which is the other hand tied behind our back because it's a, it's a virtualized environment. So we set the record with just with your legs And a 1.5 terabytes in 60 seconds. Very proud of that. >>Well, that's interesting because we've been doing a lot of labs testing, Dave and I and our teams on cost. Right. So, yeah. And it's an interesting benchmark because you always don't look at the nuance, the cost to compare a cloud performance versus bare metal. Most people don't factor into setup, cost of deployment. Exactly. So can you just quickly talk about that and how significant of an order of magnitude of your customer? >>So the, the previous Hadoop record took 3,400 servers about 27,000 cores, 13, 13,000, almost 14,000 discs and did 600 gigs, actually a little less than that at 5 78. And on Google, we did it with 2020 100 virtual instances, 8,000 cores did 1.5 terabytes >>And costs. You spin up the Google versus >>Basically if you look at that and you assume conservatively 4,000 per server, it's $13.8 million worth of hardware previously. And the cost to do that run on Google was $20 and 33 cents. >>Well, you got to discount. I mean, come on a partner mean it really costs that much. I mean, they that's what they would charge for it. Actually >>We are map artist's case on that minute. If you look at the Asheville charges to be 1200, >>Okay. It's not six millions, so millions to thousands. Yep. Okay. That's impressive. We'll have to go look at the numbers. Like we're going to look at GreenPlum's numbers in the next couple of weeks when talking about the Google relationship and men were that the up way with that was that >>Very excited about it. We're actually deployed throughout the cloud. We've got multiple partners Google's in limited preview. So we've got a number of customers kind of, you know, testing that and, and doing some really interesting things. >>So we monitor the data center market. I'll see with our proprietary tool that you know about the viewfinder and crowd spots and thing is that the data center verticals interesting, right? If you look at the sentiment analysis of what the conversation is on, on just the Twitter data, it's Facebook, apple, these companies. And when we dig into the numbers, it's not so much the companies, it's the fact that their data center operations are significantly being looked at as the leading indicator for where CEO's are going. So I want to ask you in your conversations with your customers, what are the conversations around moving to the cloud and where are they on that transition? Because we hear, yeah, one of the cloud for all the benefits you were mentioning, but Google and Facebook, these are the gold standards as, as architecture necessarily a cut and paste architecture, but they see the benefits that they're doing. So what are your conversations with your enterprise customers around the cloud cloud architecture and what other features besides replication and disaster recovery, are they, are they looking at >>Well, it's basically work, workload driven and dataset driven. So data that's already in the cloud are kind of a natural first step is, well, why don't I do the analysis there as well? So things like Google earth and digital advertising data, that's real interesting candidates for that also periodic workload. So if they have workloads that need to spin up and spin down, the, the cloud works, works really well for that. And in some cases it's driven by their own environments. They've got data centers that are approaching capacity and they need to kind of do offloads and then looking at the, at the cloud because it's easy to get up running quickly and uses an alternative. >>I want to do come back to one of your three sort of value props here, particularly the dependability piece and specifically the snapshot. So somebody asked me one time, how do you know a couple of years ago, how do you back up a petabyte as he could do this thing? And then his answer was, well, you don't know. So I want to, I want to ask you how your customers are protecting and, and, and, and what you guys are bringing to the table. >>So snapshots is not a bolt on feature. It's basically a low level feature based on the underlying data architecture. So when we architected that from the beginning, snapshots was, was a, was a core feature. And if you use a technique called redirect on, right, you're not copying the data, right? So you can do efficient, you can do a petabyte snapshot, you know, basically almost instantaneously because you're tracking the pointers of the latest blocks that have been written. So if, if the data change rate is, is basically, data's not changing, you can snapshot every minute and not have any additional storage overhead. >>Right. Okay. And, and so you can set that. So you, you map, map, our technologies will allow them to set that, dial that up, dial it down and switches. >>So we support logical volumes. So you can set policies at that volume and you can say, well, this volume is critical data. And then I can set policies. Well, critical data is every minute. And then I can change what the definition of critical data is. Maybe it's every five minutes, et cetera. So you can set up these different policies at volumes and have snapshots happen independently for each. >>Can you do that by workload or dataset or by application or whatever I get essentially provided as a service, as opposed to kind of a one size fits all approach. >>Exactly. And that, that also corresponds to user access, administrative privileges, you know, other features and policies within the, within the cluster. >>How about the, you know, this whole trend toward bringing SQL into, into Hadoop. What's, what's your take on that? And what's your angle? >>So interactive, SQL's an important aspect because you've got so many people trained in the organization and, and leverage, you know, sequel, but it's one of many use cases that needs to run across a big data platform. So there's a range of big data analytics, batch analytics, interactive capabilities with sequel, database operations, no sequel search streaming, all those are kind of functions that need to run across a platform. So it's a piece, but it's not the big driver, because what we've seen is that there's higher rival rate of machine generated data and machine generated response to respond to those for digital advertising, for recommendation engines for fraud detection can really move the needle for an organization, have huge swings and profitability >>And the ball down the field big time. Yeah. And >>Having an interactive piece with a kind of a human element involved, it doesn't really scale and work on a 24 by seven basis. >>Jack final question, we're over now by a minute. But when I ask a one party question, obviously, very competitive landscape right now in terms of competitiveness, the stakes are higher because the demand in the market market opportunities is massive. What's map ours business strategy going forward, no change in direction. Is it going to be same old, same old. You guys have any new things going down and you see the marketplace. >>We've got a huge lead when it comes to kind of mission critical enterprise grade features. And our focus is one platform. So the ability to support enterprise Hadoop, enterprise HBase and provide those full capabilities for ease of use for dependability, for performance. And, you know, we've seen a lot of companies test on one distribution and switch to map are and will continue to help that in the future. >>Well, we, we will, we will say we've been covering this big data space now going on four years now, Dave and I, and we've watched all the players pivot a few times. You guys have not, you guys have been true to your mission from day one and that we know where you stand. No one, everyone knows where you stand enterprise grade. It's a good strategy. I think everyone's putting that on their label now. So enterprise grade Washington, we call it a congratulations map art and said the cube. We'll be right back with our next guest here on day three wall-to-wall coverage at O'Reilly media. When do our news, our next from 12 to one, we'll be right back after this short break.

Published Date : Mar 4 2013

SUMMARY :

So what's your take on the Hadoop If you want to try Hadoop So could you share with the folks out there, what, what to look for in like the, the table stakes, And with Hadoop, you have the Hadoop That's If I could find that and then there's third. So let's talk about the performance because you And if you look back at, at the previous record that was done in the labs with So can you just quickly talk about that and how significant And on Google, we did it with 2020 100 virtual instances, And costs. And the cost to do that run on Google was $20 Well, you got to discount. If you look at the Asheville charges to be 1200, We'll have to go look at the numbers. So we've got a number of customers kind of, you know, testing that and, So I want to ask you in your conversations with your customers, So data that's already in the cloud are kind of a natural first step is, well, So I want to, I want to ask you how your customers are protecting and, and, So you can do efficient, you can do a petabyte snapshot, So you, you map, So you can set policies at that volume and you can say, Can you do that by workload or dataset or by application or whatever I get essentially provided as a service, you know, other features and policies within the, within the cluster. How about the, you know, this whole trend toward bringing SQL into, into Hadoop. you know, sequel, but it's one of many use cases that needs to run And the ball down the field big time. Having an interactive piece with a kind of a human element involved, and you see the marketplace. So the ability to support enterprise Hadoop, You guys have not, you guys have been true to your mission from day

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
Dave VolantePERSON

0.99+

MicrosoftORGANIZATION

0.99+

$20QUANTITY

0.99+

Jack NorrisPERSON

0.99+

John FreyPERSON

0.99+

appleORGANIZATION

0.99+

$13.8 millionQUANTITY

0.99+

DavePERSON

0.99+

600 gigsQUANTITY

0.99+

GoogleORGANIZATION

0.99+

60 secondsQUANTITY

0.99+

1.5 terabytesQUANTITY

0.99+

33 centsQUANTITY

0.99+

FacebookORGANIZATION

0.99+

3,400 serversQUANTITY

0.99+

six millionsQUANTITY

0.99+

8,000 coresQUANTITY

0.99+

EMCORGANIZATION

0.99+

O'ReillyORGANIZATION

0.99+

1200QUANTITY

0.99+

thirdQUANTITY

0.99+

thousandsQUANTITY

0.99+

AshevilleLOCATION

0.99+

millionsQUANTITY

0.99+

twoQUANTITY

0.99+

TwitterORGANIZATION

0.99+

2009DATE

0.99+

1.4 terabytesQUANTITY

0.99+

SQLTITLE

0.99+

threeQUANTITY

0.99+

yesterdayDATE

0.99+

24QUANTITY

0.99+

this weekDATE

0.99+

four yearsQUANTITY

0.99+

one partyQUANTITY

0.99+

over five yearsQUANTITY

0.99+

three areasQUANTITY

0.99+

HadoopTITLE

0.99+

OneQUANTITY

0.98+

2020DATE

0.98+

oneQUANTITY

0.98+

100 virtual instancesQUANTITY

0.97+

secondQUANTITY

0.97+

one platformQUANTITY

0.97+

first stepQUANTITY

0.97+

JackPERSON

0.97+

one timeQUANTITY

0.97+

SecondlyQUANTITY

0.95+

about 27,000 coresQUANTITY

0.94+

HBaseTITLE

0.93+

13, 13,000QUANTITY

0.93+

GreenPlumORGANIZATION

0.92+

day threeQUANTITY

0.92+

DMCORGANIZATION

0.91+

IntelORGANIZATION

0.9+

a minuteQUANTITY

0.9+

day oneQUANTITY

0.89+

Strata Data ConferenceEVENT

0.89+

4,000 per serverQUANTITY

0.89+

14,000 discsQUANTITY

0.87+

five minutesQUANTITY

0.85+

WashingtonLOCATION

0.84+

one distributionQUANTITY

0.83+

wiki.orgOTHER

0.83+

sevenQUANTITY

0.83+

couple of years agoDATE

0.83+

5 78QUANTITY

0.82+

eachQUANTITY

0.81+

JrPERSON

0.79+

12QUANTITY

0.77+

Basil Faruqui, BMC Software | BigData NYC 2017


 

>> Live from Midtown Manhattan, it's theCUBE. Covering BigData New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. (calm electronic music) >> Basil Faruqui, who's the Solutions Marketing Manger at BMC, welcome to theCUBE. >> Thank you, good to be back on theCUBE. >> So first of all, heard you guys had a tough time in Houston, so hope everything's gettin' better, and best wishes to everyone down in-- >> We're definitely in recovery mode now. >> Yeah and so hopefully that can get straightened out quick. What's going on with BMC? Give us a quick update in context to BigData NYC. What's happening, what is BMC doing in the big data space now, the AI space now, the IOT space now, the cloud space? >> So like you said that, you know, the data link space, the IOT space, the AI space, there are four components of this entire picture that literally haven't changed since the beginning of computing. If you look at those four components of a data pipeline it's ingestion, storage, processing, and analytics. What keeps changing around it, is the infrastructure, the types of data, the volume of data, and the applications that surround it. And the rate of change has picked up immensely over the last few years with Hadoop coming in to the picture, public cloud providers pushing it. It's obviously creating a number of challenges, but one of the biggest challenges that we are seeing in the market, and we're helping costumers address, is a challenge of automating this and, obviously, the benefit of automation is in scalability as well and reliability. So when you look at this rather simple data pipeline, which is now becoming more and more complex, how do you automate all of this from a single point of control? How do you continue to absorb new technologies, and not re-architect our automation strategy every time, whether it's it Hadoop, whether it's bringing in machine learning from a cloud provider? And that is the issue we've been solving for customers-- >> Alright let me jump into it. So, first of all, you mention some things that never change, ingestion, storage, and what's the third one? >> Ingestion, storage, processing and eventually analytics. >> And analytics. >> Okay so that's cool, totally buy that. Now if your move and say, hey okay, if you believe that standard, but now in the modern era that we live in, which is complex, you want breath of data, but also you want the specialization when you get down to machine limits highly bounded, that's where the automation is right now. We see the trend essentially making that automation more broader as it goes into the customer environments. >> Correct >> How do you architect that? If I'm a CXO, or I'm a CDO, what's in it for me? How do I architect this? 'Cause that's really the number one thing, as I know what the building blocks are, but they've changed in their dynamics to the market place. >> So the way I look at it, is that what defines success and failure, and particularly in big data projects, is your ability to scale. If you start a pilot, and you spend three months on it, and you deliver some results, but if you cannot roll it out worldwide, nationwide, whatever it is, essentially the project has failed. The analogy I often given is Walmart has been testing the pick-up tower, I don't know if you've seen. So this is basically a giant ATM for you to go pick up an order that you placed online. They're testing this at about a hundred stores today. Now if that's a success, and Walmart wants to roll this out nation wide, how much time do you think their IT department's going to have? Is this a five year project, a ten year project? No, and the management's going to want this done six months, ten months. So essentially, this is where automation becomes extremely crucial because it is now allowing you to deliver speed to market and without automation, you are not going to be able to get to an operational stage in a repeatable and reliable manner. >> But you're describing a very complex automation scenario. How can you automate in a hurry without sacrificing the details of what needs to be? In other words, there would seem to call for repurposing or reusing prior automation scripts and rules, so forth. How can the Walmart's of the world do that fast, but also do it well? >> Yeah so we do it, we go about it in two ways. One is that out of the box we provide a lot of pre-built integrations to some of the most commonly used systems in an enterprise. All the way from the Mainframes, Oracles, SAPs, Hadoop, Tableaus of the world, they're all available out of the box for you to quickly reuse these objects and build an automated data pipeline. The other challenge we saw, and particularly when we entered the big data space four years ago was that the automation was something that was considered close to the project becoming operational. Okay, and that's where a lot of rework happened because developers had been writing their own scripts using point solutions, so we said alright, it's time to shift automation left, and allow companies to build automations and artifact very early in the developmental life cycle. About a month ago, we released what we call Control-M Workbench, its essentially a community edition of Control-M, targeted towards developers so that instead of writing their own scripts, they can use Control-M in a completely offline manner, without having to connect to an enterprise system. As they build, and test, and iterate, they're using Control-M to do that. So as the application progresses through the development life cycle, and all of that work can then translate easily into an enterprise edition of Control-M. >> Just want to quickly define what shift left means for the folks that might not know software methodologies, they don't think >> Yeah, so. of left political, left or right. >> So, we're not shifting Control-M-- >> Alt-left, alt-right, I mean, this is software development, so quickly take a minute and explain what shift left means, and the importance of it. >> Correct, so if you think of software development as a straight line continuum, you've got, you will start with building some code, you will do some testing, then unit testing, then user acceptance testing. As it moves along this chain, there was a point right before production where all of the automation used to happen. Developers would come in and deliver the application to Ops and Ops would say, well hang on a second, all this Crontab, and these other point solutions we've been using for automation, that's not what we use in production, and we need you to now go right in-- >> So test early and often. >> Test early and often. So the challenge was the developers, the tools they used were not the tools that were being used on the production end of the site. And there was good reason for it, because developers don't need something really heavy and with all the bells and whistles early in the development lifecycle. Now Control-M Workbench is a very light version, which is targeted at developers and focuses on the needs that they have when they're building and developing it. So as the application progresses-- >> How much are you seeing waterfall-- >> But how much can they, go ahead. >> How much are you seeing waterfall, and then people shifting left becoming more prominent now? What percentage of your customers have moved to Agile, and shifting left percentage wise? >> So we survey our customers on a regular basis, and the last survey showed that eighty percent of the customers have either implemented a more continuous integration delivery type of framework, or are in the process of doing it, And that's the other-- >> And getting close to a 100 as possible, pretty much. >> Yeah, exactly. The tipping point is reached. >> And what is driving. >> What is driving all is the need from the business. The days of the five year implementation timelines are gone. This is something that you need to deliver every week, two weeks, and iteration. >> Iteration, yeah, yeah. And we have also innovated in that space, and the approach we call jobs as code, where you can build entire complex data pipelines in code format, so that you can enable the automation in a continuous integration and delivery framework. >> I have one quick question, Jim, and I'll let you take the floor and get a word in soon, but I have one final question on this BMC methodology thing. You guys have a history, obviously BMC goes way back. Remember Max Watson CEO, and Bob Beach, back in '97 we used to chat with him, dominated that landscape. But we're kind of going back to a systems mindset. The question for you is, how do you view the issue of this holy grail, the promised land of AI and machine learning, where end-to-end visibility is really the goal, right? At the same time, you want bounded experiences at root level so automation can kick in to enable more activity. So there's a trade-off between going for the end-to-end visibility out of the gate, but also having bounded visibility and data to automate. How do you guys look at that market? Because customers want the end-to-end promise, but they don't want to try to get there too fast. There's a diseconomies of scale potentially. How do you talk about that? >> Correct. >> And that's exactly the approach we've taken with Control-M Workbench, the Community Edition, because earlier on you don't need capabilities like SLA management and forecasting and automated promotion between environments. Developers want to be able to quickly build and test and show value, okay, and they don't need something that is with all the bells and whistles. We're allowing you to handle that piece, in that manner, through Control-M Workbench. As things progress and the application progresses, the needs change as well. Well now I'm closer to delivering this to the business, I need to be able to manage this within an SLA, I need to be able to manage this end-to-end and connect this to other systems of record, and streaming data, and clickstream data, all of that. So that, we believe that it doesn't have to be a trade off, that you don't have to compromise speed and quality for end-to-end visibility and enterprise grade automation. >> You mentioned trade offs, so the Control-M Workbench, the developer can use it offline, so what amount of testing can they possibly do on a complex data pipeline automation when the tool's offline? I mean it seems like the more development they do offline, the greater the risk that it simply won't work when they go into production. Give us a sense for how they mitigate, the mitigation risk in using Control-M Workbench. >> Sure, so we spend a lot of time observing how developers work, right? And very early in the development stage, all they're doing is working off of their Mac or their laptop, and they're not really connected to any. And that is where they end up writing a lot of scripts, because whatever code business logic they've written, the way they're going to make it run is by writing scripts. And that, essentially, becomes the problem, because then you have scripts managing more scripts, and as the application progresses, you have this complex web of scripts and Crontabs and maybe some opensource solutions, trying to simply make all of this run. And by doing this on an offline manner, that doesn't mean that they're losing all of the other Control-M capabilities. Simply, as the application progresses, whatever automation that the builtin Control-M can seamlessly now flow into the next stage. So when you are ready to take an application into production, there's essentially no rework required from an automation perspective. All of that, that was built, can now be translated into the enterprise-grade Control M, and that's where operations can then go in and add the other artifacts, such as SLA management and forecasting and other things that are important from an operational perspective. >> I'd like to get both your perspectives, 'cause, so you're like an analyst here, so Jim, I want you guys to comment. My question to both of you would be, lookin' at this time in history, obviously in the BMC side we mention some of the history, you guys are transforming on a new journey in extending that capability of this world. Jim, you're covering state-of-the-art AI machine learning. What's your take of this space now? Strata Data, which is now Hadoop World, which is Cloud Air went public, Hortonworks is now public, kind of the big, the Hadoop guys kind of grew up, but the world has changed around them, it's not just about Hadoop anymore. So I'd like to get your thoughts on this kind of perspective, that we're seeing a much broader picture in big data in NYC, versus the Strata Hadoop show, which seems to be losing steam, but I mean in terms of the focus. The bigger focus is much broader, horizontally scalable. And your thoughts on the ecosystem right now? >> Let the Basil answer fist, unless Basil wants me to go first. >> I think that the reason the focus is changing, is because of where the projects are in their lifecycle. Now what we're seeing is most companies are grappling with, how do I take this to the next level? How do I scale? How do I go from just proving out one or two use cases to making the entire organization data driven, and really inject data driven decision making in all facets of decision making? So that is, I believe what's driving the change that we're seeing, that now you've gone from Strata Hadoop to being Strata Data, and focus on that element. And, like I said earlier, the difference between success and failure is your ability to scale and operationalize. Take machine learning for an example. >> Good, that's where there's no, it's not a hype market, it's show me the meat on the bone, show me scale, I got operational concerns of security and what not. >> And machine learning, that's one of the hottest topics. A recent survey I read, which pulled a number of data scientists, it revealed that they spent about less than 3% of their time in training the data models, and about 80% of their time in data manipulation, data transformation and enrichment. That is obviously not the best use of a data scientist's time, and that is exactly one of the problems we're solving for our customers around the world. >> That needs to be automated to the hilt. To help them >> Correct. to be more productive, to deliver faster results. >> Ecosystem perspective, Jim, what's your thoughts? >> Yeah, everything that Basil said, and I'll just point out that many of the core uses cases for AI are automation of the data pipeline. It's driving machine learning driven predictions, classifications, abstractions and so forth, into the data pipeline, into the application pipeline to drive results in a way that is contextually and environmentally aware of what's goin' on. The history, historical data, what's goin' on in terms of current streaming data, to drive optimal outcomes, using predictive models and so forth, in line to applications. So really, fundamentally then, what's goin' on is that automation is an artifact that needs to be driven into your application architecture as a repurposable resource for a variety of-- >> Do customers even know what to automate? I mean, that's the question, what do I-- >> You're automating human judgment. You're automating effort, like the judgments that a working data engineer makes to prepare data for modeling and whatever. More and more that can be automated, 'cause those are pattern structured activities that have been mastered by smart people over many years. >> I mean we just had a customer on with a Glass'Gim CSK, with that scale, and his attitude is, we see the results from the users, then we double down and pay for it and automate it. So the automation question, it's an option question, it's a rhetorical question, but it just begs the question, which is who's writing the algorithms as machines get smarter and start throwing off their own real-time data? What are you looking at? How do you determine? You're going to need machine learning for machine learning? Are you going to need AI for AI? Who writes the algorithms >> It's actually, that's. for the algorithm? >> Automated machine learning is a hot, hot not only research focus, but we're seeing it more and more solution providers, like Microsoft and Google and others, are goin' deep down, doubling down in investments in exactly that area. That's a productivity play for data scientists. >> I think the data markets going to change radically in my opinion. I see you're startin' to some things with blockchain and some other things that are interesting. Data sovereignty, data governance are huge issues. Basil, just give your final thoughts for this segment as we wrap this up. Final thoughts on data and BMC, what should people know about BMC right now? Because people might have a historical view of BMC. What's the latest, what should they know? What's the new Instagram picture of BMC? What should they know about you guys? >> So I think what I would say people should know about BMC is that all the work that we've done over the last 25 years, in virtually every platform that came before Hadoop, we have now innovated to take this into things like big data and cloud platforms. So when you are choosing Control-M as a platform for automation, you are choosing a very, very mature solution, an example of which is Navistar. Their CIO's actually speaking at the Keno tomorrow. They've had Control-M for 15, 20 years, and they've automated virtually every business function through Control-M. And when they started their predictive maintenance project, where they're ingesting data from about 300,000 vehicles today to figure out when this vehicle might break, and to predict maintenance on it. When they started their journey, they said that they always knew that they were going to use Control-M for it, because that was the enterprise standard, and they knew that they could simply now extend that capability into this area. And when they started about three, four years ago, they were ingesting data from about 100,000 vehicles. That has now scaled to over 325,000 vehicles, and they have no had to re-architect their strategy as they grow and scale. So I would say that is one of the key messages that we are taking to market, is that we are bringing innovation that spans over 25 years, and evolving it-- >> Modernizing it, basically. >> Modernizing it, and bringing it to newer platforms. >> Well congratulations, I wouldn't call that a pivot, I'd call it an extensibility issue, kind of modernizing kind of the core things. >> Absolutely. >> Thanks for coming and sharing the BMC perspective inside theCUBE here, on BigData NYC, this is the theCUBE, I'm John Furrier. Jim Kobielus here in New York city. More live coverage, for three days we'll be here, today, tomorrow and Thursday, and BigData NYC, more coverage after this short break. (calm electronic music) (vibrant electronic music)

Published Date : Feb 11 2019

SUMMARY :

Brought to you by SiliconANGLE Media who's the Solutions Marketing Manger at BMC, in the big data space now, the AI space now, And that is the issue we've been solving for customers-- So, first of all, you mention some things that never change, and eventually analytics. but now in the modern era that we live in, 'Cause that's really the number one thing, No, and the management's going to How can the Walmart's of the world do that fast, One is that out of the box we provide a lot of left political, left or right. Alt-left, alt-right, I mean, this is software development, and we need you to now go right in-- and focuses on the needs that they have And getting close to a 100 The tipping point is reached. The days of the five year implementation timelines are gone. and the approach we call jobs as code, At the same time, you want bounded experiences at root level And that's exactly the approach I mean it seems like the more development and as the application progresses, kind of the big, the Hadoop guys kind of grew up, Let the Basil answer fist, and focus on that element. it's not a hype market, it's show me the meat of the problems we're solving That needs to be automated to the hilt. to be more productive, to deliver faster results. and I'll just point out that many of the core uses cases like the judgments that a working data engineer makes So the automation question, it's an option question, for the algorithm? doubling down in investments in exactly that area. What's the latest, what should they know? should know about BMC is that all the work kind of modernizing kind of the core things. Thanks for coming and sharing the BMC perspective

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
JimPERSON

0.99+

Jim KobielusPERSON

0.99+

WalmartORGANIZATION

0.99+

BMCORGANIZATION

0.99+

GoogleORGANIZATION

0.99+

NYCLOCATION

0.99+

MicrosoftORGANIZATION

0.99+

oneQUANTITY

0.99+

Basil FaruquiPERSON

0.99+

five yearQUANTITY

0.99+

ten monthsQUANTITY

0.99+

two weeksQUANTITY

0.99+

three monthsQUANTITY

0.99+

six monthsQUANTITY

0.99+

John FurrierPERSON

0.99+

15QUANTITY

0.99+

BasilPERSON

0.99+

HoustonLOCATION

0.99+

HortonworksORGANIZATION

0.99+

SiliconANGLE MediaORGANIZATION

0.99+

MacCOMMERCIAL_ITEM

0.99+

BMC SoftwareORGANIZATION

0.99+

two waysQUANTITY

0.99+

bothQUANTITY

0.99+

tomorrowDATE

0.99+

Midtown ManhattanLOCATION

0.99+

OneQUANTITY

0.99+

ten yearQUANTITY

0.99+

over 25 yearsQUANTITY

0.99+

over 325,000 vehiclesQUANTITY

0.99+

about 300,000 vehiclesQUANTITY

0.99+

third oneQUANTITY

0.99+

three daysQUANTITY

0.99+

about 100,000 vehiclesQUANTITY

0.99+

about 80%QUANTITY

0.98+

BigDataORGANIZATION

0.98+

ThursdayDATE

0.98+

eighty percentQUANTITY

0.98+

todayDATE

0.98+

20 yearsQUANTITY

0.98+

one quick questionQUANTITY

0.98+

single pointQUANTITY

0.98+

Bob BeachPERSON

0.97+

four years agoDATE

0.97+

two use casesQUANTITY

0.97+

one final questionQUANTITY

0.97+

'97DATE

0.97+

InstagramORGANIZATION

0.97+

AgileTITLE

0.96+

New York cityLOCATION

0.96+

About a month agoDATE

0.96+

OraclesORGANIZATION

0.96+

HadoopTITLE

0.95+

about a hundred storesQUANTITY

0.94+

less than 3%QUANTITY

0.94+

2017DATE

0.93+

Glass'GimORGANIZATION

0.92+

aboutQUANTITY

0.92+

firstQUANTITY

0.91+

OpsORGANIZATION

0.91+

HadoopORGANIZATION

0.9+

Max WatsonPERSON

0.88+

100QUANTITY

0.88+

theCUBEORGANIZATION

0.88+

MainframesORGANIZATION

0.88+

NavistarORGANIZATION

0.86+

Basil Faruqui, BMC | theCUBE NYC 2018


 

(upbeat music) >> Live from New York, it's theCUBE. Covering theCUBE New York City 2018. Brought to you by SiliconANGLE Media and its ecosystem partners. >> Okay, welcome back everyone to theCUBE NYC. This is theCUBE's live coverage covering CubeNYC Strata Hadoop Strata Data Conference. All things data happen here in New York this week. I'm John Furrier with Peter Burris. Our next guest is Basil Faruqui lead solutions marketing manager digital business automation within BMC returns, he was here last year with us and also Big Data SV, which has been renamed CubeNYC, Cube SV because it's not just big data anymore. We're hearing words like multi cloud, Istio, all those Kubernetes. Data now is so important, it's now up and down the stack, impacting everyone, we talked about this last year with Control M, how you guys are automating in a hurry. The four pillars of pipelining data. The setup days are over; welcome to theCUBE. >> Well thank you and it's great to be back on theCUBE. And yeah, what you said is exactly right, so you know, big data has really, I think now been distilled down to data. Everybody understands data is big, and it's important, and it is really you know, it's quite a cliche, but to a larger degree, data is the new oil, as some people say. And I think what you said earlier is important in that we've been very fortunate to be able to not only follow the journey of our customers but be a part of it. So about six years ago, some of the early adopters of Hadoop came to us and said that look, we use your products for traditional data warehousing on the ERP side for orchestration workloads. We're about to take some of these projects on Hadoop into production and really feel that the Hadoop ecosystem is lacking enterprise-grade workflow orchestration tools. So we partnered with them and some of the earliest goals they wanted to achieve was build a data lake, provide richer and wider data sets to the end users to be able to do some dashboarding, customer 360, and things of that nature. Very quickly, in about five years time, we have seen a lot of these projects mature from how do I build a data lake to now applying cutting-edge ML and AI and cloud is a major enabler of that. You know, it's really, as we were talking about earlier, it's really taking away excuses for not being able to scale quickly from an infrastructure perspective. Now you're talking about is it Hadoop or is it S3 or is it Azure Blob Storage, is it Snowflake? And from a control-end perspective, we're very platform and technology agnostic, so some of our customers who had started with Hadoop as a platform, they are now looking at other technologies like Snowflake, so one of our customers describes it as kind of the spine or a power strip of orchestration where regardless of what technology you have, you can just plug and play in and not worry about how do I rewire the orchestration workflows because control end is taking care of it. >> Well you probably always will have to worry about that to some degree. But I think where you're going, and this is where I'm going to test with you, is that as analytics, as data is increasingly recognized as a strategic asset, as analytics increasingly recognizes the way that you create value out of those data assets, and as a business becomes increasingly dependent upon the output of analytics to make decisions and ultimately through AI to act differently in markets, you are embedding these capabilities or these technologies deeper into business. They have to become capabilities. They have to become dependable. They have to become reliable, predictable, cost, performance, all these other things. That suggests that ultimately, the historical approach of focusing on the technology and trying to apply it to a periodic or series of data science problems has to become a little bit more mature so it actually becomes a strategic capability. So the business can say we're operating on this, but the technologies to take that underlying data science technology to turn into business operations that's where a lot of the net work has to happen. Is that what you guys are focused on? >> Yeah, absolutely, and I think one of the big differences that we're seeing in general in the industry is that this time around, the pull of how do you enable technology to drive the business is really coming from the line of business, versus starting on the technology side of the house and then coming to the business and saying hey we've got some cool technologies that can probably help you, it's really line of business now saying no, I need better analytics so I can drive new business models for my company, right? So the need for speed is greater than ever because the pull is from the line of business side. And this is another area where we are unique is that, you know, Control M has been designed in a way where it's not just a set of solutions or tools for the technical guys. Now, the line of business is getting closer and closer, you know, it's blending into the technical side as well. They have a very, very keen interest in understanding are the dashboards going to be refreshed on time? Are we going to be able to get all the right promotional offers at the right time? I mean, we're here at NYC Strata, there's a lot of real-time promotion happening here. The line of business has direct interest in the delivery and the timing of all of this, so we have always had multiple interfaces to Control M where a business user who has an interest in understanding are the promotional offers going to happen at the right time and is that on schedule? They have a mobile app for them to do that. A developer who's building up complex, multi-application platform, they have an API and a programmatic interface to do that. Operations that has to monitor all of this has rich dashboards to be able to do that. That's one of the areas that has been key for our success over the last couple decades, and we're seeing that translate very well into the big data place. >> So I just want to go under the hood for a minute because I love that answer. And I'd like to pivot off what Peter said, tying it back to the business, okay, that's awesome. And I want to learn a little bit more about this because we talked about this last year and I kind of am seeing it now. Kubernetes and all this orchestration is about workloads. You guys nailed the workflow issue, complex workflows. Because if you look at it, if you're adding line of business into the equation, that's just complexity in and of itself. As more workflows exist within its own line of business, whether it's recommendations and offers and workflow issues, more lines of business in there is complex for even IT to deal with, so you guys have nailed that. How does that work? Do you plug it in and the lines of businesses have their own developers, so the people who work with the workflows engage how? >> So that's a good question, with sort of orchestration and automation now becoming very, very generic, it's kind of important to classify where we play. So there's a lot of tools that do release and build automation. There's a lot of tools that'll do infrastructure automation and orchestration. All of this infrastructure and release management process is done ultimately to run applications on top of it, and the workflows of the application need orchestration and that's the layer that we play in. And if you think about how does the end user, the business and consumer interact with all of this technology is through applications, k? So the orchestration of the workflow's inside the applications, whether you start all the way from an ERP or a CRM and then you land into a data lake and then do an ML model, and then out come the recommendations analytics, that's the layer we are automating today. Obviously, all of this-- >> By the way, the technical complexity for the user's in the app. >> Correct, so the line of business obviously has a lot more control, you're seeing roles like chief digital officers emerge, you're seeing CTOs that have mandates like okay you're going to be responsible for all applications that are facing customer facing where the CIO is going to take care of everything that's inward facing. It's not a settled structure or science involved. >> It's evolving fast. >> It's evolving fast. But what's clear is that line of business has a lot more interest and influence in driving these technology projects and it's important that technologies evolve in a way where line of business can not only understand but take advantage of that. >> So I think it's a great question, John, and I want to build on that and then ask you something. So the way we look at the world is we say the first fifty years of computing were known process, unknown technology. The next fifty years are going to be unknown process, known technology. It's all going to look like a cloud. But think about what that means. Known process, unknown technology, Control M and related types of technologies tended to focus on how you put in place predictable workflows in the technology layer. And now, unknown process, known technology, driven by the line of business, now we're talking about controlling process flows that are being created, bespoke, strategic, differentiating doing business. >> Well, dynamic, too, I mean, dynamic. >> Highly dynamic, and those workflows in many respects, those technologies, piecing applications and services together, become the process that differentiates the business. Again, you're still focused on the infrastructure a bit, but you've moved it up. Is that right? >> Yeah, that's exactly right. We see our goal as abstracting the complexity of the underlying application data and infrastructure. So, I mean, it's quite amazing-- >> So it could be easily reconfigured to a business's needs. >> Exactly, so whether you're on Hadoop and now you're thinking about moving to Snowflake or tomorrow something else that comes up, the orchestration or the workflow, you know, that's as a business as a product that's our goal is to continue to evolve quickly and in a manner that we continue to abstract the complexity so from-- >> So I've got to ask you, we've been having a lot of conversations around Hadoop versus Kubernetes on multi cloud, so as cloud has certainly come in and changed the game, there's no debate on that. How it changes is debatable, but we know that multiple clouds is going to be the modus operandus for customers. >> Correct. >> So I got a lot of data and now I've got pipelining complexities and workflows are going to get even more complex, potentially. How do you see the impact of the cloud, how are you guys looking at that, and what are some customer use cases that you see for you guys? >> So the, what I mentioned earlier, that being platform and technology agnostic is actually one of the unique differentiating factors for us, so whether you are an AWS or an Azure or a Google or On-Prem or still on a mainframe, a lot of, we're in New York, a lot of the banks, insurance companies here still do some of the most critical processing on the mainframe. The ability to abstract all of that whether it's cloud or legacy solutions is one of our key enablers for our customers, and I'll give you an example. So Malwarebytes is one of our customers and they've been using Control M for several years. Primarily the entire structure is built on AWS, but they are now utilizing Google cloud for some of their recommendation analysis on sentiment analysis because their goal is to pick the best of breed technology for the problem they're looking to solve. >> Service, the best breed service is in the cloud. >> The best breed service is in the cloud to solve the business problem. So from Control M's perspective, transcending from AWS to Google cloud is completely abstracted for them, so runs Google tomorrow it's Azure, they decide to build a private cloud, they will be able to extend the same workflow orchestration. >> But you can build these workflows across whatever set of services are available. >> Correct, and you bring up an important point. It's not only being able to build the workflows across platforms but being able to define dependencies and track the dependencies across all of this, because none of this is happening in silos. If you want to use Google's API to do the recommendations, well, you've got to feed it the data, and the data's pipeline, like we talked about last time, data ingestion, data storage, data processing, and analytics have very, very intricate dependencies, and these solutions should be able to manage not only the building of the workflow but the dependencies as well. >> But you're defining those elements as fundamental building blocks through a control model >> Correct. >> That allows you to treat the higher level services as reliable, consistent, capabilities. >> Correct, and the other thing I would like to add here is not only just build complex multiplatform, multiapplication workflows, but never lose focus of the business service of the business process there, so you can tie all of this to a business service and then, these things are complex, there are problems, let's say there's an ETL job that fails somewhere upstream, Control M will immediately be able to predict the impact and be able to tell you this means the recommendation engine will not be able to make the recommendations. Now, the staff that's going to work under mediation understands the business impact versus looking at a screen where there's 500 jobs and one of them has failed. What does that really mean? >> Set priorities and focal points and everything else. >> Right. >> So I just want to wrap up by asking you how your talk went at Strata Hadoop Data Conference. What were you talking about, what was the core message? Was it Control M, was it customer presentations? What was the focus? >> So the focus of yesterday's talk was actually, you know, one of the things is academic talk is great, but it's important to, you know, show how things work in real life. The session was focused on a real-use case from a customer. Navistar, they have IOT data-driven pipelines where they are predicting failures of parts inside trucks and buses that they manufacture, you know, reducing vehicle downtime. So we wanted to simulate a demo like that, so that's exactly what we did. It was very well received. In real-time, we spun up EMR environment in AWS, automatically provision control of infrastructure there, we applied spark and machine learning algorithms to the data and out came the recommendation at the end was that, you know, here are the vehicles that are-- >> Fix their brakes. (laughing) >> Exactly, so it was very, very well received. >> I mean, there's a real-world example, there's real money to be saved, maintenance, scheduling, potential liability, accidents. >> Liability is a huge issue for a lot of manufacturers. >> And Navistar has been at the leading edge of how to apply technologies in that business. >> They really have been a poster child for visual transformation. >> They sure have. >> Here's a company that's been around for 100 plus years and when we talk to them they tell us that we have every technology under the sun that has come since the mainframe, and for them to be transforming and leading in this way, we're very fortunate to be part of their journey. >> Well we'd love to talk more about some of these customer use cases. Other people love about theCUBE, we want to do more of them, share those examples, people love to see proof in real-world examples, not just talk so appreciate it sharing. >> Absolutely. >> Thanks for sharing, thanks for the insights. We're here Cube live in New York City, part of CubeNYC, we're getting all the data, sharing that with you. I'm John Furrier with Peter Burris. Stay with us for more day two coverage after this short break. (upbeat music)

Published Date : Sep 13 2018

SUMMARY :

Brought to you by SiliconANGLE Media with Control M, how you guys are automating in a hurry. describes it as kind of the spine or a power strip but the technologies to take that underlying of the house and then coming to the business You guys nailed the workflow issue, and that's the layer that we play in. for the user's in the app. Correct, so the line of business and it's important that technologies evolve in a way So the way we look at the world is we say that differentiates the business. of the underlying application data and infrastructure. so as cloud has certainly come in and changed the game, and what are some customer use cases that you see for the problem they're looking to solve. is in the cloud. The best breed service is in the cloud But you can build these workflows across and the data's pipeline, like we talked about last time, That allows you to treat the higher level services and be able to tell you this means the recommendation engine So I just want to wrap up by asking you at the end was that, you know, Fix their brakes. there's real money to be saved, And Navistar has been at the leading edge of how They really have been a poster child for and for them to be transforming and leading in this way, people love to see proof in real-world examples, Thanks for sharing, thanks for the insights.

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
JohnPERSON

0.99+

Basil FaruquiPERSON

0.99+

Peter BurrisPERSON

0.99+

BMCORGANIZATION

0.99+

PeterPERSON

0.99+

500 jobsQUANTITY

0.99+

GoogleORGANIZATION

0.99+

New YorkLOCATION

0.99+

last yearDATE

0.99+

AWSORGANIZATION

0.99+

New York CityLOCATION

0.99+

SiliconANGLE MediaORGANIZATION

0.99+

John FurrierPERSON

0.99+

HadoopTITLE

0.99+

first fifty yearsQUANTITY

0.99+

theCUBEORGANIZATION

0.99+

NavistarORGANIZATION

0.99+

tomorrowDATE

0.98+

yesterdayDATE

0.98+

oneQUANTITY

0.98+

this weekDATE

0.97+

MalwarebytesORGANIZATION

0.97+

CubeORGANIZATION

0.95+

Control MORGANIZATION

0.95+

NYCLOCATION

0.95+

SnowflakeTITLE

0.95+

Strata Hadoop Data ConferenceEVENT

0.94+

100 plus yearsQUANTITY

0.93+

CubeNYC Strata Hadoop Strata Data ConferenceEVENT

0.92+

last couple decadesDATE

0.91+

AzureTITLE

0.91+

about five yearsQUANTITY

0.91+

IstioORGANIZATION

0.9+

CubeNYCORGANIZATION

0.89+

dayQUANTITY

0.87+

about six years agoDATE

0.85+

KubernetesTITLE

0.85+

todayDATE

0.84+

NYC StrataORGANIZATION

0.83+

HadoopORGANIZATION

0.78+

one of themQUANTITY

0.77+

Big Data SVORGANIZATION

0.75+

2018EVENT

0.7+

KubernetesORGANIZATION

0.66+

fifty yearsDATE

0.62+

Control MTITLE

0.61+

four pillarsQUANTITY

0.61+

twoQUANTITY

0.6+

-PremORGANIZATION

0.6+

Cube SVCOMMERCIAL_ITEM

0.58+

a minuteQUANTITY

0.58+

S3TITLE

0.55+

AzureORGANIZATION

0.49+

cloudTITLE

0.49+

2018DATE

0.43+

Jim Franklin & Anant Chintamaneni | theCUBE NYC 2018


 

>> Live from New York. It's theCUBE. Covering theCUBE New York City, 2018. Brought to you by SiliconANGLE Media, and it's ecosystem partners. >> I'm John Furrier with Peter Burris, our next two guests are Jim Franklin with Dell EMC Director of Product Management Anant Chintamaneni, who is the Vice President of Products at BlueData. Welcome to theCUBE, good to see you. >> Thanks, John. >> Thank you. >> Thanks for coming on. >> I've been following BlueData since the founding. Great company, and the founders are great. Great teams, so thanks for coming on and sharing what's going on, I appreciate it. >> It's a pleasure, thanks for the opportunity. >> So Jim, talk about the Dell relationship with BlueData. What are you guys doing? You have the Dell-ready solutions. How is that related now, because you've seen this industry with us over the years morph. It's really now about, the set-up days are over, it's about proof points. >> That's right. >> AI and machine learning are driving the signal, which is saying, 'We need results'. There's action on the developer's side, there's action on the deployment, people want ROI, that's the main focus. >> That's right. That's right, and we've seen this journey happen from the new batch processing days, and we're seeing that customer base mature and come along, so the reason why we partnered with BlueData is, you have to have those softwares, you have to have the contenders. They have to have the algorithms, and things like that, in order to make this real. So it's been a great partnership with BlueData, it's dated back actually a little farther back than some may realize, all the way to 2015, believe it or not, when we used to incorporate BlueData with Isilon. So it's been actually a pretty positive partnership. >> Now we've talked with you guys in the past, you guys were on the cutting edge, this was back when Docker containers were fashionable, but now containers have become so proliferated out there, it's not just Docker, containerization has been the wave. Now, Kubernetes on top of it is really bringing in the orchestration. This is really making the storage and the network so much more valuable with workloads, whether respective workloads, and AI is a part of that. How do you guys navigate those waters now? What's the BlueData update, how are you guys taking advantage of that big wave? >> I think, great observation, re-embrace Docker containers, even before actually Docker was even formed as a company by that time, and Kubernetes was just getting launched, so we saw the value of Docker containers very early on, in terms of being able to obviously provide the agility, elasticity, but also, from a packaging of applications perspective, as we all know it's a very dynamic environment, and today, I think we are very happy to know that, with Kubernetes being a household name now, especially a tech company, so the way we're navigating this is, we have a turnkey product, which has containerization, and then now we are taking our value proposition of big data and AI and lifecycle management and bringing it to Kubernetes with an open source project that we launched called Cube Director under our umbrella. So, we're all about bringing stateful applications like Hadoop, AI, ML to the community and to our customer base, which is some of the largest financial services in health care customers. >> So the container revolution has certainly groped developers, and developers have always had a history of chasing after the next cool technology, and for good reason, it's not like just chasing after... Developers tend not to just chase after the shiny thing, they chased after the most productive thing, and they start using it, and they start learning about it, and they make themselves valuable, and they build more valuable applications as a result. But there's this interesting meshing of creators, makers, in the software world, between the development community and the data science community. How are data scientists, who you must be spending a fair amount of time with, starting to adopt containers, what are they looking at? Are they even aware of this, as you try to help these communities come together? >> We absolutely talk to the data scientists and they're the drivers of determining what applications they want to consume for the different news cases. But, at the end of the day, the person who has to deliver these applications, you know data scientists care about time to value, getting the environment quickly all prepared so they can access the right data sets. So, in many ways, most of our customers, many of them are unaware that there's actually containers under the hood. >> So this is the data scientists. >> The data scientists, but the actual administrators and the system administrators were making these tools available, are using containers as a way to accelerate the way they package the software, which has a whole bunch of dependent libraries, and there's a lot of complexity our there. So they're simplifying all that and providing the environment as quickly as possible. >> And in so doing, making sure that whatever workloads are put together, can scaled, can be combined differently and recombined differently, based on requirements of the data scientists. So the data scientist sees the tool... >> Yeah. >> The tool is manifest as, in concert with some of these new container related technologies, and then the whole CICD process supports the data scientist >> The other thing to think about though, is that this also allows freedom of choice, and we were discussing off camera before, these developers want to pick out what they want to pick out what they want to work with, they don't want to have to be locked in. So with containers, you can also speed that deployment but give them freedom to choose the tools that make them best productive. That'll make them much happier, and probably much more efficient. >> So there's a separation under the data science tools, and the developer tools, but they end up all supporting the same basic objective. So how does the infrastructure play in this, because the challenge of big data for the last five years as John and I both know, is that a lot of people conflated. The outcome of data science, the outcome of big data, with the process of standing up clusters, and lining up Hadoop, and if they failed on the infrastructure, they said it was a failure overall. So how you making the infrastructure really simple, and line up with this time of value? >> Well, the reality is, we all need food and water. IT still needs server and storage in order to work. But at the end of the day, the abstraction has to be there just like VMware in the early days, clouds, containers with BlueData is just another way to create a layer of abstraction. But this one is in the context of what the data scientist is trying to get done, and that's the key to why we partnered with BlueData and why we delivered big data as a service. >> So at that point, what's the update from Dell EMC and Dell, in particular, Analytics? Obviously you guys work with a lot of customers, have challenges, how are you solving those problems? What are those problems? Because we know there's some AI rumors, big Dell event coming up, there's rumors of a lot of AI involved, I'm speculating there's going to be probably a new kind of hardware device and software. What's the state of the analytics today? >> I think a lot of the customers we talked about, they were born in that batch processing, that Hadoop space we just talked about. I think they largely got that right, they've largely got that figured out, but now we're seeing proliferation of AI tools, proliferation of sandbox environments, and you're psyched to see a little bit of silo behavior happening, so what we're trying to do is that IT shop is trying to dispatch those environments, dispatch with some speed, with some agility. They want to have it at the right economic model as well, so we're trying to strike a better balance, say 'Hey, I've invested in all this infrastructure already, I need to modernize it, and that I also need to offer it up in a way that data scientists can consume it'. Oh, by the way, we're starting to see them start to hire more and more of these data scientists. Well, you don't want your data scientists, this very expensive, intelligent resource, sitting there doing data mining, data cleansing, detail offloads, we want them actually doing modeling and analytics. So we find that a lot of times right now as you're doing an operational change, the operational mindset as you're starting to hire these very expensive people to do this very good work, at the corest of the data, but they need to get productive in the way that you hired them to be productive. >> So what is this ready solution, can you just explain what that is? Is it a program, is it a hardware, is it a solution? What is the ready solution? >> Generally speaking, what we do as a division is we look for value workloads, just generally speaking, not necessarily in batch processing, or AI, or applications, and we try and create an environment that solves that customer challenge, typically they're very complex, SAP, Oracle Database, it's AI, my goodness. Very difficult. >> Variety of tools, using hives, no sequel, all this stuff's going on. >> Cassandra, you've got Tensorflow, so we try fit together a set of knowledge experts, that's the key, the intellectual property of our engineers, and their deep knowledge expertise in a certain area. So for AI, we have a sight of them back at the shop, they're in the lab, and this is what they do, and they're serving up these models, they're putting data through its paces, they're doing the work of a data scientist. They are data scientists. >> And so this is where BlueData comes in. You guys are part of this abstraction layer in the ready solutions. Offering? Is that how it works? >> Yeah, we are the software that enables the self-service experience, the multitenancy, that the consumers of the ready solution would want in terms of being able to onboard multiple different groups of users, lines of business, so you could have a user that wants to run basic spark, cluster, spark jobs, or you could have another user group that's using Tensorflow, or accelerated by a special type of CPU or GPU, and so you can have them all on the same infrastructure. >> One of the things Peter and I were talking about, Dave Vellante, who was here, he's at another event right now getting some content but, one of the things we observed was, we saw this awhile ago so it's not new to us but certainly we're seeing the impact at this event. Hadoop World, there's now called Strata Data NYC, is that we hear words like Kubernetes, and Multi Cloud, and Istio for the first time. At this event. This is the impact of the Cloud. The Cloud has essentially leveled the Hadoop World, certainly there's some Hadoop activity going on there, people have clusters, there's standing up infrastructure for analytical infrastructures that do analytics, obviously AI drives that, but now you have the Cloud being a power base. Changing that analytics infrastructure. How has it impacted you guys? BlueData, how are you guys impacted by the Cloud? Tailwind for you guys? Helpful? Good? >> You described it well, it is a tailwind. This space is about the data, not where the data lives necessarily, but the robustness of the data. So whether that's in the Cloud, whether that's on Premise, whether that's on Premise in your own private Cloud, I think anywhere where there's data that can be gathered, modeled, and new insights being pulled out of, this is wonderful, so as we ditched data, whether it's born in the Cloud or born on Premise, this is actually an accelerant to the solutions that we built together. >> As BlueData, we're all in on the Cloud, we support all the three major Cloud providers that was the big announcement that we made this week, we're generally available for AWS, GCP, and Azure, and, in particular, we start with customers who weren't born in the Cloud, so we're talking about some of the large financial services >> We had Barclays UK here who we nominated, they won the Cloud Era Data Impact Award, and what they're actually going through right now, is they started on Prem, they have these really packaged certified technology stacks, whether they are Cloud Era Hadoop, whether they are Anaconda for data science, and what they're trying to do right now is, they're obviously getting value from that on Premise with BlueData, and now they want to leverage the Cloud. They want to be able to extend into the Cloud. So, we as a company have made our product a hybrid Cloud-ready platform, so it can span on Prem as well as multiple Clouds, and you have the ability to move the workloads from one to the other, depending on data gravity, SLA considerations. >> Compliancy. >> I think it's one more thing, I want to test this with you guys, John, and that is, analytics is, I don't want to call it inert, or passive, but analytics has always been about getting the right data to human beings so they can make decisions, and now we're seeing, because of AI, the distinction that we draw between analytics and AI is, AI is about taking action on the data, it's about having a consequential action, as a result of the data, so in many respects, NCL, Kubernetes, a lot of these are not only do some interesting things for the infrastructure associated with big data, but they also facilitate the incorporation of new causes of applications, that act on behalf of the brand. >> Here's the other thing I'll add to it, there's a time element here. It used to be we were passive, and it was in the past, and you're trying to project forward, that's no longer the case. You can do it right now. Exactly. >> In many respects, the history of the computing industry can be drawn in this way, you focused on the past, and then with spreadsheets in the 80s and personal computing, you focused on getting everybody to agree on the future, and now, it's about getting action to happen right now. >> At the moment it happens. >> And that's why there's so much action. We're passed the set-up phase, and I think this is why we're hearing, seeing machine learning being so popular because it's like, people want to take action there's a demand, that's a signal that it's time to show where the ROI is and get action done. Clearly we see that. >> We're capitalists, right? We're all trying to figure out how to make money in these spaces. >> Certainly there's a lot of movement, and Cloud has proven that spinning up an instance concept has been a great thing, and certainly analytics. It's okay to have these workloads, but how do you tie it together? So, I want to ask you, because you guys have been involved in containers, Cloud has certainly been a tailwind, we agree with you 100 percent on that. What is the relevance of Kubernetes and Istio? You're starting to see these new trends. Kubernetes, Istio, Cupflow. Higher level microservices with all kinds of stateful and stateless dynamics. I call it API 2.0, it's a whole other generation of abstractions that are going on, that are creating some goodness for people. What is the impact, in your opinion, of Kubernetes and this new revolution? >> I think the impact of Kubernetes is, I just gave a talk here yesterday, called Hadoop-la About Kubernetes. We were thinking very deeply about this. We're thinking deeply about this. So I think Kubernetes, if you look at the genesis, it's all about stateless applications, and I think as new applications are being written folks are thinking about writing them in a manner that are decomposed, stateless, microservices, things like Cupflow. When you write it like that, Kubernetes fits in very well, and you get all the benefits of auto-scaling, and so control a pattern, and ultimately Kubernetes is this finite state machine-type model where you describe what the state should be, and it will work and crank towards making it towards that state. I think it's a little bit harder for stateful applications, and I think that's where we believe that the Kubernetes community has to do a lot more work, and folks like BlueData are going to contribute to that work which is, how do you bring stateful applications like Hadoop where there's a lot of interdependent services, they're not necessarily microservices, they're actually almost close to monolithic applications. So I think new applications, new AI ML tooling that's going to come out, they're going to be very conscious of how they're running in a Cloud world today that folks weren't aware of seven or eight years ago, so it's really going to make a huge difference. And I think things like Istio are going to make a huge difference because you can start in the cloud and maybe now expand on to Prem. So there's going to be some interesting dynamics. >> Without hopping management frameworks, absolutely. >> And this is really critical, you just nailed it. Stateful is where ML will shine, if you can then cross the chasma to the on Premise where the workloads can have state sharing. >> Right. >> Scales beautifully. It's a whole other level. >> Right. You're going to the data into the action, or the activity, you're going to have to move the processing to the data, and you want to have nonetheless, a common, seamless management development framework so that you have the choices about where you do those things. >> Absolutely. >> Great stuff. We can do a whole Cube segment just on that. We love talking about these new dynamics going on. We'll see you in CF CupCon coming up in Seattle. Great to have you guys on. Thanks, and congratulations on the relationship between BlueData and Dell EMC and Ready Solutions. This is Cube, with the Ready Solutions here. New York City, talking about big data and the impact, the future of AI, all things stateful, stateless, Cloud and all. It's theCUBE bringing you all the action. Stay with us for more after this short break.

Published Date : Sep 13 2018

SUMMARY :

Brought to you by SiliconANGLE Media, Welcome to theCUBE, good to see you. Great company, and the founders are great. So Jim, talk about the Dell relationship with BlueData. AI and machine learning are driving the signal, so the reason why we partnered with BlueData is, What's the BlueData update, how are you guys and bringing it to Kubernetes with an open source project and the data science community. But, at the end of the day, the person who has to deliver and the system administrators So the data scientist sees the tool... So with containers, you can also speed that deployment So how does the infrastructure play in this, But at the end of the day, the abstraction has to be there What's the state of the analytics today? in the way that you hired them to be productive. and we try and create an environment that all this stuff's going on. that's the key, the intellectual property of our engineers, in the ready solutions. and so you can have them all on the same infrastructure. Kubernetes, and Multi Cloud, and Istio for the first time. but the robustness of the data. and you have the ability to move the workloads I want to test this with you guys, John, Here's the other thing I'll add to it, and personal computing, you focused on getting everybody to We're passed the set-up phase, and I think this is why how to make money in these spaces. we agree with you 100 percent on that. the Kubernetes community has to do a lot more work, And this is really critical, you just nailed it. It's a whole other level. so that you have the choices and the impact, the future of AI,

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
Dave VellantePERSON

0.99+

Anant ChintamaneniPERSON

0.99+

Peter BurrisPERSON

0.99+

Jim FranklinPERSON

0.99+

JohnPERSON

0.99+

BlueDataORGANIZATION

0.99+

DellORGANIZATION

0.99+

PeterPERSON

0.99+

JimPERSON

0.99+

2015DATE

0.99+

New YorkLOCATION

0.99+

100 percentQUANTITY

0.99+

John FurrierPERSON

0.99+

New York CityLOCATION

0.99+

Ready SolutionsORGANIZATION

0.99+

SeattleLOCATION

0.99+

yesterdayDATE

0.99+

Dell EMCORGANIZATION

0.99+

Barclays UKORGANIZATION

0.99+

first timeQUANTITY

0.99+

SiliconANGLE MediaORGANIZATION

0.99+

todayDATE

0.99+

OneQUANTITY

0.98+

bothQUANTITY

0.98+

AWSORGANIZATION

0.98+

this weekDATE

0.97+

CF CupConEVENT

0.97+

oneQUANTITY

0.97+

CassandraPERSON

0.97+

sevenDATE

0.96+

two guestsQUANTITY

0.96+

IsilonORGANIZATION

0.96+

80sDATE

0.96+

NCLORGANIZATION

0.96+

SAPORGANIZATION

0.95+

API 2.0OTHER

0.92+

AnacondaORGANIZATION

0.92+

Cloud Era HadoopTITLE

0.91+

NYCLOCATION

0.91+

HadoopTITLE

0.91+

eight years agoDATE

0.91+

PremORGANIZATION

0.9+

CupflowTITLE

0.89+

PremiseTITLE

0.89+

KubernetesTITLE

0.88+

one more thingQUANTITY

0.88+

IstioORGANIZATION

0.87+

DockerTITLE

0.85+

DockerORGANIZATION

0.85+

CupflowORGANIZATION

0.84+

CubeORGANIZATION

0.83+

last five yearsDATE

0.82+

CloudTITLE

0.8+

KubernetesORGANIZATION

0.8+

Oracle DatabaseORGANIZATION

0.79+

2018DATE

0.79+

CloudsTITLE

0.78+

GCPORGANIZATION

0.77+

theCUBEORGANIZATION

0.76+

Cloud Era Data Impact AwardEVENT

0.74+

CubePERSON

0.73+

DD, Cisco + Han Yang, Cisco | theCUBE NYC 2018


 

>> Live from New York, It's the CUBE! Covering theCUBE, New York City 2018. Brought to you by SiliconANGLE Media and its Ecosystem partners. >> Welcome back to the live CUBE coverage here in New York City for CUBE NYC, #CubeNYC. This coverage of all things data, all things cloud, all things machine learning here in the big data realm. I'm John Furrier and Dave Vellante. We've got two great guests from Cisco. We got DD who is the Vice President of Data Center Marketing at Cisco, and Han Yang who is the Senior Product Manager at Cisco. Guys, welcome to the Cube. Thanks for coming on again. >> Good to see ya. >> Thanks for having us. >> So obviously one of the things that has come up this year at the Big Data Show, used to be called Hadoop World, Strata Data, now it's called, the latest name. And obviously CUBE NYC, we changed from Big Data NYC to CUBE NYC, because there's a lot more going on. I heard hallway conversations around blockchain, cryptocurrency, Kubernetes has been said on theCUBE already at least a dozen times here today, multicloud. So you're seeing the analytical world try to be, in a way, brought into the dynamics around IT infrastructure operations, both cloud and on premises. So interesting dynamics this year, almost a dev ops kind of culture to analytics. This is a new kind of sign from this community. Your thoughts? >> Absolutely, I think data and analytics is one of those things that's pervasive. Every industry, it doesn't matter. Even at Cisco, I know we're going to talk a little more about the new AI and ML workload, but for the last few years, we've been using AI and ML techniques to improve networking, to improve security, to improve collaboration. So it's everywhere. >> You mean internally, in your own IT? >> Internally, yeah. Not just in IT, in the way we're designing our network equipment. We're storing data that's flowing through the data center, flowing in and out of clouds, and using that data to make better predictions for better networking application performance, security, what have you. >> The first topic I want to talk to you guys about is around the data center. Obviously, you do data center marketing, that's where all the action is. The cloud, obviously, has been all the buzz, people going to the cloud, but Andy Jassy's announcement at VMworld really is a validation that we're seeing, for the first time, hybrid multicloud validated. Amazon announced RDS on VMware on-premises. >> That's right. This is the first time Amazon's ever done anything of this magnitude on-premises. So this is a signal from the customers voting with their wallet that on-premises is a dynamic. The data center is where the data is, that's where the main footprint of IT is. This is important. What's the impact of that dynamic, of data center, where the data is with the option of a cloud. How does that impact data, machine learning, and the things that you guys see as relevant? >> I'll start and Han, feel free to chime in here. So I think those boundaries between this is a data center, and this a cloud, and this is campus, and this is the edge, I think those boundaries are going away. Like you said, data center is where the data is. And it's the ability of our customers to be able to capture that data, process it, curate it, and use it for insight to take decision locally. A drone is a data center that flies, and boat is a data center that floats, right? >> And a cloud is a data center that no one sees. >> That's right. So those boundaries are going away. We at Cisco see this as a continuum. It's the edge cloud continuum. The edge is exploding, right? There's just more and more devices, and those devices are cranking out more data than ever before. Like I said, it's the ability of our customers to harness the data to make more meaningful decisions. So Cisco's take on this is the new architectural approach. It starts with the network, because the network is the one piece that connects everything- every device, every edge, every individual, every cloud. There's a lot of data within the network which we're using to make better decisions. >> I've been pretty close with Cisco over the years, since '95 timeframe. I've had hundreds of meetings, some technical, some kind of business. But I've heard that term edge the network many times over the years. This is not a new concept at Cisco. Edge of the network actually means something in Cisco parlance. The edge of the network >> Yeah. >> that the packets are moving around. So again, this is not a new idea at Cisco. It's just materialized itself in a new way. >> It's not, but what's happening is the edge is just now generating so much data, and if you can use that data, convert it into insight and make decisions, that's the exciting thing. And that's why this whole thing about machine learning and artificial intelligence, it's the data that's being generated by these cameras, these sensors. So that's what is really, really interesting. >> Go ahead, please. >> One of our own studies pointed out that by 2021, there will be 847 zettabytes of information out there, but only 1.3 zettabytes will actually ever make it back to the data center. That just means an opportunity for analytics at the edge to make sense of that information before it ever makes it home. >> What were those numbers again? >> I think it was like 847 zettabytes of information. >> And how much makes it back? >> About 1.3. >> Yeah, there you go. So- >> So a huge compression- >> That confirms your research, Dave. >> We've been saying for a while now that most of the data is going to stay at the edge. There's no reason to move it back. The economics don't support it, the latency doesn't make sense. >> The network cost alone is going to kill you. >> That's right. >> I think you really want to collect it, you want to clean it, and you want to correlate it before ever sending it back. Otherwise, sending that information, of useless information, that status is wonderful. Well that's not very valuable. And 99.9 percent, "things are going well." >> Temperature hasn't changed. (laughs) >> If it really goes wrong, that's when you want to alert or send more information. How did it go bad? Why did it go bad? Those are the more insightful things that you want to send back. >> This is not just for IoT. I mean, cat pictures moving between campuses cost money too, so why not just keep them local, right? But the basic concepts of networking. This is what I want to get in my point, too. You guys have some new announcements around UCS and some of the hardware and the gear and the software. What are some of the new announcements that you're announcing here in New York, and what does it mean for customers? Because they want to know not only speeds and feeds. It's a software-driven world. How does the software relate? How does the gear work? What's the management look like? Where's the control plane? Where's the management plane? Give us all the data. >> I think the biggest issues starts from this. Data scientists, their task is to export different data sources, find out the value. But at the same time, IT is somewhat lagging behind. Because as the data scientists go from data source A to data source B, it could be 3 petabytes of difference. IT is like, 3 petabytes? That's only from Monday through Wednesday? That's a huge infrastructure requirement change. So Cisco's way to help the customer is to make sure that we're able to come out with blueprints. Blueprints enabling the IT team to scale, so that the data scientists can work beyond their own laptop. As they work through the petabytes of data that's come in from all these different sources, they're able to collaborate well together and make sense of that information. Only by scaling with IT helping the data scientists to work the scale, that's the only way they can succeed. So that's why we announced a new server. It's called a C480ML. Happens to have 8 GPUs from Nvidia inside helping customers that want to do that deep learning kind of capabilities. >> What are some of the use cases on these as products? It's got some new data capabilities. What are some of the impacts? >> Some of the things that Han just mentioned. For me, I think the biggest differentiation in our solution is things that we put around the box. So the management layer, right? I mean, this is not going to be one server and one data center. It's going to be multiple of them. You're never going to have one data center. You're going to have multiple data centers. And we've got a really cool management tool called Intersight, and this is supported in Intersight, day one. And Intersight also uses machine learning techniques to look at data from multiple data centers. And that's really where the innovation is. Honestly, I think every vendor is bend sheet metal around the latest chipset, and we've done the same. But the real differentiation is how we manage it, how we use the data for more meaningful insight. I think that's where some of our magic is. >> Can you add some code to that, in terms of infrastructure for AI and ML, how is it different than traditional infrastructures? So is the management different? The sheet metal is not different, you're saying. But what are some of those nuances that we should understand. >> I think especially for deep learning, multiple scientists around the world have pointed that if you're able to use GPUs, they're able to run the deep learning frameworks faster by roughly two waters magnitude. So that's part of the reason why, from an infrastructure perspective, we want to bring in that GPUs. But for the IT teams, we didn't want them to just add yet another infrastructure silo just to support AI or ML. Therefore, we wanted to make sure it fits in with a UCS-managed unified architecture, enabling the IT team to scale but without adding more infrastructures and silos just for that new workload. But having that unified architecture, it helps the IT to be more efficient and, at the same time, is better support of the data scientists. >> The other thing I would add is, again, the things around the box. Look, this industry is still pretty nascent. There is lots of start-ups, there is lots of different solutions, and when we build a server like this, we don't just build a server and toss it over the fence to the customer and say "figure it out." No, we've done validated design guides. With Google, with some of the leading vendors in the space to make sure that everything works as we say it would. And so it's all of those integrations, those partnerships, all the way through our systems integrators, to really understand a customer's AI and ML environment and can fine tune it for the environment. >> So is that really where a lot of the innovation comes from? Doing that hard work to say, "yes, it's going to be a solution that's going to work in this environment. Here's what you have to do to ensure best practice," etc.? Is that right? >> So I think some of our blueprints or validated designs is basically enabling the IT team to scale. Scale their stores, scale their CPU, scale their GPU, and scale their network. But do it in a way so that we work with partners like Hortonworks or Cloudera. So that they're able to take advantage of the data lake. And adding in the GPU so they're able to do the deep learning with Tensorflow, with Pytorch, or whatever curated deep learning framework the data scientists need to be able to get value out of those multiple data sources. These are the kind of solutions that we're putting together, making sure our customers are able to get to that business outcome sooner and faster, not just a-- >> Right, so there's innovation at all altitudes. There's the hardware, there's the integrations, there's the management. So it's innovation. >> So not to go too much into the weeds, but I'm curious. As you introduce these alternate processing units, what is the relationship between traditional CPUs and these GPUs? Are you managing them differently, kind of communicating somehow, or are they sort of fenced off architecturally. I wonder if you could describe that. >> We actually want it to be integrated, because by having it separated and fenced off, well that's an IT infrastructure silo. You're not going to have the same security policy or the storage mechanisms. We want it to be unified so it's easier on IT teams to support the data scientists. So therefore, the latest software is able to manage both CPUs and GPUs, as well as having a new file system. Those are the solutions that we're putting forth, so that ARC-IT folks can scale, our data scientists can succeed. >> So IT's managing a logical block. >> That's right. And even for things like inventory management, or going back and adding patches in the event of some security event, it's so much better to have one integrated system rather than silos of management, which we see in the industry. >> So the hard news is basically UCS for AI and ML workloads? >> That's right. This is our first server custom built ground up to support these deep learning, machine learning workloads. We partnered with Nvidia, with Google. We announced earlier this week, and the phone is ringing constantly. >> I don't want to say godbot. I just said it. (laughs) This is basically the power tool for deep learning. >> Absolutely. >> That's how you guys see it. Well, great. Thanks for coming out. Appreciate it, good to see you guys at Cisco. Again, deep learning dedicated technology around the box, not just the box itself. Ecosystem, Nvidia, good call. Those guys really get the hot GPUs out there. Saw those guys last night, great success they're having. They're a key partner with you guys. >> Absolutely. >> Who else is partnering, real quick before we end the segment? >> We've been partnering with software sci, we partner with folks like Anaconda, with their Anaconda Enterprise, which data scientists love to use as their Python data science framework. We're working with Google, with their Kubeflow, which is open source project integrating Tensorflow on top of Kubernetes. And of course we've been working with folks like Caldera as well as Hortonworks to access the data lake from a big data perspective. >> Yeah, I know you guys didn't get a lot of credit. Google Cloud, we were certainly amplifying it. You guys were co-developing the Google Cloud servers with Google. I know they were announcing it, and you guys had Chuck on stage there with Diane Greene, so it was pretty positive. Good integration with Google can make a >> Absolutely. >> Thanks for coming on theCUBE, thanks, we appreciate the commentary. Cisco here on theCUBE. We're in New York City for theCUBE NYC. This is where the world of data is converging in with IT infrastructure, developers, operators, all running analytics for future business. We'll be back with more coverage, after this short break. (upbeat digital music)

Published Date : Sep 12 2018

SUMMARY :

It's the CUBE! Welcome back to the live CUBE coverage here So obviously one of the things that has come up this year but for the last few years, Not just in IT, in the way we're designing is around the data center. and the things that you guys see as relevant? And it's the ability of our customers to It's the edge cloud continuum. The edge of the network that the packets are moving around. is the edge is just now generating so much data, analytics at the edge Yeah, there you go. that most of the data is going to stay at the edge. I think you really want to collect it, (laughs) Those are the more insightful things and the gear and the software. the data scientists to work the scale, What are some of the use cases on these as products? Some of the things that Han just mentioned. So is the management different? it helps the IT to be more efficient in the space to make sure that everything works So is that really where a lot of the data scientists need to be able to get value There's the hardware, there's the integrations, So not to go too much into the weeds, Those are the solutions that we're putting forth, in the event of some security event, and the phone is ringing constantly. This is basically the power tool for deep learning. Those guys really get the hot GPUs out there. to access the data lake from a big data perspective. the Google Cloud servers with Google. This is where the world of data

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
Dave VellantePERSON

0.99+

NvidiaORGANIZATION

0.99+

CiscoORGANIZATION

0.99+

Han YangPERSON

0.99+

GoogleORGANIZATION

0.99+

New YorkLOCATION

0.99+

Diane GreenePERSON

0.99+

AmazonORGANIZATION

0.99+

DavePERSON

0.99+

HortonworksORGANIZATION

0.99+

2021DATE

0.99+

New York CityLOCATION

0.99+

Andy JassyPERSON

0.99+

8 GPUsQUANTITY

0.99+

847 zettabytesQUANTITY

0.99+

John FurrierPERSON

0.99+

99.9 percentQUANTITY

0.99+

MondayDATE

0.99+

SiliconANGLE MediaORGANIZATION

0.99+

3 petabytesQUANTITY

0.99+

AnacondaORGANIZATION

0.99+

WednesdayDATE

0.99+

DDPERSON

0.99+

first timeQUANTITY

0.99+

one serverQUANTITY

0.99+

ClouderaORGANIZATION

0.99+

PythonTITLE

0.99+

first topicQUANTITY

0.99+

one pieceQUANTITY

0.99+

VMworldORGANIZATION

0.99+

'95DATE

0.98+

1.3 zettabytesQUANTITY

0.98+

NYCLOCATION

0.98+

bothQUANTITY

0.98+

oneQUANTITY

0.98+

this yearDATE

0.98+

Big Data ShowEVENT

0.98+

CalderaORGANIZATION

0.98+

two watersQUANTITY

0.97+

todayDATE

0.97+

ChuckPERSON

0.97+

OneQUANTITY

0.97+

Big DataORGANIZATION

0.97+

earlier this weekDATE

0.97+

IntersightORGANIZATION

0.97+

hundreds of meetingsQUANTITY

0.97+

CUBEORGANIZATION

0.97+

first serverQUANTITY

0.97+

last nightDATE

0.95+

one data centerQUANTITY

0.94+

UCSORGANIZATION

0.92+

petabytesQUANTITY

0.92+

two great guestsQUANTITY

0.9+

TensorflowTITLE

0.86+

CUBE NYCORGANIZATION

0.86+

HanPERSON

0.85+

#CubeNYCLOCATION

0.83+

Strata DataORGANIZATION

0.83+

KubeflowTITLE

0.82+

Hadoop WorldORGANIZATION

0.81+

2018DATE

0.8+

Stephanie McReynolds, Alation | theCUBE NYC 2018


 

>> Live from New York, It's theCUBE! Covering theCUBE New York City 2018. Brought to you by SiliconANGLE Media and its ecosystem partners. >> Hello and welcome back to theCUBE live in New York City, here for CUBE NYC. In conjunct with Strata Conference, Strata Data, Strata Hadoop This is our ninth year covering the big data ecosystem which has evolved into machine learning, A.I., data science, cloud, a lot of great things happening all things data, impacting all businesses I'm John Furrier, your host with Dave Vellante and Peter Burris, Peter is filling in for Dave Vellante. Next guest, Stephanie McReynolds who is the CMO, VP of Marketing for Alation, thanks for joining us. >> Thanks for having me. >> Good to see you. So you guys have a pretty spectacular exhibit here in New York. I want to get to that right away, top story is Attack of the Bots. And you're showing a great demo. Explain what you guys are doing in the show. >> Yah, well it's robot fighting time in our booth, so we brought a little fun to the show floor my kids are.. >> You mean big data is not fun enough? >> Well big data is pretty fun but occasionally you got to get your geek battle on there so we're having fun with robots but I think the real story in the Alation booth is about the product and how machine learning data catalogs are helping a whole variety of users in the organization everything from improving analyst productivity and even some business user productivity of data to then really supporting data scientists in their work by helping them to distribute their data products through a data catalog. >> You guys are one of the new guard companies that are doing things that make it really easy for people who want to use data, practitioners that the average data citizen has been called, or people who want productivity. Not necessarily the hardcore, setting up clusters, really kind of like the big data user. What's that market look like right now, has it met your expectations, how's business, what's the update? >> Yah, I think we have a strong perspective that for us to close the final mile and get to real value out of the data, it's a human challenge, there's a trust gap with managers. Today on stage over at STRATA it was interesting because Google had a speaker and it wasn't their chief data officer it was their chief decision scientist and I think that reflects what that final mile is is that making decisions and it's the trust gap that managers have with data because they don't know how the insides are coming to them, what are all the details underneath. In order to be able to trust decisions you have to understand who processed the data, what decision making criteria did they use, was this data governed well, are we introducing some bias into our algorithms, and can that be controlled? And so Alation becomes a platform for supporting getting answers to those issues. And then there's plenty of other companies that are optimizing the performance of those QUERYS and the storage of that data, but we're trying to really to close that trust gap. >> It's very interesting because from a management standpoint we're trying to do more evidence based management. So there's a major trend in board rooms, and executive offices to try to find ways to acculturate the executive team to using data, evidence based management healthcare now being applied to a lot of other domains. We've also historically had a situation where the people who focused or worked with the data was a relatively small coterie of individuals that crave these crazy systems to try to bring those two together. It sounds like what you're doing, and I really like the idea of the data scientists, being able to create data products that then can be distributed. It sounds like you're trying to look at data as an asset to be created, to be distributed so they can be more easily used by more people in your organization, have we got that right? >> Absolutely. So we're now seeing we're in just over a hundred production implementations of Alation, at large enterprises, and we're now seeing those production implementations get into the thousands of users. So this is going beyond those data specialists. Beyond the unicorn data scientists that understand the systems and math and technology. >> And business. >> And business, right. In business. So what we're seeing now is that a data catalog can be a point of collaboration across those different audiences in an enterprise. So whereas three years ago some of our initial customers kept the data catalog implementations small, right. They were getting access to the specialists to this catalog and asked them to certify data assets for others, what were starting to see is a proliferation of creation of self service data assets, a certification process that now is enterprise-wide, and thousands of users in these organizations. So Ebay has over a thousand weekly logins, Munich Reinsurance was on stage yesterday, their head of data engineering said they have 2,000 users on Alation at this point on their data lake, Fiserv is going to speak on Thursday and they're getting up to those numbers as well, so we see some really solid organizations that are solving medical, pharmaceutical issues, right, the largest re insurer in the world leading tech companies, starting to adopt a data catalog as a foundation for how their going to make those data driven decisions in the organization. >> Talk about how the product works because essentially you're bringing kind of the decision scientists, for lack of a better word, and productivity worker, almost like a business office suite concept, as a SAS, so you got a SAS model that says "Hey you want to play with data, use it but you have to do some front end work." Take us through how you guys roll out the platform, how are your customers consuming the service, take us through the engagement with customers. >> I think for customers, the most interesting part of this product is that it displays itself as an application that anyone can use, right? So there's a super familiar search interface that, rather than bringing back webpages, allows you to search for data assets in your organization. If you want more information on that data asset you click on those search results and you can see all of the information of how that data has been used in the organization, as well as the technical details and the technical metadata. And I think what's even more powerful is we actually have a recommendation engine that recommends data assets to the user. And that can be plugged into Tablo and Salesworth, Einstein Analytics, and a whole variety of other data science tools like Data Haiku that you might be using in your organization. So this looks like a very easy to use application that folks are familiar with that you just need a web browser to access, but on the backend, the hard work that's happening is the automation that we do with the platform. So by going out and crawling these source systems and looking at not just the technical descriptions of data, the metadata that exists, but then being able to understand by parsing the sequel weblogs, how that data is actually being used in the organization. We call it behavior I.O. by looking at the behaviors of how that data's being used, from those logs, we can actually give you a really good sense of how that data should be used in the future or where you might have gaps in governing that data or how you might want to reorient your storage or compute infrastructure to support the type of analytics that are actually being executed by real humans in your organization. And that's eye opening to a lot of I.T. sources. >> So you're providing insights to the data usage so that the business could get optimized for whether it's I.T. footprint component, or kinds of use cases, is that kind of how it's working? >> So what's interesting is the optimization actually happens in a pretty automated way, because we can make recommendations to those consumers of data of how they want to navigate the system. Kind of like Google makes recommendations as you browse the web, right? >> If you misspell something, "Oh did you mean this", kind of thing? >> "Did you mean this, might you also be interested in this", right? It's kind of a cross between Google and Amazon. Others like you may have used these other data assets in the past to determine revenue for that particular region, have you thought about using this filter, have you thought about using this join, did you know that you're trying to do analysis that maybe the sales ops guy has already done, and here's the certified report, why don't you just start with that? We're seeing a lot of reuse in organizations, wherein the past I think as an industry when Tablo and Click and all these B.I tools that were very self service oriented started to take off it was all about democratizing visualization by letting every user do their own thing and now we're realizing to get speed and accuracy and efficiency and effectiveness maybe there's more reuse of the work we've already done in existing data assets and by recommending those and expanding the data literacy around the interpretation of those, you might actually close this trust gap with the data. >> But there's one really important point that you raised, and I want to come back to it, and that is this notion of bias. So you know, Alation knows something about the data, knows a lot about the metadata, so therefore, I don't want to say understands, but it's capable of categorizing data in that way. And you're also able to look at the usage of that data by parsing some of sequel statements and then making a determination of the data as it's identified is appropriately being used based on how people are actually applying it so you can identify potential bias or potential misuse or whatever else it might be. That is an incredibly important thing. As you know John, we had an event last night and one of the things that popped up is how do you deal with emergence in data science in A.I, etc. And what methods do you put in place to actually ensure that the governance model can be extended to understand how those things are potentially in a very soft way, corrupting the use of the data. So could you spend a little bit more time talking about that because it's something a lot of people are interested in, quite frankly we don't know about a lot of tools that are doing that kind of work right now. It's an important point. >> I think the traditional viewpoint was if we just can manage the data we will be able to have a govern system. So if we control the inputs then well have a safe environment, and that was kind of like the classic single source of truth, data warehouse type model. >> Stewards of the data. >> What we're seeing is with the proliferation of sources of data and how quickly with IOT and new modern sources, data is getting created, you're not able to manage data at that point of that entry point. And it's not just about systems, it's about individuals that go on the web and find a dataset and then load it into a corporate database, right? Or you merge an Excel file with something that in a database. And so I think what we see happening, not only when you look at bias but if you look at some of the new regulations like [Inaudible] >> Sure. Ownership, [Inaudible] >> The logic that you're using to process that data, the algorithm itself can be biased, if you have a biased training data site that you feed it into a machine learning algorithm, the algorithm itself is going to be biased. And so the control point in this world where data is proliferating and we're not sure we can control that entirely, becomes the logic embedded in the algorithm. Even if that's a simple sequel statement that's feeding a report. And so Alation is able to introspect that sequel and highlight that maybe there is bias at work and how this algorithm is composed. So with GDPR the consumer owns their own data, if they want to pull it out from a training data set, you got to rerun that algorithm without that consumer data and that's your control point then going forward for the organization on different governance issues that pop up. >> Talk about the psychology of the user base because one of the things that shifted in the data world is a few stewards of data managed everything, now you've got a model where literally thousands of people of an organization could be users, productivity users, so you get a social component in here that people know who's doing data work, which in a way, creates a new persona or class of worker. A non techy worker. >> Yeah. It's interesting if you think about moving access to the data and moving the individuals that are creating algorithms out to a broader user group, what's important, you have to make sure that you're educating and training and sharing knowledge with that democratized audience, right? And to be able to do that you kind of want to work with human psychology, right? You want to be able to give people guidance in the course of their work rather than have them memorize a set of rules and try to remember to apply those. If you had a specialist group you can kind of control and force them to memorize and then apply, the more modern approach is to say "look, with some of these machine learning techniques that we have, why don't we make a recommendation." What you're going to do is introduce bias into that calculation. >> And we're capturing that information as you use the data. >> Well were also making a recommendation to say "Hey do you know you're doing this? Maybe you don't want to do that." Most people are using the data are not bad actors. They just can't remember all the rule sets to apply. So what were trying to do is cut someone behaviorally in the act before they make that mistake and say hey just a bit of a reminder, a bit of a coaching moment, did you know what you're doing? Maybe you can think of another approach to this. And we've found that many organizations that changes the discussion around data governance. It's no longer this top down constraint to finding insight, which frustrates an audience, is trying to use that data. It's more like a coach helping you improve and then social aspect of wanting to contribute to the system comes into play and people start communicating, collaborating, the platform and curating information a little bit. >> I remember when Microsoft Excel came out, the spreadsheet, or Lotus 123, oh my God, people are going to use these amazing things with spreadsheets, they did. You're taking a similar approach with analytics, much bigger surface area of work to kind of attack from a data perspective, but in a way kind of the same kind of concept, put the hands of the users, have the data in their hands so to speak. >> Yeah, enable everyone to make data driven decisions. But make sure that they're interpreting that data in the right way, right? Give them enough guidance, don't let them just kind of attack the wild west and fair it out. >> Well looking back at the Microsoft Excel spreadsheet example, I remember when a finance department would send a formatted spreadsheet with all the rules for how to use it out of 50 different groups around the world, and everyone figured out that you can go in and manipulate the macros and deliver any results they want. And so it's that same notion, you have to know something about that, but this site, in many respects Stephanie you're describing a data governance model that really is more truly governance, that if we think about a data asset it's how do we mediate a lot of different claims against that set of data so that its used appropriately, so its not corrupted, so that it doesn't effect other people, but very importantly so that the out6comes are easier to agree upon because there's some trust and there's some valid behaviors and there's some verification in the flow of the data utilization. >> And where we give voice to a number of different constituencies. Because business opinions from different departments can run slightly counter to one another. There can be friction in how to use particular data assets in the business depending on the lens that you have in that business and so what were trying to do is surface those different perspectives, give them voice, allow those constituencies to work that out in a platform that captures that debate, captures that knowledge, makes that debate a knowledge of foundation to build upon so in many ways its kind of like the scientific method, right? As a scientist I publish a paper. >> Get peer reviewed. >> Get peer reviewed, let other people weigh in. >> And it becomes part of the canon of knowledge. >> And it becomes part of the canon. And in the scientific community over the last several years you see that folks are publishing their data sets out publicly, why can't an enterprise do the same thing internally for different business groups internally. Take the same approach. Allow others to weigh in. It gets them better insights and it gets them more trust in that foundation. >> You get collective intelligence from the user base to help come in and make the data smarter and sharper. >> Yeah and have reusable assets that you can then build upon to find the higher level insights. Don't run the same report that a hundred people in the organization have already run. >> So the final question for you. As you guys are emerging, starting to do really well, you have a unique approach, honestly we think it fits in kind of the new guard of analytics, a productivity worker with data, which is we think is going to be a huge persona, where are you guys winning, and why are you winning with your customer base? What are some things that are resonating as you go in and engage with prospects and customers and existing customers? What are they attracted to, what are they like, and why are you beating the competition in your sales and opportunities? >> I think this concept of a more agile, grassroots approach to data governance is a breath of fresh air for anyone who spend their career in the data space. Were at a turning point in industry where you're now seeing chief decision scientists, chief data officers, chief analytic officers take a leadership role in organizations. Munich Reinsurance is using their data team to actually invest and hold new arms of their business. That's how they're pushing the envelope on leadership in the insurance space and were seeing that across our install base. Alation becomes this knowledge repository for all of those mines in the organization, and encourages a community to be built around data and insightful questions of data. And in that way the whole organization raises to the next level and I think its that vision of what can be created internally, how we can move away from just claiming that were a big data organization and really starting to see the impact of how new business models can be creative in these data assets, that's exciting to our customer base. >> Well congratulations. A hot start up. Alation here on theCUBE in New York City for cubeNYC. Changing the game on analytics, bringing a breath of fresh air to hands of the users. A new persona developing. Congratulations, great to have you. Stephanie McReynolds. Its the cube. Stay with us for more live coverage, day one of two days live in New York City. We'll be right back.

Published Date : Sep 12 2018

SUMMARY :

Brought to you by SiliconANGLE Media the CMO, VP of Marketing for Alation, thanks for joining us. So you guys have a pretty spectacular so we brought a little fun to the show floor in the Alation booth is about the product You guys are one of the new guard companies is that making decisions and it's the trust gap and I really like the idea of the data scientists, production implementations get into the thousands of users. and asked them to certify data assets for others, kind of the decision scientists, gaps in governing that data or how you might want to so that the business could get optimized as you browse the web, right? in the past to determine revenue for that particular region, and one of the things that popped up is how do you deal and that was kind of like the classic it's about individuals that go on the web and find a dataset the algorithm itself is going to be biased. because one of the things that shifted in the data world And to be able to do that you kind of They just can't remember all the rule sets to apply. have the data in their hands so to speak. that data in the right way, right? and everyone figured out that you can go in in the business depending on the lens that you have And in the scientific community over the last several years You get collective intelligence from the user base Yeah and have reusable assets that you can then build upon and why are you winning with your customer base? and really starting to see the impact of how new business bringing a breath of fresh air to hands of the users.

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
Stephanie McReynoldsPERSON

0.99+

AmazonORGANIZATION

0.99+

Dave VellantePERSON

0.99+

JohnPERSON

0.99+

Peter BurrisPERSON

0.99+

GoogleORGANIZATION

0.99+

StephaniePERSON

0.99+

ThursdayDATE

0.99+

New YorkLOCATION

0.99+

John FurrierPERSON

0.99+

50 different groupsQUANTITY

0.99+

PeterPERSON

0.99+

New York CityLOCATION

0.99+

EbayORGANIZATION

0.99+

2,000 usersQUANTITY

0.99+

ExcelTITLE

0.99+

Attack of the BotsTITLE

0.99+

thousandsQUANTITY

0.99+

SiliconANGLE MediaORGANIZATION

0.99+

two daysQUANTITY

0.99+

yesterdayDATE

0.99+

ninth yearQUANTITY

0.99+

twoQUANTITY

0.99+

STRATAORGANIZATION

0.99+

TodayDATE

0.99+

FiservORGANIZATION

0.99+

last nightDATE

0.99+

three years agoDATE

0.99+

AlationPERSON

0.99+

NYCLOCATION

0.98+

Lotus 123TITLE

0.98+

Munich ReinsuranceORGANIZATION

0.98+

oneQUANTITY

0.98+

GDPRTITLE

0.97+

AlationORGANIZATION

0.96+

MicrosoftORGANIZATION

0.94+

SASORGANIZATION

0.94+

over a thousand weekly loginsQUANTITY

0.91+

theCUBEORGANIZATION

0.9+

Strata ConferenceEVENT

0.89+

single sourceQUANTITY

0.86+

thousands of peopleQUANTITY

0.86+

thousands of usersQUANTITY

0.84+

TabloORGANIZATION

0.83+

day oneQUANTITY

0.78+

2018EVENT

0.75+

CUBEORGANIZATION

0.75+

SalesworthORGANIZATION

0.74+

Einstein AnalyticsORGANIZATION

0.73+

TabloTITLE

0.73+

Strata HadoopEVENT

0.73+

a hundred peopleQUANTITY

0.7+

2018DATE

0.66+

pointQUANTITY

0.63+

yearsDATE

0.63+

AlationLOCATION

0.62+

ClickORGANIZATION

0.62+

Munich ReinsuranceTITLE

0.6+

over a hundredQUANTITY

0.59+

DataORGANIZATION

0.58+

Strata DataEVENT

0.57+

lastDATE

0.55+

HaikuTITLE

0.47+

Kickoff | theCUBE NYC 2018


 

>> Live from New York, it's theCUBE covering theCUBE New York City 2018. Brought to you by SiliconANGLE Media and its ecosystem partners. (techy music) >> Hello, everyone, welcome to this CUBE special presentation here in New York City for CUBENYC. I'm John Furrier with Dave Vellante. This is our ninth year covering the big data industry, starting with Hadoop World and evolved over the years. This is our ninth year, Dave. We've been covering Hadoop World, Hadoop Summit, Strata Conference, Strata Hadoop. Now it's called Strata Data, I don't know what Strata O'Reilly's going to call it next. As you all know, theCUBE has been present for the creation at the Hadoop big data ecosystem. We're here for our ninth year, certainly a lot's changed. AI's the center of the conversation, and certainly we've seen some horses come in, some haven't come in, and trends have emerged, some gone away, your thoughts. Nine years covering big data. >> Well, John, I remember fondly, vividly, the call that I got. I was in Dallas at a storage networking world show and you called and said, "Hey, we're doing "Hadoop World, get over there," and of course, Hadoop, big data, was the new, hot thing. I told everybody, "I'm leaving." Most of the people said, "What's Hadoop?" Right, so we came, we started covering, it was people like Jeff Hammerbacher, Amr Awadallah, Doug Cutting, who invented Hadoop, Mike Olson, you know, head of Cloudera at the time, and people like Abi Mehda, who at the time was at B of A, and some of the things we learned then that were profound-- >> Yeah. >> As much as Hadoop is sort of on the back burner now and people really aren't talking about it, some of the things that are profound about Hadoop, really, were the idea, the notion of bringing five megabytes of code to a petabyte of data, for example, or the notion of no schema on write. You know, put it into the database and then figure it out. >> Unstructured data. >> Right. >> Object storage. >> And so, that created a state of innovation, of funding. We were talking last night about, you know, many, many years ago at this event this time of the year, concurrent with Strata you would have VCs all over the place. There really aren't a lot of VCs here this year, not a lot of VC parties-- >> Mm-hm. >> As there used to be, so that somewhat waned, but some of the things that we talked about back then, we said that big money and big data is going to be made by the practitioners, not by the vendors, and that's proved true. I mean... >> Yeah. >> The big three Hadoop distro vendors, Cloudera, Hortonworks, and MapR, you know, Cloudera's $2.5 billion valuation, you know, not bad, but it's not a $30, $40 billion value company. The other thing we said is there will be no Red Hat of big data. You said, "Well, the only Red Hat of big data might be "Red Hat," and so, (chuckles) that's basically proved true. >> Yeah. >> And so, I think if we look back we always talked about Hadoop and big data being a reduction, the ROI was a reduction on investment. >> Yeah. >> It was a way to have a cheaper data warehouse, and that's essentially-- Well, what did we get right and wrong? I mean, let's look at some of the trends. I mean, first of all, I think we got pretty much everything right, as you know. We tend to make the calls pretty accurately with theCUBE. Got a lot of data, we look, we have the analytics in our own system, plus we have the research team digging in, so you know, we pretty much get, do a good job. I think one thing that we predicted was that Hadoop certainly would change the game, and that did. We also predicted that there wouldn't be a Red Hat for Hadoop, that was a production. The other prediction was is that we said Hadoop won't kill data warehouses, it didn't, and then data lakes came along. You know my position on data lakes. >> Yeah. >> I've always hated the term. I always liked data ocean because I think it was much more fluidity of the data, so I think we got that one right and data lakes still doesn't look like it's going to be panning out well. I mean, most people that deploy data lakes, it's really either not a core thing or as part of something else and it's turning into a data swamp, so I think the data lake piece is not panning out the way it, people thought it would be. I think one thing we did get right, also, is that data would be the center of the value proposition, and it continues and remains to be, and I think we're seeing that now, and we said data's the development kit back in 2010 when we said data's going to be part of programming. >> Some of the other things, our early data, and we went out and we talked to a lot of practitioners who are the, it was hard to find in the early days. They were just a select few, I mean, other than inside of Google and Yahoo! But what they told us is that things like SQL and the enterprise data warehouse were key components on their big data strategy, so to your point, you know, it wasn't going to kill the EDW, but it was going to surround it. The other thing we called was cloud. Four years ago our data showed clearly that much of this work, the modeling, the big data wrangling, et cetera, was being done in the cloud, and Cloudera, Hortonworks, and MapR, none of them at the time really had a cloud strategy. Today that's all they're talking about is cloud and hybrid cloud. >> Well, it's interesting, I think it was like four years ago, I think, Dave, when we actually were riffing on the notion of, you know, Cloudera's name. It's called Cloudera, you know. If you spell it out, in Cloudera we're in a cloud era, and I think we were very aggressive at that point. I think Amr Awadallah even made a comment on Twitter. He was like, "I don't understand "where you guys are coming from." We were actually saying at the time that Cloudera should actually leverage more cloud at that time, and they didn't. They stayed on their IPO track and they had to because they had everything betted on Impala and this data model that they had and being the business model, and then they went public, but I think clearly cloud is now part of Cloudera's story, and I think that's a good call, and it's not too late for them. It never was too late, but you know, Cloudera has executed. I mean, if you look at what's happened with Cloudera, they were the only game in town. When we started theCUBE we were in their office, as most people know in this industry, that we were there with Cloudera when they had like 17 employees. I thought Cloudera was going to run the table, but then what happened was Hortonworks came out of the Yahoo! That, I think, changed the game and I think in that competitive battle between Hortonworks and Cloudera, in my opinion, changed the industry, because if Hortonworks did not come out of Yahoo! Cloudera would've had an uncontested run. I think the landscape of the ecosystem would look completely different had Hortonworks not competed, because you think about, Dave, they had that competitive battle for years. The Hortonworks-Cloudera battle, and I think it changed the industry. I think it couldn't been a different outcome. If Hortonworks wasn't there, I think Cloudera probably would've taken Hadoop and making it so much more, and I think they wouldn't gotten more done. >> Yeah, and I think the other point we have to make here is complexity really hurt the Hadoop ecosystem, and it was just bespoke, new projects coming out all the time, and you had Cloudera, Hortonworks, and maybe to a lesser extent MapR, doing a lot of the heavy lifting, particularly, you know, Hortonworks and Cloudera. They had to invest a lot of their R&D in making these systems work and integrating them, and you know, complexity just really broke the back of the Hadoop ecosystem, and so then Spark came in, everybody said, "Oh, Spark's going to basically replace Hadoop." You know, yes and no, the people who got Hadoop right, you know, embraced it and they still use it. Spark definitely simplified things, but now the conversation has turned to AI, John. So, I got to ask you, I'm going to use your line on you in kind of the ask-me-anything segment here. AI, is it same wine, new bottle, or is it really substantively different in your opinion? >> I think it's substantively different. I don't think it's the same wine in a new bottle. I'll tell you... Well, it's kind of, it's like the bad wine... (laughs) Is going to be kind of blended in with the good wine, which is now AI. If you look at this industry, the big data industry, if you look at what O'Reilly did with this conference. I think O'Reilly really has not done a good job with the conference of big data. I think they blew it, I think that they made it a, you know, monetization, closed system when the big data business could've been all about AI in a much deeper way. I think AI is subordinate to cloud, and you mentioned cloud earlier. If you look at all the action within the AI segment, Diane Greene talking about it at Google Next, Amazon, AI is a software layer substrate that will be underpinned by the cloud. Cloud will drive more action, you need more compute, that drives more data, more data drives the machine learning, machine learning drives the AI, so I think AI is always going to be dependent upon cloud ends or some sort of high compute resource base, and all the cloud analytics are feeding into these AI models, so I think cloud takes over AI, no doubt, and I think this whole ecosystem of big data gets subsumed under either an AWS, VMworld, Google, and Microsoft Cloud show, and then also I think specialization around data science is going to go off on its own. So, I think you're going to see the breakup of the big data industry as we know it today. Strata Hadoop, Strata Data Conference, that thing's going to crumble into multiple, fractured ecosystems. >> It's already starting to be forked. I think the other thing I want to say about Hadoop is that it actually brought such great awareness to the notion of data, putting data at the core of your company, data and data value, the ability to understand how data at least contributes to the monetization of your company. AI would not be possible without the data. Right, and we've talked about this before. You call it the innovation sandwich. The innovation sandwich, last decade, last three decades, has been Moore's law. The innovation sandwich going forward is data, machine intelligence applied to that data, and cloud for scale, and that's the sandwich of innovation over the next 10 to 20 years. >> Yeah, and I think data is everywhere, so this idea of being a categorical industry segment is a little bit off, I mean, although I know data warehouse is kind of its own category and you're seeing that, but I don't think it's like a Magic Quadrant anymore. Every quadrant has data. >> Mm-hm. >> So, I think data's fundamental, and I think that's why it's going to become a layer within a control plane of either cloud or some other system, I think. I think that's pretty clear, there's no, like, one. You can't buy big data, you can't buy AI. I think you can have AI, you know, things like TensorFlow, but it's going to be a completely... Every layer of the stack is going to be impacted by AI and data. >> And I think the big players are going to infuse their applications and their databases with machine intelligence. You're going to see this, you're certainly, you know, seeing it with IBM, the sort of Watson heavy lift. Clearly Google, Amazon, you know, Facebook, Alibaba, and Microsoft, they're infusing AI throughout their entire set of cloud services and applications and infrastructure, and I think that's good news for the practitioners. People aren't... Most companies aren't going to build their own AI, they're going to buy AI, and that's how they close the gap between the sort of data haves and the data have-nots, and again, I want to emphasize that the fundamental difference, to me anyway, is having data at the core. If you look at the top five companies in terms of market value, US companies, Facebook maybe not so much anymore because of the fake news, though Facebook will be back with it's two billion users, but Apple, Google, Facebook, Amazon, who am I... And Microsoft, those five have put data at the core and they're the most valuable companies in the stock market from a market cap standpoint, why? Because it's a recognition that that intangible value of the data is actually quite valuable, and even though banks and financial institutions are data companies, their data lives in silos. So, these five have put data at the center, surrounded it with human expertise, as opposed to having humans at the center and having data all over the place. So, how do they, how do these companies close the gap? How do the companies in the flyover states close the gap? The way they close the gap, in my view, is they buy technologies that have AI infused in it, and I think the last thing I'll say is I see cloud as the substrate, and AI, and blockchain and other services, as the automation layer on top of it. I think that's going to be the big tailwind for innovation over the next decade. >> Yeah, and obviously the theme of machine learning drives a lot of the conversations here, and that's essentially never going to go away. Machine learning is the core of AI, and I would argue that AI truly doesn't even exist yet. It's machine learning really driving the value, but to put a validation on the fact that cloud is going to be driving AI business is some of the terms in popular conversations we're hearing here in New York around this event and topic, CUBENYC and Strata Conference, is you're hearing Kubernetes and blockchain, and you know, these automation, AI operation kind of conversations. That's an IT conversation, (chuckles) so you know, that's interesting. You've got IT, really, with storage. You've got to store the data, so you can't not talk about workloads and how the data moves with workloads, so you're starting to see data and workloads kind of be tossed in the same conversation, that's a cloud conversation. That is all about multi-cloud. That's why you're seeing Kubernetes, a term I never thought I would be saying at a big data show, but Kubernetes is going to be key for moving workloads around, of which there's data involved. (chuckles) Instrumenting the workloads, data inside the workloads, data driving data. This is where AI and machine learning's going to play, so again, cloud subsumes AI, that's the story, and I think that's going to be the big trend. >> Well, and I think you're right, now. I mean, that's why you're hearing the messaging of hybrid cloud and from the big distro vendors, and the other thing is you're hearing from a lot of the no-SQL database guys, they're bringing ACID compliance, they're bringing enterprise-grade capability, so you're seeing the world is hybrid. You're seeing those two worlds come together, so... >> Their worlds, it's getting leveled in the playing field out there. It's all about enterprise, B2B, AI, cloud, and data. That's theCUBE bringing you the data here. New York City, CUBENYC, that's the hashtag. Stay with us for more coverage live in New York after this short break. (techy music)

Published Date : Sep 12 2018

SUMMARY :

Brought to you by SiliconANGLE Media for the creation at the Hadoop big data ecosystem. and some of the things we learned then some of the things that are profound about Hadoop, We were talking last night about, you know, but some of the things that we talked about back then, You said, "Well, the only Red Hat of big data might be being a reduction, the ROI was a reduction I mean, first of all, I think we got and I think we're seeing that now, and the enterprise data warehouse were key components and I think we were very aggressive at that point. Yeah, and I think the other point and all the cloud analytics are and cloud for scale, and that's the sandwich Yeah, and I think data is everywhere, and I think that's why it's going to become I think that's going to be the big tailwind and I think that's going to be the big trend. and the other thing is you're hearing New York City, CUBENYC, that's the hashtag.

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
AppleORGANIZATION

0.99+

MicrosoftORGANIZATION

0.99+

AmazonORGANIZATION

0.99+

Diane GreenePERSON

0.99+

GoogleORGANIZATION

0.99+

FacebookORGANIZATION

0.99+

JohnPERSON

0.99+

AlibabaORGANIZATION

0.99+

DavePERSON

0.99+

Dave VellantePERSON

0.99+

Jeff HammerbacherPERSON

0.99+

$30QUANTITY

0.99+

New YorkLOCATION

0.99+

2010DATE

0.99+

IBMORGANIZATION

0.99+

Doug CuttingPERSON

0.99+

Mike OlsonPERSON

0.99+

HortonworksORGANIZATION

0.99+

DallasLOCATION

0.99+

O'ReillyORGANIZATION

0.99+

YahooORGANIZATION

0.99+

ClouderaORGANIZATION

0.99+

fiveQUANTITY

0.99+

AWSORGANIZATION

0.99+

Abi MehdaPERSON

0.99+

John FurrierPERSON

0.99+

New York CityLOCATION

0.99+

$2.5 billionQUANTITY

0.99+

SiliconANGLE MediaORGANIZATION

0.99+

MapRORGANIZATION

0.99+

Amr AwadallahPERSON

0.99+

$40 billionQUANTITY

0.99+

17 employeesQUANTITY

0.99+

VMworldORGANIZATION

0.99+

TodayDATE

0.99+

ImpalaORGANIZATION

0.99+

Nine yearsQUANTITY

0.99+

four years agoDATE

0.98+

last nightDATE

0.98+

last decadeDATE

0.98+

Strata Data ConferenceEVENT

0.98+

Strata ConferenceEVENT

0.98+

Hadoop SummitEVENT

0.98+

ninth yearQUANTITY

0.98+

Four years agoDATE

0.98+

two worldsQUANTITY

0.97+

five companiesQUANTITY

0.97+

todayDATE

0.97+

Strata HadoopEVENT

0.97+

Hadoop WorldEVENT

0.96+

CUBEORGANIZATION

0.96+

Google NextORGANIZATION

0.95+

TwitterORGANIZATION

0.95+

this yearDATE

0.95+

SparkORGANIZATION

0.95+

USLOCATION

0.94+

CUBENYCEVENT

0.94+

Strata O'ReillyORGANIZATION

0.93+

next decadeDATE

0.93+

David Abercrombie, Sharethrough & Michael Nixon, Snowflake | Big Data SV 2018


 

>> Narrator: Live from San Jose, it's theCUBE. Presenting Big Data, Silicon Valley. Brought to you by SiliconANGLE Media and its ecosystem partners. >> Hi, I'm George Gilbert, and we are broadcasting from the Strata Data Conference, we're right around the corner at the Forager Tasting Room & Eatery. We have this wonderful location here, and we are very lucky to have with us Michael Nixon, from Snowflake, which is a leading cloud data warehouse. And David Abercrombie from Sharethrough which is a leading ad tech company. And between the two of them, they're going to tell us some of the most advance these cases we have now for cloud-native data warehousing. Michael, why don't you start with giving us some context for how on a cloud platform one might rethink a data warehouse? >> Yeah, thank you. That's a great question because let me first answer it from the end-user, business value perspective, when you run a workload on a cloud, there's a certain level of expectation you want out of the cloud. You want scalability, you want unlimited scalability, you want to be able to support all your users, you want to be able to support the data types, whatever they may be that comes in into your organization. So, there's a level of expectation that one should expect from a service point of view once you're in a cloud. So, a lot of the technology that were built up to this point have been optimized for on-premises types of data warehousing where perhaps that level of service and currency and unlimited scalability was not really expected but, guess what? Once it comes to the cloud, it's expected. So those on-premises technologies aren't suitable in the cloud, so for enterprises and, I mean, companies, organizations of all types from finance, banking, manufacturing, ad tech as we'll have today, they want that level of service in the cloud. And so, those technologies will not work, and so it requires a rethinking of how those architectures are built. And it requires being built for the cloud. >> And just to, alright, to break this down and be really concrete, some of the rethinking. We separate compute from storage, which is a familiar pattern that we've learned in the cloud but we also then have to have this sort of independent elasticity between-- >> Yes. Storage and the compute, and then Snowflake's taken it even a step further where you can spin out multiple compute clusters. >> Right. >> Tell us how that works and why that's so difficult and unique. >> Yeah, you know, that's taking us under the covers a little bit, but what makes our infrastructure unique is that we have a three-layer architecture. We separate, just as you said, storage from the compute layer, from the services layer. And that's really important because as I mentioned before, you want unlimited capacity, unlimited resources. So, if you scale, compute, and today's world on on-premises MPP, what that really means is that you have to bring the storage along with the compute because compute is tied to the storage so when you scale the storage along with the compute, usually that involves a lot of burden on the data warehouse manager because now they have to redistribute the data and that means redistributing keys, managing keys if you will. And that's a burden, and by the reverse, if all you wanted to do was increase storage but not the compute, because compute was tied to storage. Why you have to buy these additional compute notes, and that might add to the cost when, in fact, all you really wanted to pay for was for additional storage? So, by separating those, you keep them independent, and so you can scale storage apart from compute and then, once you have your compute resources in place, the virtual warehouses that you're talking about that have completed the job, you spun them up, it's done its job, and you take it down, guess what? You can release those resources, and of course, in releasing those resources, basically you can cut your cost as well because, for us, it's pure usage-based pricing. You only pay for what you use, and that's really fantastic. >> Very different from the on-prem model where, as you were saying, tied compute and storage together, so. >> Yeah, let's think about what that means architecturally, right? So if you have an on-premises data warehouse, and you want to scale your capacity, chances are you'll have to have that hardware in place already. And having that hardware in place already means you're paying that expense and, so you may pay for that expense six months prior to need it. Let's take a retailer example. >> Yeah. >> You're gearing up for a peak season, which might be Christmas, and so you put that hardware in place sometime in June, you'll always put it in advanced because why? You have to bring up the environment, so you have to allow time for implementation or, if you will, deployment to make sure everything is operational. >> Okay. >> And then what happens is when that peak period comes, you can't expand in that capacity. But what happens once that peak period is over? You paid for that hardware, but you don't really need it. So, our vision is, or the vision we believe you should have when you move workloads to the cloud is, you pay for those when you need them. >> Okay, so now, David, help us understand, first, what was the business problem you were trying to solve? And why was Snowflake, you know, sort of uniquely suited for that? >> Well, let me talk a little bit about Sharethrough. We're ad tech, at the core of our business we run an ad exchange, where we're doing programmatic training with the bids, with the real-time bidding spec. The data is very high in volume, with 12 billion impressions a month, that's a lot of bids that we have to process, a lot of bid requests. The way it operates, the bids and the bid responses and programmatic training are encoded in JSONs, so our ad exchange is basically exchanging messages in JSON with our business partners. And the JSONs are very complicated, there's a lot of richness and detail, such that the advertisers can decide whether or not they want to bid. Well, this data is very complicated, very high-volume. And advertising, like any business, we really need to have good analytics to understand how our business is operating, how our publishers are doing, how our advertisers are doing. And it all depends upon this very high-volume, very complex JSON event data stream. So, Snowflake was able to ingest our high-volume data very gracefully. The JSON parsing techniques of Snowflake allow me to expose the complicated data structure in a way that's very transparent and usable to our analysts. Our use of Snowflake has replaced clunkier tools where the analysts basically had to be programmers, writing programs in Scala or something to do in analysis. And now, because we've transparently and easily exposed the complicated structures within Snowflake in a relational database, they can use good old-fashioned SQL to run their queries, literally, afternoon analysis is now a five-minute query. >> So, let me, as I'm listening to you describe this. We've had various vendors telling us about these workflows in the sort of data prep and data science tool change. It almost sounds to me like Snowflake is taking semi-structured or complex data and it's sort of unraveling it and normalizing is kind of an overloaded term but it's making it business-ready, so you don't need as much of that manual data prep. >> Yeah, exactly, you don't need as much manual data prep, or you don't need as much expertise. For instance, Snowflake's JSON capabilities, in terms of drilling down the JSON tree with dot path notation, or expanding nested objects is very expressive, very powerful, but still your typical analyst or your BI tool certainly wouldn't know how to do that. So, in Snowflake, we sort of have our cake and eat it too. We can have our JSONs with their full richness in our database, but yet we can simplify and expose the data elements that are needed for analysis, so that an analyst, their first day on the job, they can get right to work and start writing queries. >> So let me ask you about, a little more about the programmatic ad use case. So if you have billions of impressions per month, I'm guessing that means you have quite a few times more, in terms of bids, and then there's the, you know once you have, I guess a successful one, you want to track what happens. >> Correct. >> So tell us a little more about that, what that workload looks like, in terms of, what analytics you're trying to perform, what's your tracking? >> Yeah, well, you're right. There's different steps in our funnel. The impression request expands out by a factor of a dozen as we send it to all the different potential bidders. We track all that data, the responses come back, we track that, we track our decisions and why we selected the bidder. And then, once the ad is shown, of course there's various beacons and tracking things that fire. We'd have to track all of that data, and the only way we could make sense out of our business is by bringing all that data together. And in a way that is reliable, transparent, and visible, and also has data integrity, that's another thing I like about the Snowflake database is that it's a good old-fashioned SQL database that I can declare my primary keys, I can run QC checks, I can ensure high data integrity that is demanded by BI and other sorts of analytics. >> What would be, as you continue to push the boundaries of the ad tech service, what's some functionality that you're looking to add, and Snowflake as your partner, either that's in there now that you still need to take advantage of or things that you're looking to in the future? >> Well, moving forward, of course, we, it's very important for us to be able to quickly gauge the effectiveness of new products. The ad tech market is fast-changing, there's always new ways of bidding, new products that are being developed, new ways for the ad ecosystem to work. And so, as we roll those out, we need to be able to quickly analyze, you know, "Is this thing working or not?" You know, kind of an agile environment, pivot or prove it. Does this feature work or not? So, having all the data in one place makes that possible for that very quick assessment of the viability of a new feature, new product. >> And, dropping down a little under the covers for how that works, does that mean, like you still have the base JSON data that you've absorbed, but you're going to expose it with different schemas or access patterns? >> Yeah, indeed. For instance, we make use of the SQL schemas, roles, and permissions internally where we can have the different teams have their own domain of data that they can expose internally, and looking forward, there's the share house feature of Snowflake that we're looking to implement with our partners, where, rather than sending them data, like a daily dump of data, we can give them access to their data in our database through this top layer that Michael mentioned, the service layer, essentially allows me to create a view grant select onto another customer. So I no longer have to send daily data dumps to partners or have some sort of API for getting data. They can simply query the data themselves so we'll be implementing that feature with our major partners. >> I would be remiss in not asking at a data conference like this, now that there's the tie-in with CuBOL and Spark Integration and Machine Learning, is there anything along that front that you're planning to exploit in the near future? >> Well, yeah, Sharethrough, we're very experimental, playful, we're always examining new data technologies and new ways of doing things but now with Snowflake as sort of our data warehouse of curated data. I've got two petabytes of referential integrity data, and that is reliable. We can move forward into our other analyses and other uses of data knowing that we have captured every event exactly once, and we know exactly where it fits in a business context, in a relational manner. It's clean, good data integrity, reliable, accessible, visible, and it's just plain old SQL. (chuckles) >> That's actually a nice way to sum it up. We've got the integrity that we've come to expect and love from relational databases. We've got the flexibility of machine-oriented data, or JSON. But we don't have to give up the query engine, and then now you have more advanced features, analytic features that you can take advantage of coming down the pipe. >> Yeah, again we're a modern platform for the modern age, that's basically cloud-based computing. With a platform like Snowflake in the backend, you can now move those workloads that you're accustomed to to the cloud and have in the environment that you're familiar with, and it saves you a lot of time and effort. You can focus on more strategic projects. >> Okay, well, with that, we're going to take a short break. This has been George Gilbert, we're with Michael Nixon of Snowflake, and David Abercrombie of Sharethrough listening to how the most modern ad tech companies are taking advantage of the most modern cloud data warehouses. And we'll be back after a short break here at the Strata Data Conference, thanks. (quirky music)

Published Date : Mar 9 2018

SUMMARY :

Brought to you by SiliconANGLE Media some of the most advance these cases we have now a certain level of expectation you want out of the cloud. concrete, some of the rethinking. Storage and the compute, and then Snowflake's taken it and unique. that have completed the job, you spun them up, Very different from the on-prem model where, as you and you want to scale your capacity, chances are You have to bring up the environment, so you have to allow You paid for that hardware, but you don't really need it. of richness and detail, such that the advertisers can So, let me, as I'm listening to you describe this. of drilling down the JSON tree with dot path notation, I'm guessing that means you have quite a few times more, I like about the Snowflake database analyze, you know, "Is this thing working or not?" the service layer, essentially allows me to create and that is reliable. and then now you have more you can now move those workloads that you're accustomed to at the Strata Data Conference, thanks.

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
DavidPERSON

0.99+

George GilbertPERSON

0.99+

David AbercrombiePERSON

0.99+

Michael NixonPERSON

0.99+

MichaelPERSON

0.99+

JuneDATE

0.99+

twoQUANTITY

0.99+

SiliconANGLE MediaORGANIZATION

0.99+

San JoseLOCATION

0.99+

ScalaTITLE

0.99+

firstQUANTITY

0.99+

Silicon ValleyLOCATION

0.99+

five-minuteQUANTITY

0.99+

SnowflakeTITLE

0.99+

ChristmasEVENT

0.98+

Strata Data ConferenceEVENT

0.98+

three-layerQUANTITY

0.98+

first dayQUANTITY

0.98+

a dozenQUANTITY

0.98+

two petabytesQUANTITY

0.97+

SharethroughORGANIZATION

0.97+

JSONTITLE

0.97+

SQLTITLE

0.96+

one placeQUANTITY

0.95+

six monthsQUANTITY

0.94+

Forager Tasting Room & EateryORGANIZATION

0.91+

todayDATE

0.89+

SnowflakeORGANIZATION

0.87+

SparkTITLE

0.87+

12 billion impressions a monthQUANTITY

0.87+

Machine LearningTITLE

0.84+

Big DataORGANIZATION

0.84+

billions of impressionsQUANTITY

0.8+

CuBOLTITLE

0.79+

Big Data SV 2018EVENT

0.77+

onceQUANTITY

0.72+

theCUBEORGANIZATION

0.63+

JSONsTITLE

0.61+

timesQUANTITY

0.55+

Satyen Sangani, Alation | Big Data SV 2018


 

>> Announcer: Live from San Jose, it's theCUBE. Presenting Big Data Silicon Valley, brought to you by SiliconANGLE Media and its ecosystem partners. (upbeat music) >> Welcome back to theCUBE, I'm Lisa Martin with John Furrier. We are covering our second day of our event Big Data SV. We've had some great conversations, John, yesterday, today as well. Really looking at Big Data, digital transformation, Big Data, plus data science, lots of opportunity. We're excited to welcome back to theCUBE an alumni, Satyen Sangani, the co-founder and CEO of Alation. Welcome back! >> Thank you, it's wonderful to be here again. >> So you guys finish up your fiscal year end of December 2017, where in the first quarter of 2018. You guys had some really strong results, really strong momentum. >> Yeah. >> Tell us what's going on at Alation, how are you pulling this momentum through 2018. >> Well, I think we have had an enterprise focused business historically, because we solve a very complicated problem for very big enterprises, and so, in the last quarter we added customers like American Express, PepsiCo, Roche. And with huge expansions from our existing customers, some of whom, over the course of a year, I think went 12 X from an initial base. And so, we found some just incredible momentum in Q4 and for us that was a phenomenal cap to a great year. >> What about the platform you guys are doing? Can you just take a minute to explain what Alation does again just to refresh where you are on the product side? You mentioned some new accounts, some new use cases. >> Yeah. >> What's the update? Take a minute, talk about the update. >> Absolutely, so, you certainly know, John, but Alation's a data catalog and a data catalog essentially, you can think of it as Yelp or Amazon for data and information side of the enterprise. So if you think about how many different databases there are, how many different reports there are, how many different BI tools there are, how many different APIs there are, how many different algorithms there are, it's pretty dizzying for the average analyst. It's pretty dizzying for the average CIO. It's pretty dizzying for the average chief data officer. And particularly, inside of Fortune 500s where you have hundreds of thousands of databases. You have a situation where people just have too much signal or too much noise, not enough signal. And so what we do is we provide this Yelp for that information. You can come to Alation as a catalog. You can do a search on revenue 2017. You'll get all of the reports, all of the dashboards, all of the tables, all of the people that you might need to be able to find. And that gives you a single place of reference, so you can understand what you've got and what can answer your questions. >> What's interesting is, first of all, I love data. We're data driven, we're geeks on data. But when I start talking to folks that are outside the geek community or nerd community, you say data and they go, "Oh," because they cringe and they say, "Facebook." They see that data issues there. GDPR, data nightmare, where's the store, you got to manage it. And then, people are actually using data, so they're realizing how hard (laughs) it is. >> Yeah >> How much data do we have? So it's kind of like a tropic disillusionment, if you will. Now they got to get their hands on it. They've got to put it to work. >> Yeah. >> And they know that So, it's now becoming really hard (laughs) in their mind. This is business people. >> Yeah. >> They have data everywhere. How do you guys talk to that customer? Because, if you don't have quality data, if you don't have data you can trust, if you don't have the right people, it's hard to get it going. >> Yeah. >> How do you guys solve that problem and how do you talk to customers? >> So we talk a lot about data literacy. There is a lot of data in this world and that data is just emblematic of all of the stuff that's going on in this world. There's lots of systems, there's lots of complexity and the data, basically, just is about that complexity. Whether it's weblogs, or sensors, or the like. And so, you can either run away from that data, and say, "Look, I'm going to not, "I'm going to bury my head in the sand. "I'm going to be a business. "I'm just going to forget about that data stuff." And that's certainly a way to go. >> John: Yeah. >> It's a way to go away. >> Not a good outlook. >> I was going to say, is that a way of going out of business? >> Or, you can basically train, it's a human resources problem fundamentally. You've got to train your people to understand how to use data, to become data literate. And that's what our software is all about. That's what we're all about as a company. And so, we have a pretty high bar for what we think we do as a business and we're this far into that. Which is, we think we're training people to use data better. How do you learn to think scientifically? How do you go use data to make better decisions? How do you build a data driven culture? Those are the sorts of problems that I'm excited to work on. >> Alright, now take me through how you guys play out in an engagement with the customer. So okay, that's cool, you guys can come in, we're getting data literate, we understand we need to use data. Where are you guys winning? Where are you guys seeing some visibility, both in terms of the traction of the usage of the product, the use cases? Where is it kind of coming together for you guys? >> Yeah, so we literally, we have a mantra. I think any early stage company basically wins because they can focus on doing a couple of things really well. And for us, we basically do three things. We allow people to find data. We allow people to understand the data that they find. And we allow them to trust the data that they see. And so if I have a question, the first place I start is, typically, Google. I'll go there and I'll try to find whatever it is that I'm looking for. Maybe I'm looking for a Mediterranean restaurant on 1st Street in San Jose. If I'm going to go do that, I'm going to do that search and I'm going to find the thing that I'm looking for, and then I'm going to figure out, out of the possible options, which one do I want to go to. And then I'll figure out whether or not the one that has seven ratings is the one that I trust more than the one that has two. Well, data is no different. You're going to have to find the data sets. And inside of companies, there could be 20 different reports and there could be 20 different people who have information, and so you're going to trust those people through having context and understanding. >> So, trust, people, collaboration. You mentioned some big brands that you guys added towards the end of calendar 2017. How do you facilitate these conversations with maybe the chief data officer. As we know, in large enterprises, there's still a lot of ownership over data silos. >> Satyen: Yep. >> What is that conversation like, as you say on your website, "The first data catalog designed for collaboration"? How do you help these organizations as large as Coca-Cola understand where all the data are and enable the human resources to extract values, and find it, understand it, and trust it? >> Yeah, so we have a very simple hypothesis, which is, look, people fundamentally have questions. They're fundamentally curious. So, what you need to do as a chief data officer, as a chief information officer, is really figure out how to unlock that curiosity. Start with the most popular data sets. Start with the most popular systems. Start with the business people who have the most curiosity and the most demand for information. And oh, by the way, we can measure that. Which is the magical thing that we do. So we can come in and say, "Look, "we look at the logs inside of your systems to know "which people are using which data sets, "which sources are most popular, which areas are hot." Just like a social network might do. And so, just like you can say, "Okay, these are the trending restaurants." We can say, "These are the trending data sets." And that curiosity allows people to know, what data should I document first? What data should I make available first? What data do I improve the data quality over first? What data do I govern first? And so, in a world where you've got tons of signal, tons of systems, it's totally dizzying to figure out where you should start. But what we do is, we go these chief data officers and say, "Look, we can give you a tool and a catalyst so "that you know where to go, "what questions to answer, who to serve first." And you can use that to expand to other groups in the company. >> And this is interesting, a lot of people you mentioned social networks, use data to optimize for something, and in the case of Facebook, they they use my data to target ads for me. You're using data to actually say, "This is how people are using the data." So you're using data for data. (laughs) >> That's right. >> So you're saying-- >> Satyen: We're measuring how you can use data. >> And that's interesting because, I hear a lot of stories like, we bought a tool, we never used it. >> Yep. >> Or people didn't like the UI, just kind of falls on the side. You're looking at it and saying, "Let's get it out there and let's see who's using the data." And then, are you doubling down? What happens? Do I get a little star, do I get a reputation point, am I being flagged to HR as a power user? How are you guys treating that gamification in this way? It's interesting, I mean, what happens? Do I become like-- >> Yeah, so it's funny because, when you think about search, how do you figure out that something's good? So what Google did is, they came along and they've said, "We've got PageRank." What we're going to do is we're going to say, "The pages that are the best pages are the ones "that people link to most often." Well, we can do the same thing for data. The data sources that are the most useful ones are the people that are used most often. Now on top of that, you can say, "We're going to have experts put ratings," which we do. And you can say people can contribute knowledge and reviews of how this data set can be used. And people can contribute queries and reports on top of those data sets. And all of that gives you this really rich graph, this rich social graph, so that now when I look at something it doesn't look like Greek. It looks like, "Oh, well I know Lisa used this data set, "and then John used it "and so at least it must answer some questions "that are really intelligent about the media business "or about the software business. "And so that can be really useful for me "if I have no clue as to what I'm looking at." >> So the problem that you-- >> It's on how you demystify it through the social connections. >> So the problem that you solve, if what I hear you correctly, is that you make it easy to get the data. So there's some ease of use piece of it, >> Yep. >> cataloging. And then as you get people using it, this is where you take the data literacy and go into operationalizing data. >> Satyen: That's right. >> So this seems to be the challenge. So, if I'm a customer and I have a problem, the profile of your target customer or who your customers are, people who need to expand and operationalize data, how would you talk about it? >> Yeah, so it's really interesting. We talk about, one of our customers called us, sort of, the social network for nerds inside of an enterprise. And I think for me that's a compliment. (John laughing) But what I took from that, and when I explained the business of Alation, we start with those individuals who are data literate. The data scientists, the data engineers, the data stewards, the chief data officer. But those people have the knowledge and the context to then explain data to other people inside of that same institution. So in the same way that Facebook started with Harvard, and then went to the rest of the Ivies, and then went to the rest of the top 20 schools, and then ultimately to mom, and dad, and grandma, and grandpa. We're doing the exact same thing with data. We start with the folks that are data literate, we expand from there to a broader audience of people that don't necessarily have data in their titles, but have curiosity and questions. >> I like that on the curiosity side. You spent some time up at Strata Data. I'm curious, what are some of the things you're hearing from customers, maybe partners? Everyone used to talk about Hadoop, it was this big thing. And then there was a creation of data lakes, and swampiness, and all these things that are sort of becoming more complex in an organization. And with the rise of myriad data sources, the velocity, the volume, how do you help an enterprise understand and be able to catalog data from so many different sources? Is it that same principle that you just talked about in terms of, let's start with the lowest hanging fruit, start making the impact there and then grow it as we can? Or is an enterprise needs to be competitive and move really, really quickly? I guess, what's the process? >> How do you start? >> Right. >> What do people do? >> Yes! >> So it's interesting, what we find is multiple ways of starting with multiple different types of customers. And so, we have some customers that say, "Look, we've got a big, we've got Teradata, "and we've got some Hadoop, "and we've got some stuff on Amazon, "and we want to connect it all." And those customers do get started, and they start with hundreds of users, in some case, they start with thousands of users day one, and they just go Big Bang. And interestingly enough, we can get those customers enabled in matters of weeks or months to go do that. We have other customers that say, "Look, we're going to start with a team of 10 people "and we're going to see how it grows from there." And, we can accommodate either model or either approach. From our prospective, you just have to have the resources and the investment corresponding to what you're trying to do. If you're going to say, "Look, we're going to have, two dollars of budget, and we're not going to have the human resources, and the stewardship resources behind it." It's going to be hard to do the Big Bang. But if you're going to put the appropriate resources up behind it, you can do a lot of good. >> So, you can really facilitate the whole go big or go home approach, as as well as the let's start small think fast approach. >> That's right, and we always, actually ironically, recommend the latter. >> Let's start small, think fast, yeah. >> Because everybody's got a bigger appetite than they do the ability to execute. And what's great about the tool, and what I tell our customers and our employees all day long is, there's only metric I track. So year over year, for our business, we basically grow in accounts by net of churn by 55%. Year over year, and that's actually up from the prior year. And so from my perspective-- >> And what does that mean? >> So what that means is, the same customer gave us 55 cents more on the dollar than they did the prior year. Now that's best in class for most software businesses that I've heard. But what matters to me is not so much that growth rate in and of itself. What it means to me is this, that nobody's come along and says, "I've mastered my data. "I understand all of the information side of my company. "Every person knows everything there is to know." That's never been said. So if we're solving a problem where customers are saying, "Look, we get, and we can find, and understand, "and trust data, and we can do that better last year "than we did this year, and we can do it even more "with more people," we're going to be successful. >> What I like about what you're doing is, you're bringing an element of operationalizing data for literacy and for usage. But you're really bringing this notion of a humanizing element to it. Where you see it in security, you see it in emerging ecosystems. Where there's a community of data people who know how hard it is and was, and it seems to be getting easier. But the tsunami of new data coming in, IOT data, whatever, and new regulators like GDPR. These are all more surface area problems. But there's a community coming together. How have you guys seen your product create community? Have you seen any data on that, 'cause it sounds like, as people get networked together, the natural outcome of that is possibly usage you attract. But is there a community vibe that you're seeing? Is there an internal collaboration where they sit, they're having meet ups, they're having lunches. There's a social aspect in a human aspect. >> No, it's humanal, no, it's amazing. So in really subtle but really, really powerful ways. So one thing that we do for every single data source or every single report that we document, we just put who are the top users of this particular thing. So really subtly, day one, you're like, "I want to go find a report. "I don't even know "where to go inside of this really mysterious system". Postulation, you're able to say, "Well, I don't know where to go, but at least I can go call up John or Lisa," and say, "Hey, what is it that we know about this particular thing?" And I didn't have to know them. I just had to know that they had this report and they had this intelligence. So by just discovering people in who they are, you pick up on what people can know. >> So people of the new Google results, so you mentioned Google PageRank, which is web pages and relevance. You're taking a much more people approach to relevance. >> Satyen: That's right. >> To the data itself. >> That's right, and that builds community in very, very clear ways, because people have curiosity. Other people are in the mechanism why in which they satisfy that curiosity. And so that community builds automatically. >> They pay it forward, they know who to ask help for. >> That's right. >> Interesting. >> That's right. >> Last question, Satyen. The tag line, first data catalog designed for collaboration, is there a customer that comes to mind to you as really one that articulates that point exactly? Where Alation has come in and really kicked open the door, in terms of facilitating collaboration. >> Oh, absolutely. I was literally, this morning talking to one of our customers, Munich Reinsurance, largest reinsurance customer or company in the world. Their chief data officer said, "Look, three years ago, "we started with 10 people working on data. "Today, we've got hundreds. "Our aspiration is to get to thousands." We have three things that we do. One is, we actually discover insights. It's actually the smallest part of what we do. The second thing that we do is, we enable people to use data. And the third thing that we do is, drive a data driven culture. And for us, it's all about scaling knowledge, to centers in China, to centers in North America, to centers in Australia. And they've been doing that at scale. And they go to each of their people and they say, "Are you a data black belt, are you a data novice?" It's kind of like skiing. Are you blue diamond or a black diamond. >> Always ski in pairs (laughs) >> That's right. >> And they do ski in pairs. And what they end up ultimately doing is saying, "Look, we're going to train all of our workforce to become better, so that in three, 10 years, we're recognized as one of the most innovative insurance companies in the world." Three years ago, that was not the case. >> Process improvement at a whole other level. My final question for you is, for the folks watching or the folks that are going to watch this video, that could be a potential customer of yours, what are they feeling? If I'm the customer, what smoke signals am I seeing that say, I need to call Alation? What are some of the things that you've found that would tell a potential customer that they should be talkin' to you guys? >> Look, I think that they've got to throw out the old playbook. And this was a point that was made by some folks at a conference that I was at earlier this week. But they basically were saying, "Look, the DLNA's PlayBook was all about providing the right answer." Forget about that. Just allow people to ask the right questions. And if you let people's curiosity guide them, people are industrious, and ambitious, and innovative enough to go figure out what they need to go do. But if you see this as a world of control, where I'm going to just figure out what people should know and tell them what they're going to go know. that's going to be a pretty, a poor career to go choose because data's all about, sort of, freedom and innovation and understanding. And we're trying to push that along. >> Satyen, thanks so much for stopping by >> Thank you. >> and sharing how you guys are helping organizations, enterprises unlock data curiosity. We appreciate your time. >> I appreciate the time too. >> Thank you. >> And thanks John! >> And thank you. >> Thanks for co-hosting with me. For John Furrier, I'm Lisa Martin, you're watching theCUBE live from our second day of coverage of our event Big Data SV. Stick around, we'll be right back with our next guest after a short break. (upbeat music)

Published Date : Mar 9 2018

SUMMARY :

brought to you by SiliconANGLE Media Satyen Sangani, the co-founder and CEO of Alation. So you guys finish up your fiscal year how are you pulling this momentum through 2018. in the last quarter we added customers like What about the platform you guys are doing? Take a minute, talk about the update. And that gives you a single place of reference, you got to manage it. So it's kind of like a tropic disillusionment, if you will. And they know that How do you guys talk to that customer? And so, you can either run away from that data, Those are the sorts of problems that I'm excited to work on. Where is it kind of coming together for you guys? and I'm going to find the thing that I'm looking for, that you guys added towards the end of calendar 2017. And oh, by the way, we can measure that. a lot of people you mentioned social networks, I hear a lot of stories like, we bought a tool, And then, are you doubling down? And all of that gives you this really rich graph, It's on how you demystify it So the problem that you solve, And then as you get people using it, and operationalize data, how would you talk about it? and the context to then explain data the volume, how do you help an enterprise understand have the resources and the investment corresponding to So, you can really facilitate the whole recommend the latter. than they do the ability to execute. What it means to me is this, that nobody's come along the natural outcome of that is possibly usage you attract. And I didn't have to know them. So people of the new Google results, And so that community builds automatically. is there a customer that comes to mind to And the third thing that we do is, And what they end up ultimately doing is saying, that they should be talkin' to you guys? And if you let people's curiosity guide them, and sharing how you guys are helping organizations, Thanks for co-hosting with me.

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
PepsiCoORGANIZATION

0.99+

Lisa MartinPERSON

0.99+

Satyen SanganiPERSON

0.99+

JohnPERSON

0.99+

American ExpressORGANIZATION

0.99+

AlationORGANIZATION

0.99+

RocheORGANIZATION

0.99+

SatyenPERSON

0.99+

thousandsQUANTITY

0.99+

LisaPERSON

0.99+

55 centsQUANTITY

0.99+

AustraliaLOCATION

0.99+

AmazonORGANIZATION

0.99+

Coca-ColaORGANIZATION

0.99+

2018DATE

0.99+

10 peopleQUANTITY

0.99+

threeQUANTITY

0.99+

John FurrierPERSON

0.99+

hundredsQUANTITY

0.99+

YelpORGANIZATION

0.99+

San JoseLOCATION

0.99+

ChinaLOCATION

0.99+

HarvardORGANIZATION

0.99+

FacebookORGANIZATION

0.99+

twoQUANTITY

0.99+

TodayDATE

0.99+

2017DATE

0.99+

55%QUANTITY

0.99+

second dayQUANTITY

0.99+

North AmericaLOCATION

0.99+

GoogleORGANIZATION

0.99+

todayDATE

0.99+

two dollarsQUANTITY

0.99+

20 different peopleQUANTITY

0.99+

yesterdayDATE

0.99+

SiliconANGLE MediaORGANIZATION

0.99+

last yearDATE

0.99+

three years agoDATE

0.99+

firstQUANTITY

0.99+

second thingQUANTITY

0.99+

OneQUANTITY

0.99+

oneQUANTITY

0.99+

first quarter of 2018DATE

0.99+

20 different reportsQUANTITY

0.99+

three thingsQUANTITY

0.98+

theCUBEORGANIZATION

0.98+

last quarterDATE

0.98+

DLNAORGANIZATION

0.98+

third thingQUANTITY

0.98+

Three years agoDATE

0.98+

eachQUANTITY

0.98+

singleQUANTITY

0.98+

bothQUANTITY

0.98+

1st StreetLOCATION

0.98+

Big BangEVENT

0.98+

this yearDATE

0.98+

Strata DataORGANIZATION

0.97+

12 XQUANTITY

0.97+

GDPRTITLE

0.97+

seven ratingsQUANTITY

0.96+

AlationPERSON

0.95+

this morningDATE

0.95+

Big Data SV 2018EVENT

0.94+

first dataQUANTITY

0.94+

TeradataORGANIZATION

0.93+

10 yearsQUANTITY

0.93+

Ian Swanson, DataScience.com | Big Data SV 2018


 

(royal music) >> Announcer: John Cleese. >> There's a lot of people out there who have no idea what they're doing, but they have absolutely no idea that they have no idea what they're doing. Those are the ones with the confidence and stupidity who finish up in power. That's why the planet doesn't work. >> Announcer: Knowledgeable, insightful, and a true gentleman. >> The guy at the counter recognized me and said... Are you listening? >> John Furrier: Yes, I'm tweeting away. >> No, you're not. >> I tweet, I'm tweeting away. >> He is kind of rude that way. >> You're on your (bleep) keyboard. >> Announcer: John Cleese joins the Cube alumni. Welcome, John. >> John Cleese: Have you got any phone calls you need to answer? >> John Furrier: Hold on, let me check. >> Announcer: Live from San Jose, it's the Cube, presenting Big Data Silicon Valley, brought to you by Silicon Angle Media and its ecosystem partners. (busy music) >> Hey, welcome back to the Cube's continuing coverage of our event, Big Data SV. I'm Lisa Martin with my co-host, George Gilbert. We are down the street from the Strata Data Conference. This is our second day, and we've been talking all things big data, cloud data science. We're now excited to be joined by the CEO of a company called Data Science, Ian Swanson. Ian, welcome to the Cube. >> Thanks so much for having me. I mean, it's been a awesome two days so far, and it's great to wrap up my trip here on the show. >> Yeah, so, tell us a little bit about your company, Data Science, what do you guys do? What are some of the key opportunities for you guys in the enterprise market? >> Yeah, absolutely. My company's called datascience.com, and what we do is we offer an enterprise data science platform where data scientists get to use all they tools they love in all the languages, all the libraries, leveraging everything that is open source to build models and put models in production. Then we also provide IT the ability to be able to manage this massive stack of tools that data scientists require, and it all boils down to one thing, and that is, companies need to use the data that they've been storing for years. It's about, how do you put that data into action. We give the tools to data scientists to get that data into action. >> Let's drill down on that a bit. For a while, we thought if we just put all our data in this schema-on-read repository, that would be nirvana. But it wasn't all that transparent, and we recognized we have to sort of go in and structure it somewhat, help us take the next couple steps. >> Ian: Yeah, the journey. >> From this partially curated data sets to something that turns into a model that is actionable. >> That's actually been the theme in the show here at the Strata Data Conference. If we went back years ago, it was, how do we store data. Then it was, how do we not just store and manage, but how do we transform it and get it into a shape that we can actually use it. The theme of this year is how do we get it to that next step, the next step of putting it into action. To layer onto that, data scientists need to access data, yes, but then they need to be able to collaborate, work together, apply many different techniques, machine learning, AI, deep learning, these are all techniques of a data scientist to be able to build a model. But then there's that next step, and the next is, hey, I built this model, how do I actually get it in production? How does it actually get used? Here's the shocking thing. I was at an event where there's 500 data scientists in the audience, and I said, "Stand up if you worked on a model for more than nine months "and it never went into production." 90% of the audience stood up. That's the last mile that we're all still working on, and what's exciting is, we can make it possible today. >> Wanting to drill down into the sort of, it sounds like there's a lot of choice in the tools. But typically, to do a pipeline, you either need well established APIs that everyone understands and plugs together with, or you need an end to end sort of single vendor solution that becomes the sort of collaboration backbone. How are you organized, how are you built? >> This might be self-serving, but datascience.com, we have enterprise data science platform, we recommend a unified platform for data science. Now, that unified platform needs to be highly configurable. You need to make it so that that workbench, you can use any tool that you want. Some data scientists might want to use a hammer, others want to be able to use a screwdriver over here. The power is how configurable, how extensible it is, how open source you can adopt everything. The amazing trends that we've seen have been proprietary solutions going back decades, to now, the rise of open source. Every day, dozens if not hundreds of new machine learning libraries are being released every single day. We've got to give those capabilities to data scientists and make them scale. >> OK, so the, and I think it's pretty easy to see how you would have incorporate new machine learning libraries into a pipeline. But then there's also the tools for data preparation, and for like feature extraction and feature engineering, you might even have some tools that help you with figuring out which algorithm to select. What holds all that together? >> Yeah, so orchestrating the enterprise data science stack is the hardest challenge right now. There has to be a company like us that is the glue, that is not just, do these solutions work together, but also, how do they collaborate, what is that workflow? What are those steps in that process? There's one thing that you might have left out, and that is, model deployment, model interpretation, model management. >> George: That's the black art, yeah. >> That's where this whole thing is going next. That was the exciting thing that I heard in terms of all these discussion with business leaders throughout the last two days is model deployment, model management. >> If I can kind of take this to maybe shift the conversation a little bit to the target audience. Talked a lot about data scientists and needing to enable them. I'm curious about, we just talked with, a couple of guests ago, about the chief data officer. How, you work with enterprises, how common is the chief data officer role today? What are some of the challenges they've got that datascience.com can help them to eliminate? >> Yeah, the CIO and the chief data officer, we have CIOs that have been selecting tools for companies to use, and now the chief data officer is sitting down with the CEO and saying, "How do we actually drive business results?" We work very closely with both of those personas. But on the CDO side, it's really helping them educate their teams on the possibilities of what could be realized with the data at hand, and making sure that IT is enabling the data scientists with the right tools. We supply the tools, but we also like to go in there with our customers and help coach, help educate what is possible, and that helps with the CDO's mission. >> A question along that front. We've been talking about sort of empowering the data scientist, and really, from one end of the modeling life cycle all the way to the end or the deployment, which is currently the hardest part and least well supported. But we also have tons of companies that don't have data science trained people, or who are only modestly familiar. Where do, what do we do with them? How do we get those companies into the mainstream in terms of deploying this? >> I think whether you're a small company or a big company, digital transformation is the mandate. Digital transformation is not just, how do I make a taxi company become Uber, or how do I make a speaker company become Sonos, the smart speaker, it's how do I exploit all the sources of my data to get better and improved operational processes, new business models, increased revenue, reduced operation costs. You could start small, and so we work with plenty of smaller companies. They'll hire a couple data scientists, and they're able to do small quick wins. You don't have to go sit in the basement for a year having something that is the thing, the unicorn in the business, it's small quick wins. Now we, my company, we believe in writing code, trained, educated, data scientists. There are solutions out there that you throw data at, you push a button, it gets an output. It's this magic black box. There's risk in that. Model interpretation, what are the features it's scoring on, there's risk, but those companies are seeing some level of success. We firmly believe, though, in hiring a data science team that is trained, you can start small, two or three, and get some very quick wins. >> I was going to say, those quick wins are essential for survivability, like digital transformation is essential, but it's also, I mean, to survival at a minimum, right? >> Ian: Yes. >> Those quick wins are presumably transformative to an enterprise being able to sustain, and then eventually, or ideally, be able to take market share from their competition. >> That is key for the CDO. The CDO is there pitching what is possible, he's pitching, she's pitching the dream. In order to be able to help visualize what that dream and the outcome could be, we always say, start small, quick wins, then from there, you can build. What you don't want to do is go nine months working on something and you don't know if there's going to be outcome. A lot of data science is trial and error. This is science, we're testing hypotheses. There's not always an outcome that's to be there, so small quick wins is something we highly recommend. >> A question, one of the things that we see more and more is the idea that actionable insights are perishable, and that latency matters. In fact, you have a budget for latency, almost, like in that short amount of time, the more sort of features that you can dynamically feed into a model to get a score, are you seeing more of that? How are the use cases that you're seeing, how's that pattern unfolding? >> Yeah, so we're seeing more streaming data use cases. We work with some of the biggest technology companies in the world, so IoT, connected services, streaming real time decisions that are happening. But then, also, there are so many use cases around org that could be marketing, finance, HR related, not just tech related. On the marketing side, imagine if you're customer service, and somebody calls you, and you know instantly the lifetime value of that customer, and it kicks off a totally new talk track, maybe get escalated immediately to a new supervisor, because that supervisor can handle this top tier customer. These are decisions that can happen real time leveraging machine learning models, and these are things that, again, are small quick wins, but massive, massive impact. It's about decision process now. That's digital transformation. >> OK. Are you seeing patterns in terms of how much horsepower customers are budgeting for the training process, creating the model? Because we know it's very compute intensive, like, even Intel, some people call it, like, high performance compute, like a supercomputer type workload. How much should people be budgeting? Because we don't see any guidelines or rules of thumb for this. >> I still think the boundaries are being worked out. There's a lot of great work that Nvidia's doing with GPU, we're able to do things faster on compute power. But even if we just start from the basics, if you go and talk to a data scientist at a massive company where they have a team of over 1,000 data scientists, and you say to do this analysis, how do you spin up your compute power? Well, I go walk over to IT and I knock on the door, and I say, "Set up this machine, set up this cluster." That's ridiculous. A product like ours is able to instantly give them the compute power, scale it elastically with our cloud service partners or work with on-prem solutions to be able to say, get the power that you need to get the results in the time that's needed, quick, fast. In terms of the boundaries of the budget, that's still being defined. But at the end of the day, we are seeing return on investment, and that's what's key. >> Are you seeing a movement towards a greater scope of integration for the data science tool chain? Or is it that at the high end, where you have companies with 1,000 data scientists, they know how to deal with specialized components, whereas, when there's perhaps less of, a smaller pool of expertise, the desire for end to end integration is greater. >> I think there's this kind of thought that is not necessarily right, and that is, if you have a bigger data science team, you're more sophisticated. We actually see the same sophistication level of 1,000 person data science team, in many cases, to a 20 person data science team, and sometimes inverse, I mean, it's kind of crazy. But it's, how do we make sure that we give them the tools so they can drive value. Tools need to include collaboration and workflow, not just hammers and nails, but how do we work together, how do we scale knowledge, how do we get it in the hands of the line of business so they can use the results. It's that that is key. >> That's great, Ian. I also like that you really kind of articulated start small, quick ins can make massive impact. We want to thank you so much for stopping by the Cube and sharing that, and what you guys are doing at Data Science to help enterprises really take advantage of the value that data can really deliver. >> Thanks so much for having datascience.com on, really appreciate it. >> Lisa: Absolutely. George, thank you for being my co-host. >> You're always welcome. >> We want to thank you for watching the Cube. I'm Lisa Martin with George Gilbert, and we are at our event Big Data SV on day two. Stick around, we'll be right back with our next guest after a short break. (busy music)

Published Date : Mar 8 2018

SUMMARY :

Those are the ones with the confidence and stupidity and a true gentleman. The guy at the counter recognized me and said... Announcer: John Cleese joins the Cube alumni. brought to you by Silicon Angle Media We are down the street from the Strata Data Conference. and it's great to wrap up my trip here on the show. and it all boils down to one thing, and that is, the next couple steps. to something that turns into a model that is actionable. and the next is, hey, I built this model, that becomes the sort of collaboration backbone. how open source you can adopt everything. OK, so the, and I think it's pretty easy to see Yeah, so orchestrating the enterprise data science stack in terms of all these discussion with business leaders a couple of guests ago, about the chief data officer. and making sure that IT is enabling the data scientists empowering the data scientist, and really, having something that is the thing, or ideally, be able to take market share and the outcome could be, we always say, start small, the more sort of features that you can dynamically in the world, so IoT, connected services, customers are budgeting for the training process, get the power that you need to get the results Or is it that at the high end, We actually see the same sophistication level and sharing that, and what you guys are doing Thanks so much for having datascience.com on, George, thank you for being my co-host. and we are at our event Big Data SV on day two.

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
George GilbertPERSON

0.99+

Lisa MartinPERSON

0.99+

Ian SwansonPERSON

0.99+

GeorgePERSON

0.99+

IanPERSON

0.99+

LisaPERSON

0.99+

UberORGANIZATION

0.99+

John FurrierPERSON

0.99+

Silicon Angle MediaORGANIZATION

0.99+

JohnPERSON

0.99+

John CleesePERSON

0.99+

500 data scientistsQUANTITY

0.99+

90%QUANTITY

0.99+

dozensQUANTITY

0.99+

NvidiaORGANIZATION

0.99+

San JoseLOCATION

0.99+

20 personQUANTITY

0.99+

Data ScienceORGANIZATION

0.99+

nine monthsQUANTITY

0.99+

1,000 personQUANTITY

0.99+

twoQUANTITY

0.99+

two daysQUANTITY

0.99+

more than nine monthsQUANTITY

0.99+

second dayQUANTITY

0.99+

1,000 data scientistsQUANTITY

0.99+

threeQUANTITY

0.99+

Big Data SVEVENT

0.99+

over 1,000 data scientistsQUANTITY

0.99+

CubeORGANIZATION

0.99+

bothQUANTITY

0.99+

Strata Data ConferenceEVENT

0.98+

oneQUANTITY

0.98+

IntelORGANIZATION

0.98+

SonosORGANIZATION

0.98+

one thingQUANTITY

0.97+

a yearQUANTITY

0.96+

todayDATE

0.95+

day twoQUANTITY

0.95+

this yearDATE

0.94+

singleQUANTITY

0.92+

Big Data SV 2018EVENT

0.88+

DataScience.comORGANIZATION

0.87+

hundreds of new machine learning librariesQUANTITY

0.86+

lot of peopleQUANTITY

0.83+

decadesQUANTITY

0.82+

every single dayQUANTITY

0.81+

years agoDATE

0.77+

last two daysDATE

0.76+

datascience.comORGANIZATION

0.75+

one endQUANTITY

0.7+

yearsQUANTITY

0.67+

datascience.comOTHER

0.65+

couple stepsQUANTITY

0.64+

Big DataEVENT

0.64+

couple of guestsDATE

0.57+

coupleQUANTITY

0.52+

Silicon ValleyLOCATION

0.52+

thingsQUANTITY

0.5+

CubeTITLE

0.47+

Ziya Ma, Intel | Big Data SV 2018


 

>> Live from San Jose, it's theCUBE! Presenting Big Data Silicon Valley, brought to you by SiliconANGLE Media and its ecosystem partners. >> Welcome back to theCUBE. Our continuing coverage of our event, Big data SV. I'm Lisa Martin with my co-host George Gilbert. We're down the street from the Strata Data Conference, hearing a lot of interesting insights on big data. Peeling back the layers, looking at opportunities, some of the challenges, barriers to overcome but also the plethora of opportunities that enterprises alike have that they can take advantage of. Our next guest is no stranger to theCUBE, she was just on with me a couple days ago at the Women in Data Science Conference. Please welcome back to theCUBE, Ziya Ma. Vice President of Software and Services Group and the Director of Big Data Technologies from Intel. Hi Ziya! >> Hi Lisa. >> Long time, no see. >> I know, it was just really two to three days ago. >> It was, well and now I can say happy International Women's Day. >> The same to you, Lisa. >> Thank you, it's great to have you here. So as I mentioned, we are down the street from the Strata Data Conference. You've been up there over the last couple days. What are some of the things that you're hearing with respect to big data? Trends, barriers, opportunities? >> Yeah, so first it's very exciting to be back at the conference again. The one biggest trend, or one topic that's hit really hard by many presenters, is the power of bringing the big data system and data science solutions together. You know, we're definitely seeing in the last few years the advancement of big data and advancement of data science or you know, machine learning, deep learning truly pushing forward business differentiation and improve our life quality. So that's definitely one of the biggest trends. Another thing I noticed is there was a lot of discussion on big data and data science getting deployed into the cloud. What are the learnings, what are the use cases? So I think that's another noticeable trend. And also, there were some presentations on doing the data science or having the business intelligence on the edge devices. That's another noticeable trend. And of course, there were discussion on security, privacy for data science and big data so that continued to be one of the topics. >> So we were talking earlier, 'cause there's so many concepts and products to get your arms around. If someone is looking at AI and machine learning on the back end, you know, we'll worry about edge intelligence some other time, but we know that Intel has the CPU with the Xeon and then this lower power one with Atom. There's the GPU, there's ASICs, FPGAS, and then there are these software layers you know, with higher abstraction layer, higher abstraction level. Help us put some of those pieces together for people who are like saying, okay, I know I've got a lot of data, I've got to train these sophisticated models, you know, explain this to me. >> Right, so Intel is a real solution provider for data science and big data. So at the hardware level, and George, as you mentioned, we offer a wide range of products from general purpose like Xeon to targeted silicon such as FPGA, Nervana, and other ASICs chips like Nervana. And also we provide adjacencies like networking the hardware, non-volatile memory and mobile. You know, those are the other adjacent products that we offer. Now on top of the hardware layer, we deliver fully optimized software solutions stack from libraries, frameworks, to tools and solutions. So that we can help engineers or developers to create AI solutions with greater ease and productivity. For instance, we deliver Intel optimized math kernel library. That leverage of the latest instruction set gives us significant performance boosts when you are running your software on Intel hardware. We also deliver framework like BigDL and for Spark and big data type of customers if they are looking for deep learning capabilities. We also optimize some popular open source deep learning frameworks like Caffe, like TensorFlow, MXNet, and a few others. So our goal is to provide all the necessary solutions so that at the end our customers can create the applications, the solutions that they really need to address their biggest pinpoints. >> Help us think about the maturity level now. Like, we know that the very most sophisticated internet service providers who are sort of all over this machine learning now for quite a few years. Banks, insurance companies, people who've had this. Statisticians and actuaries who have that sort of skillset are beginning to deploy some of these early production apps. Where are we in terms of getting this out to the mainstream? What are some of the things that have to happen? >> To get it to mainstream, there are so many things we could do. First I think we will continue to see the wide range of silicon products but then there are a few things Intel is pushing. For example, we're developing this in Nervana, graph compiler that will encapsulate the hardware integration details and present a consistent API for developers to work with. And this is one thing that we hope that we can eventually help the developer community with. And also, we are collaborating with the end user. Like, from the enterprise segment. For example, we're working with the financial services industry, we're working with a manufacturing sector and also customers from the medical field. And online retailers, trying to help them to deliver or create the data science and analytics solutions on Intel-based hardware or Intel optimized software. So that's another thing that we do. And we're seeing actually very good progress in this area. Now we're also collaborating with many cloud service providers. For instance, we work with some of the top seven cloud service providers, both in the U.S. and also in China to democratize the, not only our hardware, but also our libraries and tools, BigDL, MKL, and other frameworks and libraries so that our customers, including individuals and businesses, can easily access to those building blocks from the cloud. So definitely we're working from different factors. >> So last question in the last couple of minutes. Let's kind of vibe on this collaboration theme. Tell us a little bit about the collaboration that you're having with, you mentioned customers in some highly regulated industries, for as an example. But a little bit to understand what's that symbiosis? What is Intel learning from your customers that's driving Intel's innovation of your technologies and big data? >> That's an excellent question. So Lisa, maybe I can start my sharing a couple of customer use cases. What kind of a solution that we help our customer to address. I think it's always wise not to start a conversation with the customer on technology that you deliver. You want to understand the customer's needs first. And then so that you can provide a solution that really address their biggest pinpoint rather than simply selling technology. So for example, we have worked with an online retailer to better understand their customers' shopping behavior and to assess their customers' preferences and interests. And based upon that analysis, the online retailer made different product recommendations and maximized its customers' purchase potential. And it drove up the retailer's sales. You know, that's one type of use case that we have worked. We also have partnered with the customers from the medical field. Actually, today at the Strata Conference we actually had somebody highlighting, we had a joint presentation with UCSF where we helped the medical center to automate the diagnosis and grading of meniscus lesions. And so today actually, that's all done manually by the radiologist but now that entire process is automated. The result is much more accurate, much more consistent, and much more timely. Because you don't have to wait for the availability of a radiologist to read all the 3D MRI images. And that can all be done by machines. You know, so those are the areas that we work with our customers, understand their business need, and give them the solution they are looking for. >> Wow, the impact there. I wish we had more time to dive into some of those examples. But we thank you so much, Ziya, for stopping by twice in one week to theCUBE and sharing your insights. And we look forward to having you back on the show in the near future. >> Thanks, so thanks Lisa, thanks George for having me. >> And for my co-host George Gilbert, I'm Lisa Martin. We are live at Big Data SV in San Jose. Come down, join us for the rest of the afternoon. We're at this cool place called Forager Tasting and Eatery. We will be right back with our next guest after a short break. (electronic outro music)

Published Date : Mar 8 2018

SUMMARY :

brought to you by SiliconANGLE Media some of the challenges, barriers to overcome What are some of the things that you're So that's definitely one of the biggest trends. on the back end, So at the hardware level, and George, as you mentioned, What are some of the things that have to happen? and also customers from the medical field. So last question in the last couple of minutes. customers from the medical field. And we look forward to having you We will be right back with our

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
George GilbertPERSON

0.99+

Lisa MartinPERSON

0.99+

UCSFORGANIZATION

0.99+

GeorgePERSON

0.99+

LisaPERSON

0.99+

San JoseLOCATION

0.99+

ChinaLOCATION

0.99+

Ziya MaPERSON

0.99+

U.S.LOCATION

0.99+

International Women's DayEVENT

0.99+

SiliconANGLE MediaORGANIZATION

0.99+

ZiyaPERSON

0.99+

one weekQUANTITY

0.99+

todayDATE

0.99+

twiceQUANTITY

0.99+

FirstQUANTITY

0.99+

Strata Data ConferenceEVENT

0.99+

one topicQUANTITY

0.98+

SparkTITLE

0.98+

bothQUANTITY

0.98+

IntelORGANIZATION

0.98+

one thingQUANTITY

0.98+

three days agoDATE

0.98+

Women in Data Science ConferenceEVENT

0.97+

Strata ConferenceEVENT

0.96+

firstQUANTITY

0.96+

BigDLTITLE

0.96+

TensorFlowTITLE

0.96+

one typeQUANTITY

0.95+

twoDATE

0.94+

MXNetTITLE

0.94+

CaffeTITLE

0.92+

theCUBEORGANIZATION

0.91+

oneQUANTITY

0.9+

Software and Services GroupORGANIZATION

0.9+

Forager Tasting and EateryORGANIZATION

0.88+

Vice PresidentPERSON

0.86+

Big Data TechnologiesORGANIZATION

0.84+

seven cloud service providersQUANTITY

0.81+

last couple daysDATE

0.81+

AtomCOMMERCIAL_ITEM

0.76+

Silicon ValleyLOCATION

0.76+

Big Data SV 2018EVENT

0.74+

a couple days agoDATE

0.72+

Big Data SVORGANIZATION

0.7+

XeonCOMMERCIAL_ITEM

0.7+

NervanaORGANIZATION

0.68+

Big DataEVENT

0.62+

lastDATE

0.56+

dataEVENT

0.54+

caseQUANTITY

0.52+

3DQUANTITY

0.48+

coupleQUANTITY

0.47+

yearsDATE

0.47+

NervanaTITLE

0.45+

BigORGANIZATION

0.32+

Blaine Mathieu, VANTIQ | Big Data SV 2018


 

>> Announcer: Live from San Jose, it's The Cube, presenting Big Data, Silicon Valley. Brought to you by Silicon Angle Media and its ecosystem partners. >> Welcome back to The Cube. Our continuing coverage of our event, Big Data SV continues. I am Lisa Martin joined by Peter Burris. We're in downtown San Jose at a really cool place called Forager Tasting and Eatery. Come down, hang out with us today as we have continued conversations around all things big data, everything in between. This is our second day here and we're excited to welcome to The Cube the CMO of VANTIQ, Blaine Mathieu. Blaine, great to meet you, great to have you on the program. >> Great to be here, thanks for inviting me. >> So, VANTIQ, you guys are up the street in Walnut Creek. What do you guys do, what are you about, what makes VANTIQ different? >> Well, in a nutshell, VANTIQ is a so called high productivity application development platform to allow developers to build, deploy, and manage so called event driven real time applications, the kind of applications that are critical for driving many of the digital transformation initiatives that enterprises are trying to get on top of these days. >> Digital trasformation, it's a term that can mean so many different things, but today, it's essential for companies to be able to compete, especially enterprise companies with newer companies that are more agile, more modern. But if we peel apart digital transformation, there's so many elements that are essential. How do you guys help companies, enterprises, say, evolve their application architectures that might currently not be able to support an actual transformation to a digital business? >> Well, I think that's a great question, thank you. I think the key to digital trasformation is really a lot around the concept of real time, okay. The reason Uber is disrupting or has disrupted the taxi industry is the old way of doing it was somebody called a taxi and then they waited 30 minutes for a taxi to show up and then they told the taxi where to go and hopefully they got there. Whereas, Uber, turned that into a real time business, right? You called, you pinged something on your phone. They knew your location. They knew the location of the driver. They matched those up, brought 'em together in real time. Already knew where to bring you to and ensured you had the right route and that location. All of this data flowing, all of these actions have been taken in real time. The same thing applies to a disruptor like Netflix, okay? In the old days, Blockbuster used to send you, you know, a leaflet in the mail telling you what the new movies are. Maybe it was personalized for you. Probably not. No, Netflix knows who you are instantly, gives you that information, again, in real time based on what you've done in the past and is able to give you, deliver the movie also, in real time pretty well. Every disruptor you look at around digital transformation is bringing a business or a process that was done slowly and impersonally to make it happen in real time. Unfortunately, enterprise applications and the architectures, as you said a second ago, that are being used in most applications today weren't designed to enable these real time use cases. A great example is sales force. So, a sales force is a pretty standard, what you'd call a request application. So, you make a request, a person, generally, makes a request of the system, system goes into a database, queries that database, find information and then returns it back to the user. And that whole process could take, you know, significant amounts of time, especially if the right data isn't in the database at the time and you have to go request it or find it or create it. A new type of application needs to be created that's not fundamentally database centric, but it's able to take these real time data streams coming in from devices, from people, from enterprise systems, process them in real time and then take an action. >> So, let's pretend I'm a CEO. >> Yeah. >> One of the key things you said, and I want you to explain it better, is event. What is event? What is an event and how does that translate into a digital business decision? >> This notion of complex event processing CEP has been around in technology for a long time and yet, it surprises me still a lot of folks we talk to, CEOs, have never heard of the concept. And, it's very simple really. An event is just something that happens in the context of business. That's as complex and as simple as it is. An event could be a machine increases in temperature by one degree, a car moves from one location to another location. It could be an enterprise system, like an ERP system, you know, approves a PO. It could be a person pressing a button on a mobile device. All of those, or it could be an IOT device putting off a signal about the state of a machine. Increasingly, we're getting a lot of events coming from IOT devices. So, really, any particular interesting business situation or a change in a situation that happens is an event And increasingly driven, as you know, by IOT, by augmented reality, by AI and machine learning, by autonomous vehicles, by all these new real time technologies are spinning off more and more events, streams of these events coming off in rapid fashion and we have to be able to do something about them. >> Let me take a crack at it and you tell me if I've got this right. That, historically, applications have been defined in terms of processes and so, in many respects, there was a very concrete, discreet, well established program, set of steps that were performed and then the transaction took place. And event, it seems to me is, yeah, we generally described it, but it changes in response to the data. >> Right, right. >> So, an event is kind of like an outside in driven by data. >> Right, right. >> System response, whereas, your traditional transaction processing is an inside out driven by a sequence of programmed steps, and that decision might have been made six years ago. So, the event is what's happening right now informed by data versus a transaction, traditional transaction is much more, what did we decide to do six years ago and it just gets sustained. Have I got that right? >> That's right. Absolutely right or six hours ago or even six minutes ago, which might seem wow, six minutes, that's pretty good, but take a use case for a field service agent trying to fix a machine or an air conditioner on top of a building. In today's world now, that air conditioner has hundreds of sensors that are putting off data about the state of that air conditioner in real time. A service tech has the ability to, while the machine is still putting off that data, be able to make repairs and changes and fixes, again, in the moment, see how that is changing the data coming off the machine, and then, continue to make the appropriate repairs in collaboration with a smart system or an application that's helping them. >> That's how identifying patterns about what the problem is, versus some of the old ways was where we had recipe of, you know, steps that you went through in the call center. >> Right, right. And the customer is getting more and more frustrated. >> They got their clipboard out and had the 52 steps they followed to see oh that didn't work, now the next step. No, data can help us do that much more efficiently and effectively if we're able to process it in real time. >> So, in many respects, what we're really talking about is an application world or a world looking forward where the applications, which historically have been very siloed, process driven, to a world where the application function is much more networked together and the application, the output of one application is having a significant impact through data on the performance of an application somewhere else. That seems like it's got the potential to be an extremely complex fabric. (laughing) So, do I wait until I figure all that out (laughing) and then I start building it? Or do I, I mean, how do I do it? Do I start small and create and grow into it? What's the best way for people to start working on this? >> Well, you're absolutely right. Building these complex, geeking out a little bit, you know, asynchronous, non-blocking, so called reactive applications, that's the concept that we've been using in computer science for some time, is very hard, frankly. Okay, it's much easier to build computing systems that process things step one, step, two, step three, in order, but if you have to build a system that is able to take real time inputs or changes at any point in the process at any time and go in a different direction, it's very complex. And, computer scientists have been writing applications like this for decades. It's possible to do, but that isn't possible to do at the speed that companies now want to transform themselves, right? By the time you spec out an application and spend two years writing it, your business competitors have already disrupted you. The requirements have already changed. You need to be much more rapid and agile. And so, the secret sauce to this whole thing is to be able to write these transformative applications or create them, not even write is actually the wrong word to use, to be able to create them. >> Generate them. >> Yeah, generate them in a way which is very fast, does not require a guru level developer and reactive Java or some super low level code that you'd have to use to otherwise do it, so that you can literally have business people help design the applications, conceptually build them almost in real time, get them out into the market, and then be able to modify them as you need to, you know, on the fly. >> If I can build on that for just one second. So, it used to be we had this thing called computer assisted software engineer. >> (laughs) Right, right. >> We were going to operate this very very high level language. It's kind of-- But then, we would use code and build a code and the two of them were separated and so the minute that we deployed, somebody would go off and maintain and the whole thing would break. >> Right, right. >> Do you have that problem? >> No, well, that's exactly right. So, the old, you know, the old, the previous way of doing it was about really modeling an application, maybe visually, drag and drop, but then fundamentally, you created a bunch of code and then your job, as you said after, was to maintain and deploy and manage. >> Try to sustain some connection back up to that beautiful visual model. >> And you probably didn't because that was too much. That was too much work, so forget about the model after that. Instead, what we're able to do these days is to build the applications visually, you know, really for the most part with either super low code or, in many cases, no code because we have the ability to abstract away a lot of the complexity, a lot of the complex code that you'd have to write, we can represent that, okay, with these logical abstractions, create the applications themselves, and then continue to maintain, add to, modify the application using the exact same structure. You're not now stuck on, now you're stuck with 20,000 lines of code that you have to, that you have to edit. You're continuing to run and maintain the application just the way you built it, okay. We've now got to the place in computer science where we can actually do these things. We couldn't do them, you know, 20 years ago with case, but we can absolutely do them now. >> So, I'm hearing from a customer internal perspective a lot of operational efficiencies that VANTIQ can drive. Let's look now from a customer's perspective. What are the business impacts you're able to make? You mentioned the word reactive a minute ago when you were talking about applications, but do you have an example where you've, VANTIQ, has enabled a customer, a business, to be more, to be proactive and be able to identify through, you know, complex event processing, what their customers are doing to be able to deliver relevant messages and really drive revenue, drive profit? >> Right, right. So many, you know, so many great examples. And, I mentioned field service a few minutes ago. I've got a lot of clients in that doing this real time field service using these event processing applications. One that I want to bring up right now is one of the largest global shoe manufacturers, actually, that's a client of VANTIQ. I, unfortunately, can't say the name right now 'cause they want to keep what they're doing under wraps, but we all definitely know the company. And they're using this to manage the security, primarily, around their real time global supply chain. So, they've got a big challenge with companies in different countries redirecting shipments of their shoes, selling them on the gray market, at different prices than what are allowed in different regions of the world. And so, through both sensorizing the packages, the barcode scanning, the enterprise systems bringing all that data together in real time, they can literally tell in the moment is something is be-- If a package is redirected to the wrong region or if literally a shoe or a box of shoes is being sold where it shouldn't be sold at the wrong price. They used to get a monthly report on the activities and then they would go and investigate what happened last month. Now, their fraud detection manager is literally sitting there getting this in real time, saying, oh, Singapore sold a pallet of shoes that they should not have been able to sell five minute ago. Call up the guy in Singapore and have him go down and see what's going on and fix that issue. That's pretty powerful when you think about it. >> Definitely, so like reduction in fraud or increase in fraud detection. Sounds like, too, there's a potential for a significant amount of cost savings to the business, not just meeting the external customer needs, but from a, from a cost perspective reduction. Not just some probably TCO, but in operational expenses. >> For sure, although, I would say most of the digital transformation initiatives, when we talk to CEOs and CIOs, they're not focused as much on cost savings, as they're focused on A, avoiding being disrupted by the next interesting startup, B, creating new lines of business, new revenue streams, finding out a way to do something differently dramatically better than they're currently doing it. It's not only about optimizing or squeezing some cost out of their current application. This thing that we are talking about, I guess you could say it's an improvement on their current process, but really, it's actually something they just weren't even really doing before. Just a total different way of doing fraud detection and managing their global supply chain that they just fundamentally weren't even doing. And now, of course, they're looking at many other use cases across the company, not just in supply chain, but, you know, smart manufacturing, so many use cases. Your point about savings, though, there's, you know, what value does the application itself bring? Then, there's the question of what does it cost to build and maintain and deploy the application itself, right? And, again, with these new visual development tools, they're not modeling tools, you're literally developing the application visually. You know, I've been in so many scenarios where we talked to large enterprises. You know, we talk about what we're doing, like we talk about right now, and they say, okay, we'd love to do a POC, proof of concept. We want to allocate six months for this POC, like normally you would probably do for building most enterprise applications. And, we inevitably say, well, how about Friday? How about we have the POC done by Friday? And, you know, we get the Germans laugh, you know, laugh uncomfortably and we go away and deliver the POC by Friday because of how much different it is to build applications this way versus writing low level Java or C-sharp code and sticking together a bunch of technologies and tools 'cause we abstract all that away. And, you know, the eyes drop open and the mouth drops open and it's incredible what modern technology can do to radically change how software is being developed. >> Wow, big impact in a short period of time. That's always a nice thing to be able to deliver. >> It is, it is to-- It's great to be able to surprise people like that. >> Exactly, exactly. Well, Blaine, thank you so much for stopping by, sharing what VANTIQ is doing to help companies be disruptive and for sharing those great customer examples. We appreciate your time. >> You're welcome. Appreciate the time. >> And for my co-host, Peter Burris, I'm Lisa Martin. You're watching The Cube's continuing coverage of our event, Big Data SV Live from San Jose, down the street from the Strata Data Conference. Stick around, we'll be right back with our next guest after a short breal. (techy music)

Published Date : Mar 8 2018

SUMMARY :

Brought to you by Silicon Angle Media the CMO of VANTIQ, Blaine Mathieu. So, VANTIQ, you guys are up the street in Walnut Creek. for driving many of the digital transformation that might currently not be able to support and the architectures, as you said a second ago, One of the key things you said, in the context of business. in response to the data. So, an event is kind of like an outside in So, the event is what's happening right now and changes and fixes, again, in the moment, of the old ways was where we had recipe of, you know, And the customer is getting more and more frustrated. they followed to see oh that didn't work, and the application, the output of one application And so, the secret sauce to this whole thing to modify them as you need to, you know, on the fly. So, it used to be we had this thing and so the minute that we deployed, So, the old, you know, the old, Try to sustain just the way you built it, okay. but do you have an example where you've, that they should not have been able to sell to the business, not just meeting and deliver the POC by Friday because to be able to deliver. It's great to be able to surprise people Well, Blaine, thank you so much for stopping by, Appreciate the time. down the street from the Strata Data Conference.

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
BlainePERSON

0.99+

Lisa MartinPERSON

0.99+

Peter BurrisPERSON

0.99+

SingaporeLOCATION

0.99+

UberORGANIZATION

0.99+

two yearsQUANTITY

0.99+

NetflixORGANIZATION

0.99+

San JoseLOCATION

0.99+

VANTIQORGANIZATION

0.99+

Blaine MathieuPERSON

0.99+

20,000 linesQUANTITY

0.99+

30 minutesQUANTITY

0.99+

twoQUANTITY

0.99+

Silicon Angle MediaORGANIZATION

0.99+

52 stepsQUANTITY

0.99+

Walnut CreekLOCATION

0.99+

six monthsQUANTITY

0.99+

JavaTITLE

0.99+

one degreeQUANTITY

0.99+

FridayDATE

0.99+

second dayQUANTITY

0.99+

last monthDATE

0.99+

one secondQUANTITY

0.99+

Silicon ValleyLOCATION

0.99+

six years agoDATE

0.98+

bothQUANTITY

0.98+

Strata Data ConferenceEVENT

0.98+

Big Data SV LiveEVENT

0.98+

OneQUANTITY

0.98+

The CubeORGANIZATION

0.98+

todayDATE

0.98+

oneQUANTITY

0.98+

20 years agoDATE

0.98+

Big Data SV 2018EVENT

0.97+

six hours agoDATE

0.97+

six minutes agoDATE

0.97+

five minute agoDATE

0.97+

a minute agoDATE

0.96+

hundreds of sensorsQUANTITY

0.95+

The CubeTITLE

0.94+

BlockbusterORGANIZATION

0.91+

few minutes agoDATE

0.89+

step oneQUANTITY

0.89+

step threeQUANTITY

0.85+

Forager Tasting and EateryORGANIZATION

0.85+

decadesQUANTITY

0.84+

six minutesQUANTITY

0.84+

CTITLE

0.83+

Big DataORGANIZATION

0.81+

one locationQUANTITY

0.78+

one applicationQUANTITY

0.77+

second agoDATE

0.71+

CEPORGANIZATION

0.53+

bigORGANIZATION

0.52+

GermansPERSON

0.51+

techyORGANIZATION

0.41+

DataEVENT

0.31+

Matt Maccaux, Dell EMC | Big Data SV 2018


 

>> Male Narrator: Live from San Jose, it's theCube. Presenting Big Data Silicon Valley, brought to you by SilconANGLE Media and it's ecosystem partners. >> Welcome back to theCube's continuing coverage of our event, Big Data SV in downtown San Jose. I'm Lisa Martin, my co-host is Dave Vellante. Hey Dave. >> Hey Lisa, how's it going? >> Good. >> Doing a great job here, by the way. >> Well thank you, sir. >> Keeping the trains going. >> Yeah. >> Well done. >> We've had a really interesting couple of days, we started here yesterday interviewing lots of great guys and gals on Big Data and everything in between. A lots of different topics there, opportunities, challenges, digital transformation, how can customers really evolve on this journey? We're excited to welcome back to theCube, one of our distinguished alumni, Matt Maccaux, the Global Big Data Practice Lead from Dell EMC. Welcome back. >> Well thanks for having me, appreciate it, it's a pleasure to be here. >> Yeah, so lots of stuff going on. We've been here, as I mentioned, we're down the street from the Strata Data Conference and we've had a lot of great conversations, very educational, informative. You've been with the whole Dell EMC family for a while now. We'd love to get your perspective on, kind of, what's going on from your team's standpoint. What are you seeing in the enterprises with respect to Big Data and being able to really leverage data across the business as a value driver and a revenue generator? >> Yeah, it's interesting that what we see across the business in terms of, especially in the big enterprises, there, many organizations, even the more mature ones, are still struggling to get that extra dollar, that extra level of monetization out of their data assets. They, everyone talks about monetizing data and using data, treating it as an asset, but organizations are struggling with that, not because of the technology, the technology's been put in, they've ramped up their teams, their skills. It's, what we tend to see inhibiting this digital transformation growth is process. It's organizational strife and it's not looking to best practices, even within own, their own organization, we're doing things like DevOps. So, why would we treat the notion of creating a data model any different than we would regular application development? Well, organizations still carry that weight, that inertia, they still treat Big Data and analytics like they do the data warehouse, and the most effective organizations are starting to incorporate that agile methodology and agile thinking, no snowflakes, infrastructure's code, these concepts of quickly and rapidly repeatedly doing these things, those are the organizations that are really starting to pull away from their competitors in industry. So, Dell EMC, our consulting group and our product lines are all there to support that transformation journey by taking those best practices and DevOps DataOps and bringing that to the analytical space. >> Do you think that companies, Matt, have a pretty good sense as to how applications that they develop are going to affect, create value, creating value is, let's simplify it, increasing revenue or cutting cost? Generally people can predict with the impact, they can write a business case around it. My observation was that certainly in the early days of so-called Big Data, people really didn't have an understanding as to the relationship between their data and that value, and so, many companies mistakenly thought, "Well I need to figure out how to sell my data," versus understand how data affects monetization. I wonder if you could comment on that and how has that progressed throughout the years? >> Yeah, that's a good point, we, from a consulting practice, used to do a lot of, what we call, proof of values, where organizations, after they kicked the tires and covered some use cases, we took them through a very slow, methodical business case RY analysis. You're going to spend this much on infrastructure, you're going to hire these people, you're going to take this data, and poof, you're going to make this much money, you're going to save this much money. Well, we're doing less and less of that these days because organizations have a good feel for where they want to go and the potential upside for doing this where they're now tend to struggle is, "Well, how do I actually get there?" "There's still a lot of tools and a lot of technologies and which is right for my business?" "What is the right process and how do I build that consensus in the organization?" And so, from a business consulting perspective, we're doing less of the RY work and more of the governance, the sort of, governance work by aligning stakeholders, getting those repeatable patterns and architectures in place to help organizations take that first few wins and then scale it. >> Where do you see the action these days? I mean there's somehow I profile use cases, obviously getting people to click on ads, Big Data has helped with that, fraud detection has come such a long way in the last 10 years, ya know, no doubt, certainly risk assessment, ya know, from the financial services industry. Those are the obvious ones, where else do you see Big Data analytics to the changing the world, if you will? >> Yeah, so I'd say those static or batch-type workloads are well understood. That, hey, is there fraud on transactions that occurred yesterday or last night? What is the customer score, lifetime value score for customer? Where we see more trends in the enterprise space is streaming. So, what can we catch in real time and help our people make real time decisions? So, and that is dealing with unstructured data. So, I've got a call center and I'm listening to the voice that's coming in, putting some sentiment analysis on that and then providing a score or script to the customer call agent in real time. And those, sort of, streaming use cases, whether it's images or voice, that, I think, is the next paradigm for use cases that organizations want to tackle. 'Cause if you can prevent a customer from leaving in real time, right, say, you know what, it sounds like you're upset, what if we did X to help retain you, it's going to be significant. All these organizations have a good idea of the cost it takes to acquire a new customer and the cost of losing a customer, so if they can put that intelligence in upstream, they no longer have to spend so much money trying to capture new customers 'cause they can focus on the ones they have. So, I think that, sort of, time between customer and streaming is where the next set of, I think, money's to be found. >> So customer experience is critical for businesses in any organization, I'm wondering, kind of, what the juxtaposition is of businesses going, "Yes, we have to be able "to do things in real time, in enterprise, "we have to be agile, yet we have, in order "to really facilitate a really effective, relevant, "timely customer experience, many departments "and organizations in a business need access to data." From a political perspective, how does Dell EMC, how does your consulting practice help an enterprise be able to start opening up these barriers internally to be able to enable data sharing so that they can drive and take advantage of things like real-time streaming to ultimately improve the customer experience, revenue, et cetera? >> Yeah, it's going to sound really trite, but the first step is getting everyone in a room and talking about what good looks like, what are the low-hanging... And everyone's going to agree on those use cases, there going to say, "These are the things we have to do," right, "We want to lose fewer customers, we want to..." You know, whatever the case may be, so everyone will agree on that. So, the politics don't come into play there. So, "Well, what data do we require for that?" "Okay, well, we've got all this data, great, "no disagreement there." Well, where is the data located? Who's the owner or the steward of that data? And now, who's going to be responsible for monetizing that? And that's where we tend to see the breakdown because when these things cross the line of business and customer always crosses the line of business, you end up with turf wars. And so this, the emergence of the Chief Data Officer, who's responsible for the policy and the prioritization and the ownership of these things is such a key role now, that, and it's not a CIO responsible for data, it is a business aligned executive reporting to the chief, CEO, COO, CFO. Again, business alignment, that tends to be the decision maker or at least the thing that solves for those conflicts across those BUs. And when that happens, then we see real change. But, if there's not that role or that person that can put that line in the sand and say, "This is how we're going to do it," you end up with that political strife and then you end up with silos of information or point solutions across the enterprise and it doesn't serve anyone. >> What are you seeing in terms of that CDO role? I mean, initially the Chief Data Officer was really within regulated businesses, financial services, healthcare, government. And then you've seen it permeate, ya know, to more mainstream. Do you see that role as having legs? A lot of people have questioned that role. What Chief Digital Officer, Chief Data Officer is encroaching on the CIO territory? I'm inferring from your comments that you're optimistic about that role going forward. >> I am, as long as it's well-defined as having unique capabilities that's different than the CIO. Again, I think the first generation of Chief Data Officers were very CIO-liked or CIO-for-data and that's when you ended up with the turf wars. And then it was like, "Okay, well this is "what we're doing." But then you had someone who was sort of a peer for infrastructure and so, it just didn't seem to work out. And so, now we're seeing that role being redefined, it's less about the technology and the tools and the infrastructure, and it's more about the policies, the consistency, the architectures. >> You know I'd observe, I wonder if we can talk about this for a little bit, it's the CDO role. To me, one of the first things a CDO has to do is understand how a company gets value out of its data, what is the, and if it's a full profit company, what's the monetization, where does that come from? Not selling the data, as we were talking about earlier. And then there is what data, what data, where are, what data architecture, data sources, how do we give access to that? And then quality, data quality seems to be something that they worry about. And then skills, not, none, no technology in here. And then somehow they're going to form relationships with the line of business and it's simultaneous to figuring that out. Does that seem like a reasonable framework for the CIO, CDOs job? >> It does, and you call them Chief Data Governance Officer, I mean, it really falls under the umbrella of governance. It's about standards and consistency, but also these policies of, there are finite resources, whether we're talking people or computes. What do you do when there's not enough resources and more demand? How do you prioritize the things that the business does? Well, do you have policies and matrices that say, "Okay, well, is it material, actionable, timely?" "Then yes, then we'll proceed with this." "No, it doesn't pass." And it doesn't have to be about money. However the organization judges itself is what it should be based on. So, whether we're talking non-profit, we helped a school system recently better align kids with schedules and also learning abilities by sitting them next to each other in classes, there's no profit in that other than the education of children, so every organization judges itself or measures itself a little differently, but it comes back to those KPIs. What are your KPIs, how does that align to business initiatives? And then everything should flow from there. Now, I'm not saying it's easy work. Data governance is the hardest thing to do in this space and that's why I think so few organizations take it on 'cause it's a long, slow process and, ya know, you should've started 10 years ago on it and if you haven't, it feels like this mountain that is really high to climb. >> What you're saying is outcome driven. >> Yeah. >> Independent of the types of organizations. I want to talk about innovation, I've been asking a lot of people this week, do you feel like Big Data, ya know, the meme of Big Data that was created eight, 10 years ago, do you feel like it lived up to its promises? >> That's a loaded question. I think if you were to ask the back office enterprises, I would say yes. In terms of customers feeling it, probably not, because when you use an Uber app to hail a cab and pay $3.75 to go across town, it feels like a quality of life, but you don't know that that's a data-driven decision. As a consumer, your average consumer, you probably don't feel that. As you're clicking through Amazon and they know, sort of, the goods that you need, or the fact that they know what you're going to need and they've got it in a warehouse that they can get to you later that day, it doesn't feel like a Big Data solution, it just feels like, "Hey, the people I'm doing business with, they know me better." People don't really understand that that's a Big Data and analytics concept, so, has it lived up to the hype? Externally, I think the perception is that it has not, but the businesses that really get it, feel that absolutely it has. That's 'cause you, do you agree it's kind of bifurcated? >> Matt Maccaux: Yeah, it is. >> The Spotify's and the Ubers and the Airbnb's that are crushing it and then there's a lot of traditional enterprises that are still stove pipe and struggling. >> Yeah, it's funny, when we talk to customers, we've got our introductory power points, right, it always talks about the new businesses and the old businesses and, and I'm finding that that doesn't play very well anymore with enterprise customers. They're like, "We're never going to be the Uber "of our industry, it's not going to happen "if I'm a fortune 100 legacy, it's not going to happen." "What I really want to do, though, "is help my customers or make more money here, "I'm not going to be the Uber, it's just not going to happen." "We're not the culture, we're not the, we're not set up "that way, we have all of this technical legacy stuff, "but I really want to get more value out of my data, "how do I do that?" And so that message resonates. >> Isn't that in some ways, though, how do you feel about this, is it a recipe for disruption, where that's not going to happen, but something could happen where somebody digitizes your business? >> Yes, absolutely, if there are organizations, if you're in the fortune 500 and you are not worried about someone coming along and disrupting you, then you are probably not doing the right job. I would be kept awake every night, whether it was financial services or industrial manufacturing. >> Dave Vellante: Grocery. >> Nobody thought that the taxis, who the hell would come in and disrupt the cab industry? Ya got to hire all these people, the cars are junk, the customer experience is awful. Well, someone has come along and there's been an industry related to this, now they have their bumps in the road, so are they going to be disrupted again or what's the next level of disruption? But, I think it is technology that fuels that, but it's also the cultural shift as part of that, which is outside the technologies, the socioeconomic trends that I think drive that, as well. >> But even, ya know, and we've got just a few seconds left, the cultural shift internally. It sounds like, from what you're describing, if an enterprise is going to recognize, "I'm not going to compete with an Uber or an Airbnb "or a Netflix, but I've got to be able to compete "with my existing peers of enterprise organizations," the CDO role sounds like it's a matter of survivability. >> Yes. >> Without putting that in place, you can't capitalize on the value of data monetized and et cetera. Well guys, I wish we had more time 'cause I think we're opening a can of worms here, but Dave, Matt thanks so much for having this conversation. Thank you for stopping by. >> Thanks for having me here, it was a real pleasure. >> Likewise. We want to thank you for watching theCube. We are continuing our coverage of our event, Big Data SV in downtown San Jose. For Dave Vellante, my co-host, I'm Lisa Martin. Stick around, we'll be right back with our next guest after a short break. (upbeat music)

Published Date : Mar 8 2018

SUMMARY :

brought to you by SilconANGLE Media Welcome back to theCube's continuing coverage by the way. We're excited to welcome back to theCube, it's a pleasure to be here. We'd love to get your perspective on, and bringing that to the analytical space. applications that they develop are going to affect, and more of the governance, the sort of, Those are the obvious ones, where else do you see the cost it takes to acquire a new customer these barriers internally to be able Again, business alignment, that tends to be I mean, initially the Chief Data Officer and the infrastructure, and it's more about To me, one of the first things a CDO has to do Data governance is the hardest thing to do Independent of the types or the fact that they know what you're going to need The Spotify's and the Ubers and the Airbnb's and the old businesses and, and I'm finding then you are probably not doing the right job. their bumps in the road, so are they going to be "or a Netflix, but I've got to be able to compete that in place, you can't capitalize We want to thank you for watching theCube.

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
Dave VellantePERSON

0.99+

DavePERSON

0.99+

Matt MaccauxPERSON

0.99+

Lisa MartinPERSON

0.99+

MattPERSON

0.99+

$3.75QUANTITY

0.99+

AmazonORGANIZATION

0.99+

SpotifyORGANIZATION

0.99+

LisaPERSON

0.99+

AirbnbORGANIZATION

0.99+

yesterdayDATE

0.99+

UberORGANIZATION

0.99+

Dell EMCORGANIZATION

0.99+

last nightDATE

0.99+

SilconANGLE MediaORGANIZATION

0.99+

UbersORGANIZATION

0.99+

first stepQUANTITY

0.99+

10 years agoDATE

0.99+

NetflixORGANIZATION

0.99+

oneQUANTITY

0.98+

eight,DATE

0.97+

this weekDATE

0.96+

Strata Data ConferenceEVENT

0.96+

Big Data SV 2018EVENT

0.96+

Male NarratorTITLE

0.95+

first generationQUANTITY

0.94+

Big DataORGANIZATION

0.93+

San JoseLOCATION

0.92+

Silicon ValleyLOCATION

0.87+

theCubeORGANIZATION

0.83+

theCubeTITLE

0.83+

Big DataTITLE

0.8+

DevOpsTITLE

0.79+

last 10 yearsDATE

0.79+

Big Data SVEVENT

0.73+

first thingsQUANTITY

0.72+

Live fromTITLE

0.7+

first fewQUANTITY

0.7+

daysQUANTITY

0.64+

CDOTITLE

0.62+

500QUANTITY

0.59+

coupleQUANTITY

0.59+

100QUANTITY

0.55+

Octavian Tanase, NetApp | Big Data SV 2018


 

>> Announcer: Live from San Jose it's The Cube presenting Big Data, Silicon Valley brought to you by SiliconANGLE Media and its ecosystem partners. >> Good morning. Welcome to The Cube. We are on day two of our coverage our event Big Data SV. I'm Lisa Martin with my cohost Dave Vellante. We're down the street from the Strata Data Conference. This is The Cube's tenth big data event and we had a great day yesterday learning a lot from myriad guests on very different nuances of big data journey where things are going. We're excited to welcome back to The Cube an alumni, Octavian Tanase, the Senior Vice President of Data ONTAP fron Net App. Octavian, welcome back to The Cube. >> Glad to be here. >> So you've been at the Strata Data Conference for the last couple of days. From a big data perspective, what are some of the things that you're hearing, in terms of from a customer's perspective on what's working, what challenges, opportunities? I'm very excited to be here and learn about the innovation of our partners in the industry and share with our partners and our customers what we're doing to enable them to drive more value out of that data. The reality is that data has become the 21st Century gold or oil that powers the business and everybody's looking to apply new techniques, a lot of times machine learning, deep learning, to draw more value of the data, make better decisions and compete in the marketplace. Octavian, you've been at NetApp now eight years and I've been watching NetApp, as we were talking about offline, for decades and I've seen the ebb and flow and this company has transformed many, many times. The latest, obviously cloud came in, flash came into play and then you're also going through a major transition in the customer based to clustered ONTAP. You seemed to negotiate that. NetApp is back, thriving, stock's up. What's happening at NetApp? What's the culture like these days? Give us the update. >> I think we've been very fortunate to have a CEO like George Kurian, who has been really focused on helping us do basically fewer things better, really focus on our core business, simplify our operations and continue to innovate and this is probably the area that I'm most excited about. It's always good to make sure that you accelerate the business, make it simpler for your customers and your partners to do business with you, but what you have to do is innovate. We are a product company. We are passionate about innovation. I believe that we are innovating with more pace than many of the startups in the space so that's probably the most exciting thing that has been part of our transformation. >> So let's talk about big data. Back in the day if you had a big data problem you would buy a big Unix box, maybe buy some Oracle licenses, try to put all your data into that box and that became your data warehouse. The brilliance of Hadoop was hey we can leave the data where it is. There's too much data to put into the box so we're going to bring five megabytes to code to a petabyte of data. And the other piece of it is CFOs loved it, because we're going to reduce the cost of our expensive data warehouse and we're going to buy off the shelf components: white box, servers and off the shelf disk drives. We're going to put that together and life will be good. Well as things matured, the old client-server days, it got very expensive, you needed enterprise grade. So where does NetApp fit into that equation, because originally big storage companies like NetApp, they weren't part of the equation? Has that changed? >> Absolutely. One of the things that has enabled that transformation, that change is we made a deliberate decision to focus on software defined and making sure that the ONTAP operating system is available wherever data is being created: on the edge in an IoT device, in the traditional data center or in the cloud. So we are in the unique position to enable analytics, big data, wherever those applications reside. One of the things that we've recently done is we've partnered with IDC and what the study, what the analysis has shown is that deploying in analytics, a Hadoop or NoSQL type of solution on top of NetApp is half the cost of DAS. So when you consider the cost of servers, the licenses that you're going to have to pay for, these commercial implementations of Hadoop as well as the storage and the data infrastructure, you are much better off choosing NetApp than a white box type of solution. >> Let's unpack that a little bit, because if I infer correctly from what you said normally you would say the operational costs are going to be dramatically lower, it's easier to manage a professional system like a NetApp ONTAP, it's integrated, great software, but am I hearing you correctly, you're saying the acquisition costs are actually less than if I'm buying white box? A lot of people are going to be skeptical about that, say Octavian no way, it's cheaper to buy white box stuff. Defend that statement. >> Absolutely. If you're looking at the whole solution that includes the server and the storage, what NetApp enables you to do if you're running the solution on top of ONTAP you reduce the need for so many servers. If you reduce that number you also reduce the licensing cost. Moreover, if you actually look at the core value proposition of the storage layer there, DAS typically makes three copies of the data. We don't. We are very greedy and we're making sure that you're using shared storage and we are applying a bunch of storage efficiency techniques to further compress, compact that data for world class storage efficiency. >> So cost efficiency is obviously a great benefit for any company when they're especially evolving, from a digital perspective. What are some of the business level benefits? You mentioned speed a minute ago. What is Data ONTAP and even ONTAP in the cloud enabling your enterprise customers to achieve at the business level, maybe from faster time to market, identifying with machine learning and AI new products? Give me an example of maybe a customer that you think really articulates the value that ONTAP in the cloud can deliver. >> One of the things that's really important is to have your data management capability, whatever the data is being produced so ONTAP being consumed either as a VM or a service ... I don't know if you've seen some of the partnerships that we have with AWS and Azure. We're able to offer the same rich data management capabilities, not only the traditional data center, but in the cloud. What that really enables customers to do is to simplify and have the same operating system, the same data management platform for the both the second platform traditional applications as well as for the third platform applications. I've seen a company like Adobe be very successful in deploying their infrastructure, their services not only on prem in their traditional data center, but using ONTAP Cloud. So we have more than about 1,500 customers right now that have adopted ONTAP in the AWS cloud. >> What are you seeing in terms of the adoption of flash and I'm particularly interested in the intersection of flash adoption and the developer angle, because we've seen, in certain instances, certain organizations are able to share data off of flash much more efficiently that you would be, for instance, of a spinning disk? Have you seen a developer impact in your customer base? >> Absolutely I think most of customers initially have adopted flash, because of high throughput and low latency. I think over time customers really understood and identified with the overall value proposition in cost of ownership in flash that it enables them to consolidate multiple workloads in a smaller footprint. So that enables you to then reduce the cost to operate that infrastructure and it really gives you a range of applications that you can deploy that you were never able to do that. Everybody's looking to do in place, in line analytics that now are possible, because of this fast media. Folks are looking to accelerate old applications in which they cannot invest anymore, but they just want to run faster. Flash also tends to be more reliable than traditional storage, so customers definitely appreciate that fewer things could go wrong so overall the value proposition of flash, it's all encompassing and we believe that in the near future flash will be the defacto standard in everybody's data center, whether it's on prem or in the cloud. >> How about backup and recovery in big data? We obviously, in the enterprise, very concerned about data protection. What's similar in big data? What's different and what's NetApp's angle on that? >> I think data protection and data security will never stop being important to our customers. Security's top of mind for everybody in the industry and it's a source of resume changing events, if you would, and they're typically not promotions. So we have invested a tremendous deal in certifications for HIPAA, for FIPS, we are enabling encryption, both at rest and in flight. We've done a lot of work to make sure that the encryption can happen in software layer, to make sure that we give the customers best storage class efficiency and what we're also leveraging is the innovation that ONTAP has done over many years to protect the data, replicate its snapshots, peering the data to the cloud. These are techniques that we're commonly using to reduce the cost of ownership, also protect the data the customers deploy. >> So security's still a hot topic and, like you said, it probably always will be, but it's a shared responsibility, right? So customers leveraging NetApps safe or on prem hybrid also using Azure or AWS, who's your target audience? If you're talking to the guys and gals that are still managing storage are you also having the CSO or the security guys come in, the gals, to understand we've got this appointment in Azure or AWS so we're going to bring in ONTAP to facilitate this? There's a shared responsibility of security. Who's at the table, from your perspective, in your customers that you need to help understand how they facilitate true security? >> It's definitely been a transformative event where more and more people in IQ organizations are involved in the decisions that are required to deploy the applications. There was a time when we would talk only to the storage admin. After a while we started talking to the application admin, the virtualization admin and now you're talking to the line of business who has that vested interest to make sure that they can harness the power of the data in their environment. So you have the CSO, you have the traditional infrastructure people, you have the app administration and you have the app owner, the business owner that are all at the table that are coming and looking to choose the best of breed solution for their data management. >> What are the conversations like with your CXO, executives? Everybody talks about digital transformation. It's kind of an overused term, but there's real substance when you actually peel the onion. What are you seeing as NetApp's role in effecting digital transformations within your customer base? >> I think we have a vision of how we can help enterprises take advantage of the digital transformation and adopt it. I think we have three tenants of that vision. Number one is we're helping customers harness the power of the cloud. Number two, we're looking to enable them to future proof their investments and build the next generation data center. And number three, nobody starts with a fresh slate so we're looking to help customers modernize their current infrastructure through storage. We have a lot of expertise in storage. We've helped, over time, customers time and again adopt disruptive technologies in nondisruptive ways. We're looking to adopt these technologies and trends on behalf of our customers and then help them use them in a seamless safe way. >> And continue their evolution to identify new revenue streams, new products, new opportunities and even probably give other lines of business access to this data that they need to understand is there value here, how can we harness it faster than our competitors, right? >> Absolutely. It's all about deriving value out of the data. I think earlier I called it the gold of the 21st Century. This is a trend that will continue. I believe there will be no enterprise or center that won't focus on using machine learning, deep learning, analytics to derive more value out of the data to find more customer touch points, to optimize their business to really compete in the marketplace. >> Data plus AI plus cloud economics are the new innovation drivers of the next 10, 20 years. >> Completely agree. >> Well Octavian thanks so much for spending time with us this morning sharing what's new at NetApp, some of the visions that you guys have and also some of the impact that you're making with customers. We look forward to having you back on the program in the near future. >> Thank you. Appreciate having the time. >> And for my cohost Dave Vellante I'm Lisa Martin. You're watching The Cube live on day two of coverage of our event, Big Data SV. We're at this really cool venue, Forager Tasting Room. Come down here, join us, get to hear all these great conversations. Stick around and we'll be right back with our next guest after a short break. (electronic music)

Published Date : Mar 8 2018

SUMMARY :

brought to you by SiliconANGLE Media We're down the street from the Strata Data Conference. in the customer based to clustered ONTAP. that you accelerate the business, Back in the day if you had a big data problem and making sure that the ONTAP operating system A lot of people are going to be skeptical about that, that includes the server and the storage, that ONTAP in the cloud can deliver. that have adopted ONTAP in the AWS cloud. to operate that infrastructure and it really gives you We obviously, in the enterprise, peering the data to the cloud. that you need to help understand that are required to deploy the applications. What are the conversations like with your CXO, executives? and build the next generation data center. out of the data to find more customer touch points, are the new innovation drivers of the next 10, 20 years. We look forward to having you back on the program Appreciate having the time. get to hear all these great conversations.

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
Dave VellantePERSON

0.99+

George KurianPERSON

0.99+

Lisa MartinPERSON

0.99+

Octavian TanasePERSON

0.99+

AdobeORGANIZATION

0.99+

OctavianPERSON

0.99+

AWSORGANIZATION

0.99+

eight yearsQUANTITY

0.99+

San JoseLOCATION

0.99+

SiliconANGLE MediaORGANIZATION

0.99+

NetAppTITLE

0.99+

HadoopTITLE

0.99+

five megabytesQUANTITY

0.99+

OracleORGANIZATION

0.99+

second platformQUANTITY

0.99+

21st CenturyDATE

0.99+

HIPAATITLE

0.99+

Strata Data ConferenceEVENT

0.99+

yesterdayDATE

0.99+

ONTAPTITLE

0.99+

The CubeTITLE

0.99+

IDCORGANIZATION

0.98+

bothQUANTITY

0.98+

OneQUANTITY

0.98+

UnixCOMMERCIAL_ITEM

0.98+

NetAppORGANIZATION

0.97+

The CubeORGANIZATION

0.97+

Silicon ValleyLOCATION

0.96+

ONTAP CloudTITLE

0.95+

more than about 1,500 customersQUANTITY

0.95+

NetAppsTITLE

0.93+

Big Data SVEVENT

0.93+

Big Data SV 2018EVENT

0.93+

day twoQUANTITY

0.93+

Forager Tasting RoomLOCATION

0.88+

NoSQLTITLE

0.87+

AzureORGANIZATION

0.86+

third platform applicationsQUANTITY

0.81+

a minute agoDATE

0.81+

Number twoQUANTITY

0.8+

Senior Vice PresidentPERSON

0.79+

three tenantsQUANTITY

0.78+

decadesQUANTITY

0.74+

a petabyte of dataQUANTITY

0.73+

tenth bigQUANTITY

0.71+

Number oneQUANTITY

0.71+

three copiesQUANTITY

0.7+

this morningDATE

0.69+

number threeQUANTITY

0.68+

ONTAPORGANIZATION

0.67+

Data ONTAPORGANIZATION

0.64+

eventQUANTITY

0.64+

Net AppTITLE

0.64+

10QUANTITY

0.64+

halfQUANTITY

0.6+

flashTITLE

0.58+

muchQUANTITY

0.58+

Big DataEVENT

0.57+

yearsQUANTITY

0.55+

Sastry Malladi, FogHorn | Big Data SV 2018


 

>> Announcer: Live from San Jose, it's theCUBE, presenting Big Data Silicon Valley, brought to you by SiliconANGLE Media and its ecosystem partner. (upbeat electronic music) >> Welcome back to The Cube. I'm Lisa Martin with George Gilbert. We are live at our event, Big Data SV, in downtown San Jose down the street from the Strata Data Conference. We're joined by a new guest to theCUBE, Sastry Malladi, the CTO Of FogHorn. Sastry, welcome to theCUBE. >> Thank you, thank you, Lisa. >> So FogHorn, cool name, what do you guys do, who are you? Tell us all that good stuff. >> Sure. We are a startup based in Silicon Valley right here in Mountain View. We started about three years ago, three plus years ago. We provide edge computing intelligence software for edge computing or fog computing. That's how our company name got started is FogHorn. For our particularly, for our IoT industrial sector. All of the industrial guys, whether it's transportation, manufacturing, oil and gas, smart cities, smart buildings, any of those different sectors, they use our software to predict failure conditions in real time, or do condition monitoring, or predictive maintenance, any of those use cases and successfully save a lot of money. Obviously in the process, you know, we get paid for what we do. >> So Sastry... GE populized this concept of IIoT and the analytics and, sort of the new business outcomes you could build on it, like Power by the Hour instead of selling a jet engine. >> Sastry: That's right. But there's... Actually we keep on, and David Floor did some pioneering research on how we're going to have to do a lot of analytics on the edge for latency and bandwidth. What's the FogHorn secret sauce that others would have difficulty with on the edge analytics? >> Okay, that's a great question. Before I directly answer the question, if you don't mind, I'll actually even describe why that's even important to do that, right? So a lot of these industrial customers, if you look at, because we work with a lot of them, the amount of data that's produced from all of these different machines is terabytes to petabytes of data, it's real. And it's not just the traditional digital sensors but there are video, audio, acoustic sensors out there. The amount of data is humongous, right? It's not even practical to send all of that to a Cloud environment and do data processing, for many reasons. One is obviously the connectivity, bandwidth issues, and all of that. But the two most important things are cyber security. None of these customers actually want to connect these highly expensive machines to the internet. That's one. The second is the lack of real-time decision making. What they want to know, when there is a problem, they want to know before it's too late. We want to notify them it is a problem that is occurring so that have a chance to go fix it and optimize their asset that is in question. Now, existing solutions do not work in this constrained environment. That's why FogHorn had to invent that solution. >> And tell us, actually, just to be specific, how constrained an environment you can operate in. >> We can run in about less than 100 to 150 megabytes of memory, single-core to dual-core of CPU, whether it's an ARM processor, an x86 Intel-based processor, almost literally no storage because we're a real-time processing engine. Optionally, you could have some storage if you wanted to store some of the results locally there but that's the kind of environment we're talking about. Now, when I say 100 megabytes of memory, it's like a quarter of Raspberry Pi, right? And even in that environment we have customers that run dozens of machinery models, right? And we're not talking -- >> George: Like an ensemble. >> Like an anomaly detection, a regression, a random forest, or a clustering, or a gamut, some of those. Now, if we get into more deep learning models, like image processing and neural net and all of that, you obviously need a little bit more memory. But what we have shown, we could still run, one of our largest smart city buildings customer, elevator company, runs in a raspberry Pi on millions of elevators, right? Dozens of machinery algorithms on top of that, right? So that's the kind of size we're talking about. >> Let me just follow up with one question on the other thing you said, with, besides we have to do the low-latency locally. You said a lot of customers don't want to connect these brown field, I guess, operations technology machines to the internet, and physically, I mean there was physical separation for security. So it's like security, Bill Joy used to say "Security by obscurity." Here it's security by -- >> Physical separation, absolutely. Tell me about it. I was actually coming from, if you don't mind, last week I was in Saudi Arabia. One of the oil and gas plants where we deployed our software, you have to go to five levels of security even to get to there, It's a multibillion dollar plant and refining the gas and all of that. Completely offline, no connectivity to the internet, and we installed, in their existing small box, our software, connected to their live video cameras that are actually measuring the stuff, doing the processing and detecting the specific conditions that we're looking for. >> That's my question, which was if they want to be monitoring. So there's like one low level, really low hardware low level, the sensor feeds. But you could actually have a richer feed, which is video and audio, but how much of that, then, are you doing the, sort of, inferencing locally? Or even retraining, and I assume that since it's not the OT device, and it's something that's looking at it, you might be more able to send it back up the Cloud if you needed to do retraining? >> That's exactly right. So the way the model works is particularly for image processing because you need, it's a more complex process to train than create a model. You could create a model offline, like in a GPU box, an FPGA box and whatnot. Import and bring the model back into this small little device that's running in the plant, and now the live video data is coming in, the model is inferencing the specific thing. Now there are two ways to update and revise the model: incremental revision of the model, you could do that if you want, or you can send the results to a central location. Not internet, they do have local, in this example for example a PIDB, an OSS PIDB, or some other local service out there, where you have an opportunity to gather the results from each of these different locations and then consolidate and retrain the model, put the model back again. >> Okay, the one part that I didn't follow completely is... If the model is running ultimately on the device, again and perhaps not even on a CPU, but a programmable logic controller. >> It could, even though a programmable controller also typically have some shape of CPU there as well. These days, most of the PLCs, programmable controllers, have either an RM-based processor or an x86-based processor. We can run either one of those too. >> So, okay, assume you've got the model deployed down there, for the, you know, local inferencing. Now, some retraining is going to go on in the Cloud, where you have, you're pulling in the richer perspective from many different devices. How does that model get back out to the device if it doesn't have the connectivity between the device and the Cloud? >> Right, so if there's strictly no connectivity, so what happens is once the model is regenerated or retrained, they put a model in a USB stick, it's a low attack. USB stick, bring it to the PLC device and upload the model. >> George: Oh, so this is sort of how we destroyed the Iranian centrifuges. >> That's exactly right, exactly right. But you know, some other environments, even though it's not connectivity to the Cloud environment, per se, but the devices have the ability to connect to the Cloud. Optionally, they say, "Look, I'm the device "that's coming up, do you have an upgraded model for me?" Then it can pull the model. So in some of the environments it's super strict where there are absolutely no way to connect this device, you put it in a USB stick and bring the model back here. Other environments, device can query the Cloud but Cloud cannot connect to the device. This is a very popular model these days because, in other words imagine this, an elevator sitting in a building, somebody from the Cloud cannot reach the elevator, but an elevator can reach the Cloud when it wants to. >> George: Sort of like a jet engine, you don't want the Cloud to reach the jet engine. >> That's exactly right. The jet engine can reach the Cloud it if wants to, when it wants to, but the Cloud cannot reach the jet engine. That's how we can pull the model. >> So Sastry, as a CTO you meet with customers often. You mentioned you were in Saudi Arabia last week. I'd love to understand how you're leveraging and gaging with customers to really help drive the development of FogHorn, in terms of being differentiated in the market. What are those, kind of bi-directional, symbiotic customer relationships like? And how are they helping FogHorn? >> Right, that's actually a great question. We learn a lot from customers because we started a long time ago. We did an initial version of the product. As we begin to talk to the customers, particularly that's part of my job, where I go talk to many of these customers, they give us feedback. Well, my problem is really that I can't even do, I can't even give you connectivity to the Cloud, to upgrade the model. I can't even give you sample data. How do you do that modeling, right? And sometimes they say, "You know what, "We are not technical people, help us express the problem, "the outcome, give me tools "that help me express that outcome." So we created a bunch of what we call OT tools, operational technology tools. How we distinguish ourselves in this process, from the traditional Cloud-based vendor, the traditional data science and data analytics companies, is that they think in terms of computer scientists, computer programmers, and expressions. We think in terms of industrial operators, what can they express, what do they know? They don't really necessarily care about, when you tell them, "I've got an anomaly detection "data science machine algorithm", they're going to look at you like, "What are you talking about? "I don't understand what you're talking about", right? You need to tell them, "Look, this machine is failing." What are the conditions in which the machine is failing? How do you express that? And then we translate that requirement, or that into the underlying models, underlying Vel expressions, Vel or CPU expression language. So we learned a ton from user interface, capabilities, latency issues, connectivity issues, different protocols, a number of things that we learn from customers. >> So I'm curious with... More of the big data vendors are recognizing data in motion and data coming from devices. And some, like Hortonworks DataFlow NiFi has a MiNiFi component written in C plus plus, really low resource footprint. But I assume that that's really just a transport. It's almost like a collector and that it doesn't have the analytics built in -- >> That's exactly right, NiFi has the transport, it has the real-time transport capability for sure. What it does not have is this notion of that CEP concept. How do you combine all of the streams, everything is a time series data for us, right, from the devices. Whether it's coming from a device or whether it's coming from another static source out there. How do you express a pattern, a recognition pattern definition, across these streams? That's where our CPU comes in the picture. A lot of these seemingly similar software capabilities that people talk about, don't quite exactly have, either the streaming capability, or the CPU capability, or the real-time, or the low footprint. What we have is a combination of all of that. >> And you talked about how everything's time series to you. Is there a need to have, sort of an equivalent time series database up in some central location? So that when you subset, when you determine what relevant subset of data to move up to the Cloud, or you know, on-prem central location, does it need to be the same database? >> No, it doesn't need to be the same database. It's optional. In fact, we do ship a local time series database at the edge itself. If you have a little bit of a local storage, you can down sample, take the results, and store it locally, and many customers actually do that. Some others, because they have their existing environment, they have some Cloud storage, whether it's Microsoft, it doesn't matter what they use, we have connectors from our software to send these results into their existing environments. >> So, you had also said something interesting about your, sort of, tool set, as being optimized for operations technology. So this is really important because back when we had the Net-Heads and the Bell-Heads, you know it was a cultural clash and they had different technologies. >> Sastry: They sure did, yeah. >> Tell us more about how selling to operations, not just selling, but supporting operations technology is different from IT technology and where does that boundary live? >> Right, so typical IT environment, right, you start with the boss who is the decision maker, you work with them and they approve the project and you go and execute that. In an industrial, in an OT environment, it doesn't quite work like that. Even if the boss says, "Go ahead and go do this project", if the operator on the floor doesn't understand what you're talking about, because that person is in charge of operating that machine, it doesn't quite work like that. So you need to work bottom up as well, to convincing them that you are indeed actually solving their pain point. So the way we start, where rather than trying to tell them what capabilities we have as a product, or what we're trying to do, the first thing we ask is what is their pain point? "What's your problem? What is the problem "you're trying to solve?" Some customers say, "Well I've got yield, a lot of scrap. "Help me reduce my scrap. "Help me to operate my equipment better. "Help me predict these failure conditions "before it's too late." That's how the problem starts. Then we start inquiring them, "Okay, what kind of data "do you have, what kind of sensors do you have? "Typically, do you have information about under what circumstances you have seen failures "versus not seeing failures out there?" So in the process of inauguration we begin to understand how they might actually use our software and then we tell them, "Well, here, use your software, "our software, to predict that." And, sorry, I want 30 more seconds on that. The other thing is that, typically in an IT environment, because I came from that too, I've been in this position for 30 plus years, IT, UT and all of that, where we don't right away talk about CEP, or expressions, or analytics, and we don't talk about that. We talk about, look, you have these bunch of sensors, we have OT tools here, drag and drop your sensors, express the outcome that you're trying to look for, what is the outcome you're trying to look for, and then we drive behind the scenes what it means. Is it analytics, is it machine learning, is it something else, and what is it? So that's kind of how we approach the problem. Of course, if, sometimes you do surprisingly occasionally run into very technical people. From those people we can right away talk about, "Hey, you need these analytics, you need to use machinery, "you need to use expressions" and all of that. That's kind of how we operate. >> One thing, you know, that's becoming clearer is I think this widespread recognition that's data intensive and low latency work to be done near the edge. But what goes on in the Cloud is actually closer to simulation and high-performance compute, if you want to optimize a model. So not just train it, but maybe have something that's prescriptive that says, you know, here's the actionable information. As more of your data is video and audio, how do you turn that into something where you can simulate a model, that tells you the optimal answer? >> Right, so this is actually a good question. From our experience, there are models that require a lot of data, for example, video and audio. There are some other models that do not require a lot of data for training. I'll give you an example of what customer use cases that we have. There's one customer in a manufacturing domain, where they've been seeing a lot of finished goods failures, there's a lot of scrap and the problem then was, "Hey, predict the failures, "reduce my scrap, save the money", right? Because they've been seeing a lot of failures every single day, we did not need a lot of data to train and create a model to that. So, in fact, we just needed one hour's worth of data. We created a model, put the thing, we have reduced, completely eliminated their scrap. There are other kinds of models, other kinds of models of video, where we can't do that in the edge, so we're required for example, some video files or simulated audio files, take it to an offline model, create the model, and see whether it's accurately predicting based on the real-time video coming in or not. So it's a mix of what we're seeing between those two. >> Well Sastry, thank you so much for stopping by theCUBE and sharing what it is that you guys at FogHorn are doing, what you're hearing from customers, how you're working together with them to solve some of these pretty significant challenges. >> Absolutely, it's been a pleasure. Hopefully this was helpful, and yeah. >> Definitely, very educational. We want to thank you for watching theCUBE, I'm Lisa Martin with George Gilbert. We are live at our event, Big Data SV in downtown San Jose. Come stop by Forager Tasting Room, hang out with us, learn as much as we are about all the layers of big data digital transformation and the opportunities. Stick around, we will be back after a short break. (upbeat electronic music)

Published Date : Mar 8 2018

SUMMARY :

brought to you by SiliconANGLE Media down the street from the Strata Data Conference. what do you guys do, who are you? Obviously in the process, you know, the new business outcomes you could build on it, What's the FogHorn secret sauce that others Before I directly answer the question, if you don't mind, how constrained an environment you can operate in. but that's the kind of environment we're talking about. So that's the kind of size we're talking about. on the other thing you said, with, and refining the gas and all of that. the Cloud if you needed to do retraining? Import and bring the model back If the model is running ultimately on the device, These days, most of the PLCs, programmable controllers, if it doesn't have the connectivity USB stick, bring it to the PLC device and upload the model. we destroyed the Iranian centrifuges. but the devices have the ability to connect to the Cloud. you don't want the Cloud to reach the jet engine. but the Cloud cannot reach the jet engine. So Sastry, as a CTO you meet with customers often. they're going to look at you like, and that it doesn't have the analytics built in -- or the real-time, or the low footprint. So that when you subset, when you determine If you have a little bit of a local storage, So, you had also said something interesting So the way we start, where rather than trying that tells you the optimal answer? and the problem then was, "Hey, predict the failures, and sharing what it is that you guys at FogHorn are doing, Hopefully this was helpful, and yeah. We want to thank you for watching theCUBE,

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
George GilbertPERSON

0.99+

GeorgePERSON

0.99+

Lisa MartinPERSON

0.99+

Saudi ArabiaLOCATION

0.99+

Sastry MalladiPERSON

0.99+

MicrosoftORGANIZATION

0.99+

one hourQUANTITY

0.99+

SastryPERSON

0.99+

Silicon ValleyLOCATION

0.99+

GEORGANIZATION

0.99+

100 megabytesQUANTITY

0.99+

LisaPERSON

0.99+

Bill JoyPERSON

0.99+

twoQUANTITY

0.99+

FogHornORGANIZATION

0.99+

last weekDATE

0.99+

Mountain ViewLOCATION

0.99+

30 more secondsQUANTITY

0.99+

David FloorPERSON

0.99+

one questionQUANTITY

0.99+

HortonworksORGANIZATION

0.99+

San JoseLOCATION

0.99+

30 plus yearsQUANTITY

0.99+

SiliconANGLE MediaORGANIZATION

0.99+

three plus years agoDATE

0.99+

one customerQUANTITY

0.98+

oneQUANTITY

0.98+

secondQUANTITY

0.98+

C plus plusTITLE

0.98+

OneQUANTITY

0.98+

theCUBEORGANIZATION

0.98+

150 megabytesQUANTITY

0.98+

two waysQUANTITY

0.97+

Strata Data ConferenceEVENT

0.97+

IranianOTHER

0.97+

five levelsQUANTITY

0.95+

millions of elevatorsQUANTITY

0.95+

about less than 100QUANTITY

0.95+

one partQUANTITY

0.94+

VelOTHER

0.94+

One thingQUANTITY

0.92+

dozens of machinery modelsQUANTITY

0.92+

eachQUANTITY

0.91+

IntelORGANIZATION

0.91+

FogHornPERSON

0.86+

2018DATE

0.85+

first thingQUANTITY

0.85+

single-coreQUANTITY

0.85+

NiFiORGANIZATION

0.82+

Power by the HourORGANIZATION

0.81+

about three years agoDATE

0.81+

Forager Tasting RORGANIZATION

0.8+

a tonQUANTITY

0.8+

CTOPERSON

0.79+

multibillion dollarQUANTITY

0.79+

DataEVENT

0.79+

Bell-HeadsORGANIZATION

0.78+

every single dayQUANTITY

0.76+

The CubeORGANIZATION

0.75+

CloudCOMMERCIAL_ITEM

0.73+

Dozens of machinery algorithmsQUANTITY

0.71+

PiCOMMERCIAL_ITEM

0.71+

petabytesQUANTITY

0.7+

raspberryORGANIZATION

0.69+

Big DataORGANIZATION

0.68+

CloudTITLE

0.67+

dual-coreQUANTITY

0.65+

SastryORGANIZATION

0.62+

NetORGANIZATION

0.61+

Daniel Raskin, Kinetica | Big Data SV 2018


 

>> Narrator: Live, from San Jose, it's theCUBE. Presenting Big Data Silicon Valley. Brought to you by SiliconANGLE Media and its ecosystem partners (mellow electronic music) >> Welcome back to theCUBE, on day two of our coverage of our event, Big Data SV. I'm Lisa Martin, my co-host is Peter Burris. We are the down the street from the Strata Data Conference, we've had a great day yesterday, and great morning already, really learning and peeling back the layers of big data, challenges, opportunities, next generation, we're welcoming back to theCUBE an alumni, the CMO of Kinetica, Dan Raskin. Hey Dan, welcome back to theCUBE. >> Thank you, thank you for having me. >> So, I'm a messaging girl, look at your website, the insight engine for the extreme data economy. Tell us about the extreme data economy, and what is that, what does it mean for your customers? >> Yeah, so it's a great question, and, from our perspective, we sit, we're here at Strata, and you see all the different vendors kind of talking about what's going on, and there's a little bit of word spaghetti out there that makes it really hard for customers to think about how big data is affecting them today, right? And so, what we're actually looking at is the idea of, the world's changed. That, big data from five years ago, doesn't necessarily address all the use cases today. If you think about what customers are going through, you have more users, devices, and things coming on, there's more data coming back than ever before, and it's not just about creating the data driven business, and building these massive data lakes that turn into data swamps, it's really about how do you create the data-powered business. So when we're using that term, we're really trying to call out that the world's changed, that, in order for businesses to compete in this new world, they have to think about to take data and create CoreIP that differentiates, how do I use it to affect the omnichannel, how do I use it to deal with new things in the realm of banking and Fintech, how do I use it to protect myself against disruption in telco, and so, the extreme data economy is really this idea that you have business in motion, more things coming online ever before, how do I create a data strategy, where data is infused in my business, and creates CoreIP that helps me maintain category leadership or grow. >> So as you think about that challenge, there's a number of technologies that come into play. Not least of which is the industry, while it's always to a degree been driven by what hardware can do, that's moderated a bit over time, but today, in many respects, a lot of what is possible is made possible, by what hardware can do, and what hardware's going to be able to do. We've been using similar AI algorithms for a long time. But we didn't have the power to use them! We had access to data, but we didn't have the power to acquire and bring it in. So how is the relationship between your software, and your platform, and some of the new hardware that's becoming available, starting to play out in a way of creating value for customers? >> Right, so, if you think about this in terms of this extreme data concept, and you think about it in terms of a couple of things, one, streaming data, just massive amounts of streaming data coming in. Billions of rows that people want to take and translate into value. >> And that data coming from-- >> It's coming from users, devices, things, interacting with all the different assets, more edge devices that are coming online, and the Wild West essentially. You look at the world of IoT and it's absolutely insane, with the number of protocols, and device data that's coming back to a company, and then you think about how do you actually translate this into real-time insight. Not near real-time, where it's taking seconds, but true millisecond response times where you can infuse this into your business, and one of our whole premises about Kinetica is the idea of this massive parallel compute. So the idea of not using CPUs anymore, to actually drive the powering behind your intelligence, but leveraging GPUs, and if you think about this, a CPU has 64 cores, 64 parallel things that you can do at a time, a GPU can have up to 6,000 cores, 6,000 parallel things, so it's kind of like lizard brain verse modern brain. How do you actually create this next generation brain that has all these neural networks, for processing the data, in a way that you couldn't. And then on top of that, you're using not just the technology of GPUs, you're trying to operationalize it. So how do you actually bring the data scientist, the BI folks, the business folks all together to actually create a unified operational process, and the underlying piece is the Kinetica engine and the GPU used to do this, but the power is really in the use cases of what you can do with it, and how you actually affect different industries. >> So can you elaborate a little bit more on the use cases, in this kind of game changing environment? >> Yeah, so there's a couple of common use cases that we're seeing, one that affects every enterprise is the idea of breaking down silos of business units, and creating the customer 360 view. How do I actually take all these disparate data feeds, bring them into an engine where I can visualize concepts about my customer and the environment that they're living in, and provide more insight? So if you think about things like Whole Foods and Amazon merging together, you now have this power of, how do I actually bridge the digital and physical world to create a better omnichannel experience for the user, how do I think about things in terms of what preferences they have, personalization, how to actually pair that with sensor data to affect how they actually navigate in a Whole Foods store more efficiently, and that's affecting every industry, you could take that to banking as well and think about the banking omminchannel, and ATMs, and the digital bank, and all these Fintech upstarts that are working to disrupt them. A great example for us is the United States Postal Service, where we're actually looking at all the data, the environmental data, around the US Postal Service, we're able to visualize it in real-time, we're able to affect the logistics of how they actually navigate through their routes, we're able to look things like postal workers separating out of their zones, and potentially kicking off alerts around that, so effectively making the business more efficient. But, we've moved into this world where we always used to talk about brick and mortar going to cloud, we're now in this world where the true value is how you bridge the digital and physical world, and create more transformative experiences, and that's what we want to do with data. So it could be logistics, it could be omnichannel, it could be security, you name it. It affects every single industry that we're talking about. >> So I got two questions, what is Kinetica's contribution to that, and then, very importantly, as a CMO, how are you thinking about making sure that the value that people are creating, or can create with Kinetica, gets more broadly diffused into an ecosystem. >> Yeah, so the power that we're bringing is the idea of how to operationalize this in a way where again, you're using your data to create value, so, having a single engine where you're collecting all of this data, massive volumes of data, terabytes upon terabytes of data, enabling it where you can query the data, with millisecond response times, and visualize it, with millisecond response times, run machine learning algorithms against it to augment it, you still have that human ability to look at massive sets of data, and do ad hoc discovery, but can run machining learning algorithms against that and complement it with machine learning. And then the operational piece of bringing the data scientists into the same platform that the business is using, so you don't have data recency issues, is a really powerful mix. The other piece I would just add is the whole piece around data discovery, you can't really call it big data if, in order to analyze the data, you have to downsize and downsample to look at a subset of data. It's all about looking at the entire set. So that's where we really bring value. >> So, to summarize very quickly, you are providing a platform that can run very, very fast, in a parallel system, and memories in these parallel systems, so that large amounts of data can be acted upon. >> That's right. >> Now, so, the next question is, there's not going to be a billion people that are going to use your tool to do things, how are you going to work with an ecosystem and partners to get the value that you're able to create with this data, out into the engine enterprise. >> It's a great question, and probably the biggest challenge that I have, which is, how do you get above the word spaghetti, and just get into education around this. And so I think the key is getting into examples, of how it's affecting the industry. So don't talk about the technology, and streaming from Kafka into a GPU-powered engine, talk about the impact to the business in terms of what it brings in terms of the omnichannel. You look at something like Japan in the 2020 Olympics, and you think about that in terms of telco, and how are the mobile providers going to be able to take all the data of what people are doing, and to related that to ad-tech, to relate that to customer insight, to relate that to new business models of how they could sell the data, that's the world of education we have to focus on, is talk about the transformative value it brings from the customer perspective, the outside-in as opposed to the inside-out. >> On that educational perspective, as a CMO, I'm sure you meet with a lot of customers, do you find that you might be in this role of trying to help bridge the gaps between different roles in an organization, where there's data silos, and there's probably still some territorial culture going on? What are you finding in terms of Kinetica's ability to really help educate and maybe bring more stakeholders, not just to the table, but kind of build a foundation of collaboration? >> Yeah, it's a really interesting question because I think it means, not just for Kinetica, but all vendors in the space, have to get out of their comfort zone, and just stop talking speeds and feeds and scale, and in fact, when we were looking at how to tell our story, we did an analysis of where most companies were talking, and they were focusing a lot more on the technical aspirations that developers sell, which is important, you still need to court the developer, you have community products that they can download, and kick the tires with, but we need to extend our dialogue, get out of our customer comfort zone, and start talking more to CIOs, CTOs, CDOs, and that's just reaching out to different avenues of communication, different ways of engaging. And so, I think that's kind of a core piece that I'm taking away from Strata, is we do a wonderful job of speaking to developers, we all need to get out of our comfort zone and talk to a broader set of folks, so business folks. >> Right, 'cause that opens up so many new potential products, new revenue streams, on the marketing side being able to really target your customer base audience, with relevant, timely offers, to be able to be more connected. >> Yeah, the worst scenario is talking to an enterprise around the wonders of a technology that they're super excited about, but they don't know the use case that they're trying to solve, start with the use case they're trying to solve, start with thinking about how this could affect their position in the market, and work on that, in partnership. We have to do that in collaboration with the customers. We can't just do that alone, it's about building a partnership and learning together around how you use data in a different way. >> So as you imagine, the investments that Kinetica is going to make over the next few years, with partners, with customers, what do you hope Kinetica will be in 2020? >> So, we want it to be that transformative engine for enterprises, we think we are delivering something that's quite unique in the world, and, you want to see this on a global basis, affecting our customer's value. I almost want to take us out of the story, and if I'm successful, you're going to hear wonderful enterprise companies across telco, banking, and other areas just telling their story, and we happen to be the engine behind it. >> So you're an ingredient in their success. >> Yes, a core ingredient in their success. >> So if we think about over the course of the next technology, set of technology waves, are they any particular applications that you think you're going to be stronger in? So I'll give you an example, do you envision that Kinetica can have a major play in how automation happens inside infrastructure, or how developers start seeing patterns in data, imagine how those assets get created. Where are some of the kind of practical, but not really, or rarely talked about applications that you might find yourselves becoming more of an ingredient because they themselves become ingredients to some of these other big use cases? >> There are a lot of commonalities that we're starting to see, and the interesting piece is the architecture that you implement tends to be the same, but the context of how you talk about it, and the impact it has tends to be different, so, I already mentioned the customer 360 view? First and foremost, break down silos across your organization, figure out how do you get your data into one place where you can run queries against it, you can visualize it, you can do machine learning analysis, that's a foundational element, and, I have a company in Asia called Lippo that is doing that in their space, where all of the sudden they're starting to glean things they didn't know about their customer before to create, doing that ad hoc discovery, so that's one area. The other piece is this use case of how do you actually operationalize data scientists, and machine learning, into your core business? So, that's another area that we focus on. There are simple entry points, things like Tableau Acceleration, where you put us underneath the existing BI infrastructure, and all of the sudden, you're a hundred times faster, and now your business folks can sit at the table, and make real-time business decisions, where in the past, if they clicked on certain things, they'd have to wait to get those results. Geospatial visualization's a no-brainer, the idea of taking environmental data, pairing it with your customer data, for example, and now learning about interactions. And I'd say the other piece is more innovation driven, where we would love sit down with different innovation groups in different verticals and talk with them about, how are you looking to monetize your data in the future, what are the new business models, how does things like voice interaction affect your data strategy, what are the different ways you want to engage with your data, so there's a lot of different realms we can go to. >> One of the things you said as we wrap up here, that I couldn't agree with more, is, the best value articulation I think a brand can have, period, is through the voice of their customer. And being able to be, and I think that's one of the things that Paul said yesterday is, defining Kinetica's success based on the success of your customers across industry, and I think really doesn't get more objective than a customer who has, not just from a developer perspective, maybe improved productivity, or workforce productivity, but actually moved the business forward, to a point where you're maybe bridging the gaps between the digital and physical, and actually enabling that business to be more profitable, open up new revenue streams because this foundation of collaboration has been established. >> I think that's a great way to think about it-- >> Which is good, 'cause he's your CEO. >> (laughs) Yes, that sustains my job. But the other piece is, I almost get embarrassed talking about Kinetica, I don't want to be the car salesman, or the vacuum salesman, that sprinkles dirt on the floor and then vacuums it up, I'd rather us kind of fade to the behind the scenes power where our customers are out there telling wonderful stories that have an impact on how people live in this world. To me, that's the best marketing you can do, is real stories, real value. >> Couldn't agree more. Well Dan, thanks so much for stopping by, sharing what things that Kinetica is doing, some of the things you're hearing, and how you're working to really build this foundation of collaboration and enablement within your customers across industries. We look forward to hearing the kind of cool stuff that happens with Kinetica, throughout the rest of the year, and again, thanks for stopping by and sharing your insights. >> Thank you for having me. >> I want to thank you for watching theCUBE, I'm Lisa Martin with my co-host Peter Burris, we are at Big Data SV, our second day of coverage, at a cool place called the Forager Tasting Room, in downtown San Jose, stop by, check us out, and have a chance to talk with some of our amazing analysts on all things big data. Stick around though, we'll be right back with our next guest after a short break. (mellow electronic music)

Published Date : Mar 8 2018

SUMMARY :

Brought to you by SiliconANGLE Media We are the down the street from the Strata Data Conference, and what is that, what does it mean for your customers? and it's not just about creating the data driven business, So how is the relationship between your software, if you think about this in terms of this is really in the use cases of what you can do with it, and the digital bank, and all these Fintech upstarts making sure that the value that people are creating, is the idea of how to operationalize this in a way you are providing a platform that are going to use your tool to do things, and how are the mobile providers going to be able and kick the tires with, but we need to extend our dialogue, on the marketing side being able to really target We have to do that in collaboration with the customers. the engine behind it. that you think you're going to be stronger in? and the impact it has tends to be different, so, One of the things you said as we wrap up here, To me, that's the best marketing you can do, some of the things you're hearing, and have a chance to talk with some of our amazing analysts

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
Peter BurrisPERSON

0.99+

Lisa MartinPERSON

0.99+

PaulPERSON

0.99+

AmazonORGANIZATION

0.99+

Dan RaskinPERSON

0.99+

Whole FoodsORGANIZATION

0.99+

Daniel RaskinPERSON

0.99+

64 coresQUANTITY

0.99+

AsiaLOCATION

0.99+

DanPERSON

0.99+

2020DATE

0.99+

San JoseLOCATION

0.99+

two questionsQUANTITY

0.99+

KineticaORGANIZATION

0.99+

LippoORGANIZATION

0.99+

SiliconANGLE MediaORGANIZATION

0.99+

second dayQUANTITY

0.99+

yesterdayDATE

0.99+

6,000 parallelQUANTITY

0.99+

64 parallelQUANTITY

0.99+

2020 OlympicsEVENT

0.99+

Strata Data ConferenceEVENT

0.99+

telcoORGANIZATION

0.98+

theCUBEORGANIZATION

0.98+

oneQUANTITY

0.98+

single engineQUANTITY

0.97+

FirstQUANTITY

0.97+

Wild WestLOCATION

0.97+

todayDATE

0.97+

five years agoDATE

0.96+

Big Data SVORGANIZATION

0.96+

one areaQUANTITY

0.95+

StrataORGANIZATION

0.95+

United States Postal ServiceORGANIZATION

0.94+

day twoQUANTITY

0.93+

Narrator: LiveTITLE

0.93+

OneQUANTITY

0.93+

one placeQUANTITY

0.9+

FintechORGANIZATION

0.88+

up to 6,000 coresQUANTITY

0.88+

yearsDATE

0.88+

US Postal ServiceORGANIZATION

0.88+

Billions of rowsQUANTITY

0.87+

terabytesQUANTITY

0.85+

JapanLOCATION

0.82+

hundred timesQUANTITY

0.82+

terabytes of dataQUANTITY

0.81+

StrataTITLE

0.8+

Tableau AccelerationTITLE

0.78+

single industryQUANTITY

0.78+

CoreIPTITLE

0.76+

360 viewQUANTITY

0.75+

Silicon ValleyLOCATION

0.73+

billion peopleQUANTITY

0.73+

2018DATE

0.73+

Data SVEVENT

0.72+

KineticaCOMMERCIAL_ITEM

0.72+

Forager Tasting RoomORGANIZATION

0.68+

BigEVENT

0.67+

millisecondQUANTITY

0.66+

KafkaPERSON

0.6+

Big DataORGANIZATION

0.59+

Data SVORGANIZATION

0.58+

big dataORGANIZATION

0.56+

nextDATE

0.55+

lotQUANTITY

0.54+

BigORGANIZATION

0.47+

Seth Dobrin, IBM | Big Data SV 2018


 

>> Announcer: Live from San Jose, it's theCUBE. Presenting Big Data Silicon Valley, brought to you by SiliconANGLE Media and it's ecosystem partners. >> Welcome back to theCUBE's continuing coverage of our own event, Big Data SV. I'm Lisa Martin, with my cohost Dave Vellante. We're in downtown San Jose at this really cool place, Forager Eatery. Come by, check us out. We're here tomorrow as well. We're joined by, next, one of our CUBE alumni, Seth Dobrin, the Vice President and Chief Data Officer at IBM Analytics. Hey, Seth, welcome back to theCUBE. >> Hey, thanks for having again. Always fun being with you guys. >> Good to see you, Seth. >> Good to see you. >> Yeah, so last time you were chatting with Dave and company was about in the fall at the Chief Data Officers Summit. What's kind of new with you in IBM Analytics since then? >> Yeah, so the Chief Data Officers Summit, I was talking with one of the data governance people from TD Bank and we spent a lot of time talking about governance. Still doing a lot with governance, especially with GDPR coming up. But really started to ramp up my team to focus on data science, machine learning. How do you do data science in the enterprise? How is it different from doing a Kaggle competition, or someone getting their PhD or Masters in Data Science? >> Just quickly, who is your team composed of in IBM Analytics? >> So IBM Analytics represents, think of it as our software umbrella, so it's everything that's not pure cloud or Watson or services. So it's all of our software franchise. >> But in terms of roles and responsibilities, data scientists, analysts. What's the mixture of-- >> Yeah. So on my team I have a small group of people that do governance, and so they're really managing our GDPR readiness inside of IBM in our business unit. And then the rest of my team is really focused on this data science space. And so this is set up from the perspective of we have machine-learning engineers, we have predictive-analytics engineers, we have data engineers, and we have data journalists. And that's really focus on helping IBM and other companies do data science in the enterprise. >> So what's the dynamic amongst those roles that you just mentioned? Is it really a team sport? I mean, initially it was the data science on a pedestal. Have you been able to attack that problem? >> So I know a total of two people that can do that all themselves. So I think it absolutely is a team sport. And it really takes a data engineer or someone with deep expertise in there, that also understands machine-learning, to really build out the data assets, engineer the features appropriately, provide access to the model, and ultimately to what you're going to deploy, right? Because the way you do it as a research project or an activity is different than using it in real life, right? And so you need to make sure the data pipes are there. And when I look for people, I actually look for a differentiation between machine-learning engineers and optimization. I don't even post for data scientists because then you get a lot of data scientists, right? People who aren't really data scientists, and so if you're specific and ask for machine-learning engineers or decision optimization, OR-type people, you really get a whole different crowd in. But the interplay is really important because most machine-learning use cases you want to be able to give information about what you should do next. What's the next best action? And to do that, you need decision optimization. >> So in the early days of when we, I mean, data science has been around forever, right? We always hear that. But in the, sort of, more modern use of the term, you never heard much about machine learning. It was more like stats, math, some programming, data hacking, creativity. And then now, machine learning sounds fundamental. Is that a new skillset that the data scientists had to learn? Did they get them from other parts of the organization? >> I mean, when we talk about math and stats, what we call machine learning today has been what we've been doing since the first statistics for years, right? I mean, a lot of the same things we apply in what we call machine learning today I did during my PhD 20 years ago, right? It was just with a different perspective. And you applied those types of, they were more static, right? So I would build a model to predict something, and it was only for that. It really didn't apply it beyond, so it was very static. Now, when we're talking about machine learning, I want to understand Dave, right? And I want to be able to predict Dave's behavior in the future, and learn how you're changing your behavior over time, right? So one of the things that a lot of people don't realize, especially senior executives, is that machine learning creates a self-fulfilling prophecy. You're going to drive a behavior so your data is going to change, right? So your model needs to change. And so that's really the difference between what you think of as stats and what we think of as machine learning today. So what we were looking for years ago is all the same we just described it a little differently. >> So how fine is the line between a statistician and a data scientist? >> I think any good statistician can really become a data scientist. There's some issues around data engineering and things like that but if it's a team sport, I think any really good, pure mathematician or statistician could certainly become a data scientist. Or machine-learning engineer. Sorry. >> I'm interested in it from a skillset standpoint. You were saying how you're advertising to bring on these roles. I was at the Women in Data Science Conference with theCUBE just a couple of days ago, and we hear so much excitement about the role of data scientists. It's so horizontal. People have the opportunity to make impact in policy change, healthcare, etc. So the hard skills, the soft skills, mathematician, what are some of the other elements that you would look for or that companies, enterprises that need to learn how to embrace data science, should look for? Someone that's not just a mathematician but someone that has communication skills, collaboration, empathy, what are some of those, openness, to not lead data down a certain, what do you see as the right mix there of a data scientist? >> Yeah, so I think that's a really good point, right? It's not just the hard skills. When my team goes out, because part of what we do is we go out and sit with clients and teach them our philosophy on how you should integrate data science in the enterprise. A good part of that is sitting down and understanding the use case. And working with people to tease out, how do you get to this ultimate use case because any problem worth solving is not one model, any use case is not one model, it's many models. How do you work with the people in the business to understand, okay, what's the most important thing for us to deliver first? And it's almost a negotiation, right? Talking them back. Okay, we can't solve the whole problem. We need to break it down in discreet pieces. Even when we break it down into discreet pieces, there's going to be a series of sprints to deliver that. Right? And so having these soft skills to be able to tease that in a way, and really help people understand that their way of thinking about this may or may not be right. And doing that in a way that's not offensive. And there's a lot of really smart people that can say that, but they can come across at being offensive, so those soft skills are really important. >> I'm going to talk about GDPR in the time we have remaining. We talked about in the past, the clocks ticking, May the fines go into effect. The relationship between data science, machine learning, GDPR, is it going to help us solve this problem? This is a nightmare for people. And many organizations aren't ready. Your thoughts. >> Yeah, so I think there's some aspects that we've talked about before. How important it's going to be to apply machine learning to your data to get ready for GDPR. But I think there's some aspects that we haven't talked about before here, and that's around what impact does GDPR have on being able to do data science, and being able to implement data science. So one of the aspects of the GDPR is this concept of consent, right? So it really requires consent to be understandable and very explicit. And it allows people to be able to retract that consent at any time. And so what does that mean when you build a model that's trained on someone's data? If you haven't anonymized it properly, do I have to rebuild the model without their data? And then it also brings up some points around explainability. So you need to be able to explain your decision, how you used analytics, how you got to that decision, to someone if they request it. To an auditor if they request it. Traditional machine learning, that's not too much of a problem. You can look at the features and say these features, this contributed 20%, this contributed 50%. But as you get into things like deep learning, this concept of explainable or XAI becomes really, really important. And there were some talks earlier today at Strata about how you apply machine learning, traditional machine learning to interpret your deep learning or black box AI. So that's really going to be important, those two things, in terms of how they effect data science. >> Well, you mentioned the black box. I mean, do you think we'll ever resolve the black box challenge? Or is it really that people are just going to be comfortable that what happens inside the box, how you got to that decision is okay? >> So I'm inherently both cynical and optimistic. (chuckles) But I think there's a lot of things we looked at five years ago and we said there's no way we'll ever be able to do them that we can do today. And so while I don't know how we're going to get to be able to explain this black box as a XAI, I'm fairly confident that in five years, this won't even be a conversation anymore. >> Yeah, I kind of agree. I mean, somebody said to me the other day, well, it's really hard to explain how you know it's a dog. >> Seth: Right (chuckles). But you know it's a dog. >> But you know it's a dog. And so, we'll get over this. >> Yeah. >> I love that you just brought up dogs as we're ending. That's my favorite thing in the world, thank you. Yes, you knew that. Well, Seth, I wish we had more time, and thanks so much for stopping by theCUBE and sharing some of your insights. Look forward to the next update in the next few months from you. >> Yeah, thanks for having me. Good seeing you again. >> Pleasure. >> Nice meeting you. >> Likewise. We want to thank you for watching theCUBE live from our event Big Data SV down the street from the Strata Data Conference. I'm Lisa Martin, for Dave Vellante. Thanks for watching, stick around, we'll be rick back after a short break.

Published Date : Mar 8 2018

SUMMARY :

brought to you by SiliconANGLE Media Welcome back to theCUBE's continuing coverage Always fun being with you guys. Yeah, so last time you were chatting But really started to ramp up my team So it's all of our software franchise. What's the mixture of-- and other companies do data science in the enterprise. that you just mentioned? And to do that, you need decision optimization. So in the early days of when we, And so that's really the difference I think any good statistician People have the opportunity to make impact there's going to be a series of sprints to deliver that. in the time we have remaining. And so what does that mean when you build a model Or is it really that people are just going to be comfortable ever be able to do them that we can do today. I mean, somebody said to me the other day, But you know it's a dog. But you know it's a dog. I love that you just brought up dogs as we're ending. Good seeing you again. We want to thank you for watching theCUBE

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
Dave VellantePERSON

0.99+

Lisa MartinPERSON

0.99+

SethPERSON

0.99+

DavePERSON

0.99+

IBMORGANIZATION

0.99+

Seth DobrinPERSON

0.99+

20%QUANTITY

0.99+

50%QUANTITY

0.99+

TD BankORGANIZATION

0.99+

San JoseLOCATION

0.99+

two peopleQUANTITY

0.99+

tomorrowDATE

0.99+

IBM AnalyticsORGANIZATION

0.99+

two thingsQUANTITY

0.99+

SiliconANGLE MediaORGANIZATION

0.99+

one modelQUANTITY

0.99+

five yearsQUANTITY

0.98+

20 years agoDATE

0.98+

Big Data SVEVENT

0.98+

five years agoDATE

0.98+

GDPRTITLE

0.98+

theCUBEORGANIZATION

0.98+

oneQUANTITY

0.98+

Strata Data ConferenceEVENT

0.97+

todayDATE

0.97+

first statisticsQUANTITY

0.95+

CUBEORGANIZATION

0.94+

Women in Data Science ConferenceEVENT

0.94+

bothQUANTITY

0.94+

Chief Data Officers SummitEVENT

0.93+

Big Data SV 2018EVENT

0.93+

couple of days agoDATE

0.93+

yearsDATE

0.9+

Forager EateryORGANIZATION

0.9+

firstQUANTITY

0.86+

WatsonTITLE

0.86+

Officers SummitEVENT

0.74+

Data OfficerPERSON

0.73+

SVEVENT

0.71+

PresidentPERSON

0.68+

StrataTITLE

0.67+

Big DataORGANIZATION

0.66+

earlier todayDATE

0.65+

Silicon ValleyLOCATION

0.64+

yearsQUANTITY

0.6+

ChiefEVENT

0.44+

KaggleORGANIZATION

0.43+

Maribel Lopez, Lopez Research | Big Data SV 2018


 

>> Narrator: Live, from San Jose. It's theCUBE. Presenting Big Data, Silicon Valley. Brought to you by SiliconAngle Media, and its ecosystem partners. >> Welcome come back to theCUBE, we are live in San Jose, at our event, Big Data SV. I'm Lisa Martin. And we are down the street from the Strata Data Conference. We've had a great day so far, talking with a lot of folks from different companies that are all involved in the big data unraveling process. I'm excited to welcome back to theCUBE one of our extinguished alumni, Maribel Lopez; the founder and principal analyst at Lopez research. Welcome back to theCUBE. >> Thank you. I'm excited to be here. >> Yeah, so you've been, a startup conference started a couple days ago. What are some the trends and things that you're hearing that are really kind of top of mind for not just the customers that are attending, the companies that are creating or are trying to create solutions around this big data challenge and opportunity? >> Yeah absolutely, I mean I think we talked a lot about data in the years past. How do you gather the data? How do you store the data? How you might want to process the data? This year seems to be all about how do I make something interesting happen with the data? How do I make an intelligent inside? How do I cure prostate cancer? How do I make sure I can classify images? It's a really different show, and we've also changed some of the terminology a lot more in machine learning now, and artificial intelligence, and frankly a lot of discussion around ethics. So it's been very interesting. >> Data ethics you mean? >> Data ethics; how do we do privacy? How do we maintain the right level of data so that we don't have bias in our data? How do we get Diversity Inclusion going? Lots really interesting powerful human topics, not just about the data. >> I love that the human topics especially where you know AI and ML come into play. You talked, data diversity. Or bias that we were just at that women and data science conference a couple of days ago talking to a lot of female leaders in in data science, computer science, both in academia as well as in industry. And one of the interesting topics about the gender disparity, is the fact that that is limiting the analyses on data in terms of, there may be a few perspectives looking on it. So there's an inherent bias there. So that's one issue, and I'd like to get your thoughts on that. Another is with that thought, lack of thought diversity, I guess I would say going into analyzing the data, companies might be potentially limiting themselves on the types of products that they can create, how to monetize the data and actually drive new revenue streams. On the kind of thought diversity will start there. What are some of the things that you're hearing, and what are some of your recommendations for your clients on how to get some of that bias out of data analysis? >> Yes it's interesting. One is trying to find multiple sources of data. So there's data that you have and that you own. But there is a wide range of openly available data now. There's some challenges around making sure that that data is clean before you integrated with your data. But basically, diversifying your data sources with third party data is one big thing that we're talking about. In previous analytical generations, I think we talked a lot about how to have a hypothesis, and you were trying to prove a hypothesis. And now I think we're trying to be a little more open and looser, and not really lead the data where per se, but try to find the right patterns and correlations in the data. And then just awareness in general. Like we don't believe we're biased. But if we have data that's biased who gets put into the system. So we have to really be thoughtful about what we put into the system. So I think that those three things combined have really changed the way people are looking at it. And there's a lot of awareness now around that. Because we assume at some point, the machines might be making certain decisions for us. And we want to make sure that they have the best information to do that. And that they don't limit our opportunities as a society. >> Where are companies in terms of the clients that you see, culturally in terms of embracing the openness? 'Cause you're right! From a scientific scientific method perspective. People go into, I'm going to hypothesize this because I think I'm going to find this. And maybe wanting the data to say this. Where are companies, we'll say enterprises, in becoming culturally more open to not leading the data somewhere and bringing up bias? >> Well, there are two interesting things here, right? I think there are some people that have gone down the data route for a while now, sort of the industry leading companies. They're in this mindset now trying to make sure they don't leave the data, they don't create biases in the data. They have ways to explain how the data and the analysis of the learning came about, not just for regulation, but so that they can make sure they ethically done the right thing. But then I think there's the other 95 percent of companies that they're not even there yet. They don't know that this is a problem yet. So they're still dealing with the "I've got a pool in the data." "I've got to do something with it." They don't even know what they want to do with it let alone if it's biased or not. So we're not quite at the leading the witness point there with a lot of organizations. >> But that's something that you expect to see maybe down the road. >> I'm hoping we'll get ahead of it. I'm really hoping that we'll get ahead of it. >> It's a good positive outlook on it, yeah? >> I think that, I think because the real analysis of the data problem in a big machine learning, deep learning way is so new, and the people are actually out seeking guidance, that there is an opportunity to get ahead of it. The second thing that's happening is, people don't have data scientists, right? So they don't necessarily have the people that can code this. So what they're doing now, is they're depending on the vendor landscape to provide them with an entry level set of tools. So if you're Microsoft, if you're Google, if you're Amazon, you're trying very hard to make sure that you're giving tools that have the right ethics in them, and that can help kickstart people's Machine Learning efforts. So I think that's going to be a real win for us. And we talked a lot today at the Strata conference about how, oh you don't have enough images, you can't do that. Or you don't have enough data, you can't do that. Or you don't have enough data scientists. And some of what came back is that, some of the best and the brightest have coded some things that you can start to use to kickstart that will get you to a better place than you ever could have started with yourself. So that was pretty exciting, you know. Transfer learning as an example of taking you know, image node from Google and some algorithms, and using those to take your images and try to figure out if somebody has Alzheimer's or not. Encode things Alzheimer's or not characteristic. So, very cool stuff, very exciting and nice to see that we've got some minds working on this for us. >> Yeah, definitely. Where you're meeting with clients that don't have a data scientist, or chief analytics officer? Sounds like a lot of the technologies need to or some have built in sort of enablement for a difference data citizen within a company. If you talking to clients that don't have a data scientist or data science team, who are your constituents there? Where are companies that don't maybe have that skill gap? Who do they go to in their organization to start evaluating the data that they have to get to know what and start to understand what their potential is? >> Yeah, there's a couple of places people go. They go to their business decision analytics people. So the people that were working with their BI dashboards, for example. The second place they go is to the cloud computing guys, cuz we're hearing a lot about cloud computing and maybe I can buy some of the stuff from the cloud. I'm just going to roll up and get all my machine learning in the cloud, right? So we're not there yet. So the biggest thing that I talk to people about right now is, what are the realities around Machine Learning and AI? We've made tremendous progress but you know you read the newspaper, and something is going to get rid of your job, and AI's going to take over the world, and we're kind of far from that reality. First of all it's very dystopian and negative. But even if it weren't that, you know what you can do today, is not that. So there's a lot of stages in between. So the first thing is just trying to get people comfortable with. No you can't just buy one product, and throw in some data, and you've got everything you need. >> Right. >> We're not there yet. But we're getting closer. You can add some components, you can get some new information, you could do some new correlations. So just getting a reality and grounding of where we are, and that we have a lot of opportunity, and that it's moving very fast. that's the other thing. >> Right. >> IT leaders are used to all evaluated once a year, evaluated once every couple of years. These things are moving in monthly increments. Like really huge changes in product categories. So you kind of have to keep on top of it to make sure you know what's available to you. >> Right. And if they don't they miss out on not only the ability to monetize data streams, but essentially going out of business. Because somebody will come in may be more nimble and agile, and be able to do it faster. >> Yeah. And we already saw those with the digital native companies that started born in the cloud companies, we used to call them. Well, now, everybody can be using the cloud. So the question then is like what's the next wave of that? The next wave of that is around understanding how to use your data, understanding how to get third-party data, and being able to rapidly make decisions and change models based on that. >> One of the things that's interesting about big data is you know it was a big buzzword, and it seems to be becoming less of a buzzword now. Gartner even was saying I think the number was 85 percent of big data projects and I think that's more in tested environments fail. And I often say, "Failure in a lot of cases is not a bad effort." Because it spawns genesis of new products, new ideas, et cetera. But when you're talking with clients who go, alright, we've embraced Hadoop, we've got this big data lake, now it's turning really swampy. We don't know-- >> We've got lakes, we've got oceans, we've got ponds. Yeah. >> Right. What's the conversation there where you're helping a customer clean that swamp up, get broader visibility across their datasets and enable different lines of business. Not just you know, the BI folks or the cloud folks or IT. But marketing, logistics, sales. What's that conversation like to clean up the swamp and do more enablement for visibility? >> I think one of the things that we got really hung up on was, you know, creating a data ocean, right? We're going to bring everything all in one place, it's going to be this one massive data source. >> It sounded great. >> It's going to be awesome. And this is not the reality of the world, right? So I think the first thing in the cleaning up that we have to do, is being able to figure out what's the source of truth for any given dataset that somebody needs. So you see 15 salespeople walk in and they all have different versions of the data that shouldn't happen. >> Right. >> So we need to get to the point where they know where the source of truth is for that data. The second is sort of governance around the data. We spent a lot of time dumping the data but not a lot of time in terms of getting governance around who can access it, what they can do with it, for how long they could have access to it. Is it just internal? Is it internal and external? So I think that's the second thing around like harassing and haranguing the swamps, and the lakes and the ponds, right? And then assuming that you do that, I think the other thing is, You know, if you have a hammer everything looks like a nail. Well, in reality you know when you construct things you have nails, you have screws, you have bolts, right? And picking the right tool for the job is something that the IT leadership has to work with. And the only way that they get that right is to work very closely with the different lines of business so they can understand the problem. Because the business leader knows the problem, they don't know the solution. If you put them together which we've talked about forever, frankly. But now I think we're seeing more imperatives for those two to work closely together. And sometimes it's even driven by security, just to make sure that the data isn't leaking into other places or that it's secure and that they've met regulatory compliance. So we're in a much better space than we were two, three, five years ago cuz we're thinking about the real problems now. Not just how do you collect it, and how do you store it. But how do we actually make it an actionable manageable set of solutions. >> Exactly, and make it work for the business. Well Maribel, I wish we had more time, but thank you so much for stopping by theCUBE, sharing the insights that you've seen. Not just at a conference, but also with your clients. >> Thank you. >> We want to thank you for watching theCUBE. Again, I'm Lisa Martin, live from Big Data SV, in Downtown San Jose. Get involved in the conversation #BigDataSV. Come see us at the Forager Eatery & Tasting Room, and I'll be right back with our next guest. (upbeat music)

Published Date : Mar 8 2018

SUMMARY :

Brought to you by SiliconAngle Media, that are all involved in the big data unraveling process. I'm excited to be here. just the customers that are attending, a lot about data in the years past. so that we don't have bias in our data? and I'd like to get your thoughts on that. and looser, and not really lead the data where per se, that you see, culturally in terms of embracing the openness? and the analysis of the learning came about, But that's something that you expect to see I'm really hoping that we'll get ahead of it. and the brightest have coded some things that they have to get to know and maybe I can buy some of the stuff from the cloud. and that we have a lot of opportunity, to make sure you know and be able to do it faster. that started born in the cloud companies, and it seems to be becoming less of a buzzword now. we've got oceans, we've got ponds. What's that conversation like to clean up the swamp that we got really hung up on was, you know, So you see 15 salespeople walk in and they all have is something that the IT leadership has to work with. sharing the insights that you've seen. and I'll be right back with our next guest.

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
Lisa MartinPERSON

0.99+

MaribelPERSON

0.99+

AmazonORGANIZATION

0.99+

Maribel LopezPERSON

0.99+

San JoseLOCATION

0.99+

GoogleORGANIZATION

0.99+

MicrosoftORGANIZATION

0.99+

15 salespeopleQUANTITY

0.99+

SiliconAngle MediaORGANIZATION

0.99+

85 percentQUANTITY

0.99+

95 percentQUANTITY

0.99+

GartnerORGANIZATION

0.99+

one issueQUANTITY

0.99+

twoQUANTITY

0.99+

todayDATE

0.99+

oneQUANTITY

0.99+

Silicon ValleyLOCATION

0.99+

bothQUANTITY

0.98+

Strata Data ConferenceEVENT

0.98+

Big Data SVORGANIZATION

0.98+

second thingQUANTITY

0.98+

one productQUANTITY

0.98+

first thingQUANTITY

0.98+

three thingsQUANTITY

0.97+

once a yearQUANTITY

0.97+

secondQUANTITY

0.96+

This yearDATE

0.96+

OneQUANTITY

0.96+

FirstQUANTITY

0.96+

theCUBEORGANIZATION

0.96+

Downtown San JoseLOCATION

0.96+

StrataEVENT

0.94+

two interesting thingsQUANTITY

0.94+

five years agoDATE

0.94+

Big DataORGANIZATION

0.9+

couple days agoDATE

0.87+

couple of days agoDATE

0.85+

onceQUANTITY

0.78+

#BigDataSVORGANIZATION

0.75+

one placeQUANTITY

0.75+

second placeQUANTITY

0.75+

every couple of yearsQUANTITY

0.75+

ForagerLOCATION

0.7+

DataORGANIZATION

0.69+

Narrator: LiveTITLE

0.69+

waveEVENT

0.68+

years pastDATE

0.66+

threeQUANTITY

0.66+

AlzheimerOTHER

0.66+

BigEVENT

0.65+

HadoopTITLE

0.64+

Big Data SVEVENT

0.59+

Eatery & Tasting RoomORGANIZATION

0.57+

Lopez ResearchORGANIZATION

0.55+

SV 2018EVENT

0.54+

thingQUANTITY

0.53+

LopezORGANIZATION

0.49+

Kunal Agarwal, Unravel Data | Big Data SV 2018


 

>> Announcer: Live from San Jose, it's theCube! Presenting Big Data: Silicon Valley Brought to you by SiliconANGLE Media and its ecosystem partners. (techno music) >> Welcome back to theCube. We are live on our first day of coverage at our event BigDataSV. I am Lisa Martin with my co-host George Gilbert. We are at this really cool venue in downtown San Jose. We invite you to come by today, tonight for our cocktail party. It's called Forager Tasting Room and Eatery. Tasty stuff, really, really good. We are down the street from the Strata Data Conference, and we're excited to welcome to theCube a first-time guest, Kunal Agarwal, the CEO of Unravel Data. Kunal, welcome to theCube. >> Thank you so much for having me. >> So, I'm a marketing girl. I love the name Unravel Data. (Kunal laughs) >> Thank you. >> Two year old company. Tell us a bit about what you guys do and why that name... What's the implication there with respect to big data? >> Yeah, we are a application performance management company. And big data applications are just very complex. And the name Unravel is all about unraveling the mysteries of big data and understanding why things are not performing well and not really needing a PhD to do so. We're simplifying application performance management for the big data stack. >> Lisa: Excellent. >> So, so, um, you know, one of the things that a lot of people are talking about with Hadoop, originally it was this cauldron of innovation. Because we had the "let a thousand flowers bloom" in terms of all the Apache projects. But then once we tried to get it into operation, we discovered there's a... >> Kunal: There's a lot of problems. (Kunal laughs) >> There's an overhead, there's a downside to it. >> Maybe tell us, tell us why you both need to know, you need to know how people have done this many, many times. >> Yeah. >> How you need to learn from experience and then how you can apply that even in an environment where someone hasn't been doing it for that long. >> Right. So, if I back a little bit. Big data is powerful, right? It's giving companies an advantage that they never had, and data's an asset to all of these different companies. Now they're running everything from BI, machine learning, artificial intelligence, IOT, streaming applications on top of it for various reasons. Maybe it is to create a new product to understand the customers better, etc., But as you rightly pointed out, when you start to implement all of these different applications and jobs, it's very, very hard. It's because big data is very complex. With that great power comes a lot of complexity, and what we started to see is a lot of companies, while they want to create these applications and provide that differentiation to their company, they just don't have enough expertise as well in house to go and write good applications, maintain these applications, and even manage the underlying infrastructure and cluster that all these applications are running on. So we took it upon ourselves where we thought, Hey, if we simplify application performance management and if we simplify ongoing management challenges, then these companies would run more big data applications, they would be able to expand their use cases, and not really be fearful of, Hey, we don't know how to go and solve these problems. Do we actually rely on our system that is so complex and new? And that's the gap the Unravel fills, which is we monitor and manage not only one componenent of the big data ecosystem, but like you pointed out, it's a, it's a full zoo of all of these systems. You have Hadoop, and you have Spark, and you have Kafka for data injection. You may have some NoSQL systems and newer MPP platforms as well. So the vision of Unravel is really to be that one place where you can come in and understand what's happening with your applications and your system overall and be able to resolve those problems in an automatic, simple way. >> So, all right, let's start at the concrete level of what a developer might get out of >> Kunal: Right. >> something that's wrapped in Unravel and then tell us what the administrator experiences. >> Kunal: Absolutely. So if you are a big data developer you've got in a business requirement that, Hey, go and make this application that understands our customers better, right? They may choose a tool of their liking, maybe Hive, maybe Spark, maybe Kafka for data injection. And what they'll do is they'll write an app first in dev, in their dev environment or the QA environment. And they'll say, Hey, maybe this application is failing, or maybe this application is not performing as fast as I want it to, or even worse that this application is starting to hog a lot of resources, which may slow down my other applications. Now to understand what's causing these kind of problems today developers really need a PhD to go and decipher them. They have to look at tons of law rogs, uh, raw logs metrics, configuration settings and then try to stitch the story up in their head, trying to figure out what is the effect, what is the cause? Maybe it's this problem, maybe it's some other problem. And then do trial and error to try, you know to solving that particular issue. Now what we've seen is big data developers come in variety of flavors. You have the hardcore developers who truly understand Spark and Hadoop and everything, but then 80% of the people submitting these applications are data scientist or business analysts, who may understand SQL, who may know Python, but don't necessarily know what distributed computing and parallel processing and all of these things really are, and where can inefficiencies and problems really lie. So we give them this one view, which will connect all of these different data sources and then tell them in plain English, this is the problem, this is why this problem happened, and this is how you can go and resolve it, thereby getting them unstuck and making it very simple for them to go in and get the performance that they're getting. >> So, these, these, um, they're the developers up front and you're giving them a whole new, sort of, toolchain or environment to solve the operational issues. >> Kunal: Right. >> So that the, if it's DevOps, its really dev is much more sufficient. >> Yes, yes, I mean, all companies want to run fast. They don't want to be slowed down. If you have a problem today, they'll file a ticket, it'll go to the operations team, you wait a couple of days to get some more information back. That just means your business has slowed down. If things are simple enough where the application developers themselves can resolve a lot of these issues, that'll get the business unstuck and get them moving on further. Now, to the other point which you were asking, which is what about the operations and the app support people? So, Unravel's a great tool for them too because that helps them see what's happening holistically in the cluster. How are other applications behaving with each other? It's usually a multitenant, multiapplication environment that these big data jobs are running on. So, is my apps slowing down George's apps? Am I stealing resources from your applications? More so, not just about an individual application issue itself. So Unravel will give you visibility into each app, as well as the overall cluster to help you understand cluster-wide problems. >> Love to get at, maybe peel apart your target audience a little bit. You talked about DevOps. But also the business analysts, data scientists, and we talk about big data. Data is, has such tremendous power to fuel a company and, you know, like you said use it to deliver and, create and deliver new products. Are you talking with multiple audiences within a company? Do you start at DevOps and they bring in their peers? Or do you actually start, maybe, at the Chief Data Officer level? What's that kind of entrance for Unravel? >> So the word I use to describe this is DataOps, instead of DevOps, right? So in the older world you had developers, and you had operations people. Over here you have a data team and operations people, and that data team can comprise of the developers, the data scientists, the business analysts, etc., as well. But you're right. Although we first target the operations role because they have to manage and monitor the system and make sure everything is running like a well-oiled machine, they are now spreading it out to be end-users, meaning the developers themselves saying, "Don't come to me for every problem. "Look at Unravel, try solve it here, "and if you cannot, then come to me." This is all, again, improving agility within the company, making sure that people have the necessary tools and insights to carry on with their day. >> Sounds like an enabler, >> Yeah, absolutely. >> That operations would push down to the DevOp, the developers themselves. >> And even the managers and the CDOs, for example, they want to see their ROI that they're getting from their big data investments. They want to see, they have put in these millions of dollars, have got an infrastructure and these services set up, but how are we actually moving the needle forward? Are there any applications that we're actually putting in business, and is that driving any business value? So we will be able to give them a very nice dashboard helping them understand what kind of throughput are you getting from your system, how many applications were you able to develop last week and onboard to your production environment? And what's the rate of innovation that's really happening inside your company on those big data ecosystems? >> It sort of brings up an interesting question on two prongs. One is the well-known, but inexact number about how many big data projects, >> Kunal: Yeah, yeah. >> I don't know whether they fail or didn't pay off. So there's going in and saying, "Hey, we can help you manage this "because it was too complicated." But then there's also the, all the folks who decided, "Well, we really don't want "to run it all on-prem. "We're not going to throw away everything we did there, "but we're going to also put a lot of new investment >> Kunal: Exactly, exactly. >> in the cloud. Now, Wikibon has a term for that, which true private cloud, which is when you have the operational processes that you use in the public cloud and you can apply them on-prem. >> Right. >> George: But there's not many products that help you do that. How can Unravel work...? >> Kunal: That's a very good questions, George. We're seeing the world move more and more to a cloud environment, or I should say an on-demand environment where you're not so bothered about the infrastructure and the services, but you want Spark as a dial tone. You want Kafka as a dial tone. You want a machine-learning platform as a dial tone. You want to come in there, you want to put in your data, and you want to just start running it. Unravel has been designed from the ground up to monitor and manage any of these environments. So, Unravel can solve problems for your applications running on-premise and similarly all the applications that are running on cloud. Now, on the cloud there are other levels of problems as well so, of course, you'd have applications that are slow, applications that are failing; we can solve those problems. But if you look at a cloud environment, a lot of these now provide you an autoscaling capability, meaning, Hey, if this app doesn't run in the amount of time that we were hoping it to run, let's add extra hardware and run this application. Well, if you just keep throwing machines at the problem, it's not going to solve your issue. Now, it doesn't decrease the time that it will take linearly with how many servers that you're actually throwing in there, so what we can help companies understand is what is the resource requirement of a particular application? How should we be intelligently allocating resources to make sure that you're able to meet your time SLAs, your constraints of, here I need to finish this with x number of minutes, but at the same time be intelligent about how much cost you're spending over there. Do you actually need 500 containers to go and run this app? Well, you may have needed 200. How do you know that? So, Unravel will also help you get efficient with your run, not just faster, but also can it be a good multitenant citizen, can it use limited resources to actually run this applications as well? >> So, Kunal, some of the things I'm hearing from a customer's standpoint that are potential positive business outcomes are internal: performance boost. >> Kunal: Yeah. >> It also sounds like, sort of... productivity improvements internally. >> And then also the opportunity to have the insight to deliver new products, but even I'm thinking of, you know, helping make a retailer, for example, be able to do more targeted marketing, so >> the business outcomes and the impact that Unravel can make really seem to have pretty strong internal and external benefits. >> Kunal: Yes. >> Is there a favorite customer story, (Kunal laughs) don't have to mention names, that you really think speaks to your capabilities? >> So, 100% Improving performance is a very big factor of what Unravel can do. Decreasing costs by improving productivity, by limiting the amount of resources that you're using, is a very, very big factor. Now, amongst all of these companies that we work with, one key factor is improving reliability, which means, Hey, it's fine that he can speed up this application, but sometimes I know the latency that I expect from an app, maybe it's a second, maybe it's a minute, depending on the type of application. But what businesses cannot tolerate is this app taking five x amount more time today. If it's going to finish in a minute, tell me it'll finish in a minute and make sure it finishes in a minute. And this is a big use case for all of the big data vendors because a lot of the customers are moving from Teradata, or from Vertica, or from other relation databases, on to Hortonworks or Cloudera or Amazon EMR. Why? Because it's one tenth the amount of cost for running these workloads. But, all the customers get frustrated and say, "I don't mind paying 10 x more money, "but because over there it used to work. "Over here, there are just so many complications, "and I don't have reliability with these applications." So that's a big, big factor of, you know, how we actually help these customers get value out of the Unravel product. >> Okay, so, um... A question I'm, sort of... why aren't there so many other Unravels? >> Kunal: Yeah. (Kunal laughs) >> From what I understood from past conversations. >> Kunal: Yeah. >> You can only really build the models that are at the heart of your capabilities based on tons and tons of telemetry >> Kunal: Yeah. >> that cloud providers or, or, sort of, internet scale service providers have accumulated in that, because they all have sort of a well-known set of configurations and well-known kind of typology. In other words, there're not a million degrees of freedom on any particular side that you can, you have a well-scoped problem, and you have tons of data. So it's easier to build the models. So who, who else could do this? >> Yeah, so the difference between Unravel and other monitoring products is Unravel is not a monitoring product. It's an intelligent performance management suite. What that means is we don't just give you graphs and metrics and say, "Here are all the raw information, "you go figure it out." Instead, we have to take it a step further where we are actually giving people answers. In order to develop something like that, you need full stack information; that's number one. Meaning information from applications all the way down to infrastructure and everything in between. Why? Because problems can lie anywhere. And if you don't have that full stack info, you're blind-siding yourself, or limiting the scope of the problems that you can actually search for. Secondly is, like you were rightly pointing out, how do I create answers from all this raw data? So you have to think like how an expert with big data would think, which is if there is a problem what are the kinds of checks, balances, places that that person would look into, and how would that person establish that this is indeed the root cause of the problem today? And then, how would that person actually resolve this particular problem? So, we have a big team of scientists, researchers. In fact, my co-founder is a professor of computer science at Duke University who has been researching data-based optimization techniques for the last decade. We have about 80 plus publications in this area, Starfish being one of them. We have a bunch of other publications, which talk about how do you automate problem discovery, root cause analysis, as well as resolution, to get best performance out of these different databases? And you're right. A lot of work has gone on the research side, but a lot of work has gone in understanding the needs of the customers. So we worked with some of the biggest companies out there, which have some of the biggest big data clusters, to learn from them, what are some everyday, ongoing management challenges that you face, and then taking that problem to our datasets and figuring out, how can we automate problem discovery? How can we proactively spot a lot of these errors? I joke around and I tell people that we're big data for big data. Right? All these companies that we serve, they are gathering all of this data, and they're trying to find patterns, and they're trying to find, you know, some sort of an insight with their data. Our data is system generated data, performance data, application data, and we're doing the exact same thing, which is figuring out inefficiencies, problems, cause and effect of things, to be able to solve it in a more intelligent, smart way. >> Well, Kunal, thank you so much for stopping by theCube >> Kunal: Of course. >> And sharing how Unravel Data is helping to unravel the complexities of big data. (Kunal laughs) >> Thank you so much. Really appreciate it. >> Now you're a Cube almuni. (Kunal laughs) >> Absolutely. Thanks so much for having me. >> Kunal, thanks. >> Yeah, and we want to thank you for watching the Cube. I'm Lisa Martin with George Gilbert. We are live at our own event BigData SV in downtown San Jose, California. Stick around. George and I will be right back with our next guest. (quiet crowd noise) (techno music)

Published Date : Mar 8 2018

SUMMARY :

Brought to you by SiliconANGLE Media We invite you to come by today, I love the name Unravel Data. Tell us a bit about what you guys do and not really needing a PhD to do so. So, so, um, you know, one of the things that Kunal: There's a lot of problems. there's a downside to it. tell us why you both need to know, and then how you can apply that even in an environment of the big data ecosystem, but like you pointed out, and then tell us what the administrator experiences. and this is how you can go and resolve it, and you're giving them a whole new, sort of, So that the, if it's DevOps, Now, to the other point which you were asking, to fuel a company and, you know, like you said So in the older world you had developers, DevOp, the developers themselves. and is that driving any business value? One is the well-known, but inexact number "Hey, we can help you manage this and you can apply them on-prem. that help you do that. and you want to just start running it. So, Kunal, some of the things I'm hearing It also sounds like, sort of... that Unravel can make really seem to have So that's a big, big factor of, you know, A question I'm, sort of... and you have tons of data. What that means is we don't just give you graphs to unravel the complexities of big data. Thank you so much. Now you're a Cube almuni. Thanks so much for having me. Yeah, and we want to thank you

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
George GilbertPERSON

0.99+

Lisa MartinPERSON

0.99+

Kunal AgarwalPERSON

0.99+

GeorgePERSON

0.99+

KunalPERSON

0.99+

LisaPERSON

0.99+

80%QUANTITY

0.99+

HortonworksORGANIZATION

0.99+

100%QUANTITY

0.99+

VerticaORGANIZATION

0.99+

Unravel DataORGANIZATION

0.99+

TeradataORGANIZATION

0.99+

todayDATE

0.99+

500 containersQUANTITY

0.99+

OneQUANTITY

0.99+

Two yearQUANTITY

0.99+

two prongsQUANTITY

0.99+

last weekDATE

0.99+

SiliconANGLE MediaORGANIZATION

0.99+

tonightDATE

0.99+

200QUANTITY

0.99+

first dayQUANTITY

0.99+

San JoseLOCATION

0.99+

SparkTITLE

0.99+

ClouderaORGANIZATION

0.99+

each appQUANTITY

0.99+

PythonTITLE

0.98+

a minuteQUANTITY

0.98+

EnglishOTHER

0.98+

oneQUANTITY

0.98+

Duke UniversityORGANIZATION

0.98+

fiveQUANTITY

0.98+

KafkaTITLE

0.98+

HadoopTITLE

0.98+

BigData SVEVENT

0.97+

first-timeQUANTITY

0.97+

Strata Data ConferenceEVENT

0.97+

one key factorQUANTITY

0.96+

millions of dollarsQUANTITY

0.95+

about 80 plus publicationsQUANTITY

0.95+

SQLTITLE

0.95+

DevOpsTITLE

0.94+

firstQUANTITY

0.94+

BigDataSVEVENT

0.94+

tons and tonsQUANTITY

0.94+

bothQUANTITY

0.94+

UnravelORGANIZATION

0.93+

SecondlyQUANTITY

0.91+

million degreesQUANTITY

0.91+

San Jose, CaliforniaLOCATION

0.91+

HiveTITLE

0.91+

last decadeDATE

0.91+

UnravelTITLE

0.9+

Guy Churchward, DataTorrent | Big Data SV 2018


 

>> Announcer: Live from San Jose, it's theCUBE, presenting Big Data, Silicon Valley, brought to you by SiliconANGLE Media and its ecosystem partners. >> Welcome back to theCUBE. Our continuing coverage of our event, Big Data SV, continues, this is our first day. We are down the street from the Strata Data Conference. Come by, we're at this really cool venue, the Forager Tasting Room. We've got a cocktail party tonight. You're going to hear some insights there as well as tomorrow morning. I am Lisa Martin, joined by my co-host, George Gilbert, and we welcome back to theCUBE, for I think the 900 millionth time, the president and CEO of DataTorrent, Guy Churchward. Hey Guy, welcome back! >> Thank you, Lisa, I appreciate it. >> So you're one of our regular VIP's. Give us the update on DataTorrent. What's new, what's going on? >> We actually talked to you a couple of weeks ago. We did a big announcement which was around 3.10, so it's a new release that we have. In all small companies, and we're a small startup, in the big data and analytic space, there is a plethora of features that I can reel through. But it actually makes something a little bit more fundamental. So in the last year... In fact, I think we chatted with you maybe six months ago. We've been looking very carefully at how customers purchase and what they want and how they execute against technology, and it's very very different to what I expected when I came into the company about a year ago off the EMC role that I had. And so, although the features are there, there's a huge amount of underpinning around the experience that a customer would have around big data applications. I'm reminded of, I think it's Gartner that quoted that something like 80% of big data applications fail. And this is one of the things that we really wanted to look at. We have very large customers in production, and we did the analysis of what are we doing well with them, and why can't we do that en masse, and what are people really looking for? So that was really what the release was about. >> Let's elaborate on this a little bit. I want to drill into something where you said many projects, as we've all heard, have not succeeded. There's a huge amount of complexity. The terminology we use is, without tarring and feathering any one particular product, the open source community is kind of like, you're sort of harnessing a couple dozen animals and a zookeeper that works in triplicate... How does DataTorrent tackle that problem? >> Yeah, I mean, in fact I was desperately interested in writing a blog recently about using the word community after open source, because in some respects, there isn't a huge community around the open source movement. What we find is it's the du jour way in which we want to deliver technology, so I have a huge amount of developers that work on a thing called Apache Apex, which is a component in a solution, or in an architecture and in an outcome. And we love what we do, and we do the best we do, and it's better than anybody else's thing. But that's not an application, that's not an outcome. And what happens is, we kind of don't think about what else a customer has to put together, so then they have to go out to the zoo and pick loads of bits and pieces and then try to figure out how to stitch them all together in the best they can. And that takes an inordinately long time. And, in general, people who love this love tinkering with technologies, and their projects never get to production. And large enterprises are used to sitting down and saying, "I need a bulletproof application. "It has to be industrialized. "I need a full SLA on the back of it. "This thing has to have lights out technology. "And I need it quick." Because that was the other thing, as an aspect, is this market is moving so fast, and you look at things like digital economy or any other buzz term, but it really means that if you realize you need to do something, you're probably already too late. And therefore, you need it speedy, expedited. So the idea of being able to wait for 12 months, or two years for an application, also makes no sense. So the arch of this is basically deliver an outcome, don't try and change the way in which open source is currently developed, because they're in components, but embrace them. And so what we did is we sort of looked at it and said, "Well what do people really want to do?" And it's big data analytics, and I want to ingest a lot of information, I want to enrich it, I want to analyze it, and I want to take actions, and then I want to go park it. And so, we looked at it and said, "Okay, so the majority "of stuff we need is what we call a cache stack, "which is KAFKA, Apache Apex, Spark and Hadoop, "and then put complex compute on top." So you would have heard of terms like machine learning, and dimensional compute, so we have their modules. So we actually created an opinionated stack... Because otherwise you have a thousand to choose from and people get confused with choice. I equate it to going into a menu at a restaurant, there's two types of restaurants, you walk into one and you can turn pages and pages and pages and pages of stuff, and you think that's great, I got loads of choice, but the choice kind of confuses you. And also, there's only one chef at the back, and he can't cook everything well. So you know if he chooses the components and puts them together, you're probably not going to get the best meal. And then you go to restaurants that you know are really good, they generally give you one piece of paper and they say, "Here's your three entrees." And you know every single one of them. It's not a lot of choice, but at the end of the day, it's going to be a really good meal. >> So when you go into a customer... You're leading us to ask you the question which is, you're selling the prix fixe tasting menu, and you're putting all the ingredients together. What are some of those solutions and then, sort of, what happens to the platform underneath? >> Yeah, so what you don't want to do is to take these flexible, microdata services, which are open source projects, and hard glue them together to create an application that then has no flexibility. Because, again, one of the myths that I used to assume is applications would last us seven to 10 years. But what we're finding in this space is this movement towards consumerization of enterprise applications. In other words, I need an app and I need it tomorrow because I'm competitively disadvantaged, but it might be wrong, so I then need to adjust it really quick. It's this idea of continual developed, continual adjustment. But that flies in the face of all of this gluing and enterprise-ilities. And I want to base it on open source, and open source, by default, doesn't glue well together. And so what we did is we said okay, not only do you have to create an opinionated stack, and you do that because you want them all to scale into all industries, and they don't need a huge amount of choice, just pick best of breed. But you need to then put a sleeve around them so they all act as though they are a single application. And so we actually announced a thing calls Epoxy. It's a bit of a riff on gluing, but it's called DataTorrent Epoxy. So we have, it's like a microdata service bus, and you can then interchange the components. For instance, right now, Apache Apex is this string-based processing engine in that component. But if there's a better unit, we're quite happy to pull it out, chuck it away, and then put another one in. This isn't a ubiquitous snap-on toolset, because, again, the premise is use open source, get the innovation from there. It has to be bulletproof and enterprise-ility and move really fast. So those are the components I was working on. >> Guy, as CEO, I'm sure you speak with a lot of customers often. What are some of the buying patterns that you're seeing across industries, and what are some of the major business value that DataTorrent can help deliver to your customers? >> The buying patterns when we get involved, and I'm kind of breaking this down into a slightly different way, because we normally get involved when a project's in flight, one of the 80% that's failing, and in general, it's driven by a strategic business partner that has an agenda. And what you see is proprietary application vendors will say, "We can solve everything for you." So they put the tool in and realize it doesn't have the flexibility, it does have enterprise-ility, but it can't adjust fast. And then you get the other type who say, "Well we'll go to a distro or we'll go "to a general purpose practitioner, "and they'll build an application for us." And they'll take open source components, but they'll glue it together with proprietary mush, and then that doesn't then grow past. And then you get the other ones, which is, "Well if I actually am not guided by anybody, "I'll buy a bunch of developers, stick them in my company, "and I've got control on that." But they fiddle around a lot. So we arrive in and, in general, they're in this middle process of saying, "I'm at a competitive disadvantage, "I want to move forward and I want to move forward fast, "and we're working on one of those three channels." The types of outcomes, we just, and back to the expediency of this, we had a telco come to us recently, and it was just before the iPhone X launched, and they wanted to do AB testing on the launch on their platform. We got them up and running within three months. Subsequent from that launch, they then repurposed the platform and some of the components with some augmentation, and they've come out with three further applications. They've all gone into production. So the idea is then these fast cycles of microdata services being stitched together with the Epoxy resin type approach-- >> So faster time to value, lower TCO-- >> Exactly. >> Being able to get to meet their customers' needs faster-- >> Exactly, so it's outcome-based and time to value, and it's time to proof. Because this is, again, the thing that Gartner picked up on, is Hadoop's difficult, this market's complex and people kick the tires a lot. And I sort of joke with customers, "Hey if you want to "obsess about components rather than the outcome, "then your successor will probably come see us "once you're out and your group's failed." And I don't mean that in an obnoxious way. It's not just DataTorrent that solves this same thing, but this it the movement, right? Deal with open source, get enterprise-ilities, get us up and running within a quarter or two, and then let us have some use and agile repurposing. >> Following on that, just to understand going in with a solution to an economic buyer, but then having the platform be reusable, is it opinionated and focused on continuous processing applications, or does it also address both the continuous processing and batch processing? >> Yeah, it's a good answer. In general, and again Gatekeeper, you've got batch and you've got realtime and string, and so we deal with data in motion, which is string-based processing. A string-based processing engine can deal with batch as well, but a batch cannot deal with string. >> George: So you do both-- >> Yeah >> And the idea being that you can have one programming model for both. >> Exactly. >> It's just a window, batch is just a window. >> And the other thing is, a myth bust, is for the last maybe eight plus years, companies assume that the first thing you do in big data analytics is collect all the data, create a data lake, and so they go in there, they ingest the information, they put it into a data lake, and then they poke the data lake posthumously. But the data in the data lake is, by default, already old. So the latency of sticking it into a data lake and then sorting it, and then basically poking it, means that if anybody deals with the data that's in motion, you lose. Because I'm analyzing as it's happening and then you would be analyzing it after at rest, right? So now the architecture of choice is ingest the information, use high performance storage and compute, and then, in essence, ingest, normalize, enrich, analyze, and act on data in motion, in memory. And then when I've used it, then throw it off into a data lake because then I can basically do posthumous analytics and use that for enrichment later. >> You said something also interesting where the DataTorrent customers, the initial successful ones sort of tended to be larger organizations. Those are typically the ones with skillsets to, if anyone's going to be able to put pieces together, it's those guys. Have you not... Well, we always expected big data applications, or sort of adaptive applications, to go mainstream when they were either packaged apps to take all the analysis and embed it, or when you had end to end integrated products to make it simple. Where do you think, what's going to drive this mainstream? >> Yeah, it depends on how mainstream you want mainstream. It's kind of like saying how fast is a fast car. If you want a contractor that comes into IT to create a dashboard, go buy Tableau, and that's mainstream analytics, but it's not. It's mainstream dashboarding of data. The applications that we deal with, by default, the more complex data, they're going to be larger organizations. Don't misunderstand when I say, "We deal with these organizations." We don't have a professional services arm. We work very closely with people like HCL, and we do have a jumpstart team that helps people get there. But our job is teach someone, it's like a kid with a bike and the training wheels, our job is to teach them how to ride the bike, and kick the wheels off, and step away. Because what we don't want to do is to put a professional services drip feed into them and just keep sucking the money out. Our job is to get them there. Now, we've got one company who actually are going to go live next month, and it's a kid tracker, you know like a GPS one that you put on bags and with your kids, and it'll be realtime tracking for the school and also for the individuals. And they had absolutely zero Hadoop experience when we got involved with them. And so we've brought them up, we've helped them with the application, we've kicked the wheels off and now they're going to be sailing. I would say, in a year's time, they're going to be comfortable to just ignore us completely, and in the first year, there's still going to be some handholding and covering up a bruise as they fall off the bike every so often. But that's our job, it's IP, technology, all about outcomes and all about time to value. >> And from a differentiation standpoint, that ability to enable that self service and kick off the training wheels, is that one of the biggest differentiators that you find DataTorret has, versus the Tableau's and the other competitors on the market? >> I don't want to say there's no one doing what we're doing, because that will sound like we're doing something odd. But there's no one doing what we're doing. And it's almost like Tesla. Are they an electric car or are they a platform? They've spurred an industry on, and Uber did the same thing, and Lyft's done something and AirBNB has. And what we've noticed is customer's buying patterns are very specific now. Use open source, get up their enterprise-ilities, and have that level of agility. Nobody else is really doing that. The only people that will do that is your contract with someone like Hortonworks or a Cloudera, and actually pay them a lot of money to build the application for you. And our job is really saying, "No, instead of you paying "them on professional services, we'll give you the sleeve, "we'll make it a little bit more opinionated, "and we'll get you there really quickly, "and then we'll let you and set you free." And so that's one. We have a thing called the Application Factory. That's the snap on toolset where they can literally go to a GUI and say, "I'm in the financial market, "I want a fraud prevention application." And we literally then just self assemble the stack, they can pick it up, and then put their input and output in. And then, as we move forward, we'll have partners who are building the spoke applications in verticals, and they will put them up on our website, so the customers can come in and download them. Everything is subscription software. >> Fantastic, I wish we had more time, but thanks so much for finding some time today to come by theCUBE, tell us what's new, and we look forward to seeing you on the show again very soon. >> I appreciate it, thank you very much. >> We want to thank you for watching theCUBE. Again, Lisa Martin with my co-host George Gilbert, we're live at our event, Big Data SV, in downtown San Jose, down the street from the Strata Data Conference. Stick around, George and I will be back after a short break with our next guest. (light electronic jingle)

Published Date : Mar 8 2018

SUMMARY :

presenting Big Data, Silicon Valley, brought to you and we welcome back to theCUBE, So you're one of our regular VIP's. and we did the analysis of what are we doing well with them, I want to drill into something where you said many projects, So the idea of being able to wait for 12 months, So when you go into a customer... And so what we did is we said okay, not only do you have What are some of the buying patterns that you're seeing And then you get the other ones, which is, And I sort of joke with customers, "Hey if you want to and so we deal with data in motion, And the idea being that you can have one and then you would be analyzing it after at rest, right? or when you had end to end integrated products and now they're going to be sailing. and actually pay them a lot of money to build and we look forward to seeing you We want to thank you for watching theCUBE.

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
George GilbertPERSON

0.99+

Lisa MartinPERSON

0.99+

two yearsQUANTITY

0.99+

GeorgePERSON

0.99+

12 monthsQUANTITY

0.99+

UberORGANIZATION

0.99+

AirBNBORGANIZATION

0.99+

LisaPERSON

0.99+

TeslaORGANIZATION

0.99+

80%QUANTITY

0.99+

two typesQUANTITY

0.99+

GartnerORGANIZATION

0.99+

HortonworksORGANIZATION

0.99+

San JoseLOCATION

0.99+

iPhone XCOMMERCIAL_ITEM

0.99+

DataTorrentORGANIZATION

0.99+

sevenQUANTITY

0.99+

Guy ChurchwardPERSON

0.99+

tomorrow morningDATE

0.99+

LyftORGANIZATION

0.99+

last yearDATE

0.99+

SiliconANGLE MediaORGANIZATION

0.99+

six months agoDATE

0.99+

next monthDATE

0.99+

three monthsQUANTITY

0.99+

bothQUANTITY

0.99+

oneQUANTITY

0.98+

EMCORGANIZATION

0.98+

first dayQUANTITY

0.98+

tonightDATE

0.98+

Silicon ValleyLOCATION

0.98+

tomorrowDATE

0.98+

one chefQUANTITY

0.98+

10 yearsQUANTITY

0.98+

one pieceQUANTITY

0.98+

theCUBEORGANIZATION

0.98+

ClouderaORGANIZATION

0.97+

three entreesQUANTITY

0.97+

Strata Data ConferenceEVENT

0.97+

first thingQUANTITY

0.97+

first yearQUANTITY

0.96+

single applicationQUANTITY

0.96+

todayDATE

0.95+

couple of weeks agoDATE

0.95+

telcoORGANIZATION

0.95+

900 millionth timeQUANTITY

0.95+

one companyQUANTITY

0.94+

HCLORGANIZATION

0.94+

a quarterQUANTITY

0.94+

DataTorretORGANIZATION

0.93+

three channelsQUANTITY

0.93+

twoQUANTITY

0.92+

Big Data SVEVENT

0.92+

Big Data SV 2018EVENT

0.91+

three further applicationsQUANTITY

0.86+

ApexTITLE

0.84+

a yearQUANTITY

0.82+

TableauORGANIZATION

0.81+

HadoopPERSON

0.81+

about a year agoDATE

0.8+

couple dozen animalsQUANTITY

0.8+

productQUANTITY

0.78+

eight plus yearsQUANTITY

0.77+

ApacheORGANIZATION

0.76+

agileTITLE

0.76+

GuyPERSON

0.73+

EpoxyORGANIZATION

0.71+

TableauTITLE

0.71+

DataTorrentPERSON

0.7+

around 3.10DATE

0.69+

SparkTITLE

0.68+

restaurantsQUANTITY

0.66+

GatekeeperTITLE

0.66+

modelQUANTITY

0.63+

Dr. Tendu Yogurtcu, Syncsort | Big Data SV 2018


 

>> Announcer: Live from San Jose, it's theCUBE. Presenting data, Silicon Valley brought to you by Silicon Angle Media and it's ecosystem partners. >> Welcome back to theCUBE. We are live in San Jose at our event, Big Data SV. I'm Lisa Martin, my co-host is George Gilbert and we are down the street from the Strata Data Conference. We are at a really cool venue: Forager Eatery Tasting Room. Come down and join us, hang out with us, we've got a cocktail par-tay tonight. We also have an interesting briefing from our analysts on big data trends tomorrow morning. I want to welcome back to theCUBE now one of our CUBE VIP's and alumna Tendu Yogurtcu, the CTO at Syncsort, welcome back. >> Thank you. Hello Lisa, hi George, pleasure to be here. >> Yeah, it's our pleasure to have you back. So, what's going on at Syncsort, what are some of the big trends as CTO that you're seeing? >> In terms of the big trends that we are seeing, and Syncsort has grown a lot in the last 12 months, we actually doubled our revenue, it has been really an successful and organic growth path, and we have more than 7,000 customers now, so it's a great pool of customers that we are able to talk and see the trends and how they are trying to adapt to the digital disruption and make data as part of their core strategy. So data is no longer an enabler, and in all of the enterprise we are seeing data becoming the core strategy. This reflects in the four mega trends, they are all connected to enable business as well as operational analytics. Cloud is one, definitely. We are seeing more and more cloud adoption, even our financial services healthcare and banking customers are now, they have a couple of clusters running in the cloud, in public cloud, multiple workloads, hybrid seems to be the new standard, and it comes with also challenges. IT governance as well as date governance is a major challenge, and also scoping and planning for the workloads in the cloud continues to be a challenge, as well. Our general strategy for all of the product portfolio is to have our products following design wants and deploy any of our strategy. So whether it's a standalone environment on Linux or running on Hadoop or Spark, or running on Premise or in the Cloud, regardless of the Cloud provider, we are enabling the same education with no changes to run all of these environments, including hybrid. Then we are seeing the streaming trend, with the connected devices with the digital disruption and so much data being generated, being able to stream and process data on the age, with the Internet of things, and in order to address the use cases that Syncsort is focused on, we are really providing more on the Change Data Capture and near real-time and real-time data replication to the next generation analytics environments and big data environments. We launched last year our Change Data Capture, CDC, product offering with data integration, and we continue to strengthen that vision merger we had data replication, real-time data replication capabilities, and we are now seeing even Kafka database becoming a consumer of this data. Not just keeping the data lane fresh, but really publishing the changes from multiple, diverse set of sources and publishing into a Kafka database and making it available for applications and analytics in the data pipeline. So the third trend we are seeing is around data science, and if you noticed this morning's keynote was all about machine learning, artificial intelligence, deep learning, how to we make use of data science. And it was very interesting for me because we see everyone talking about the challenge of how do you prepare the data and how do you deliver the the trusted data for machine learning and artificial intelligence use and deep learning. Because if you are using bad data, and creating your models based on bad data, then the insights you get are also impacted. We definitely offer our products, both on the data integration and data quality side, to prepare the data, cleanse, match, and deliver the trusted data set for data scientists and make their life easier. Another area of focus for 2018 is can we also add supervised learning to this, because with the premium quality domain experts that we have now in Syncsort, we have a lot of domain experts in the field, we can infuse the machine learning algorithms and connect data profiling capabilities we have with the data quality capabilities recommending business rules for data scientists and helping them automate the mandate tasks with recommendations. And the last but not least trend is data governance, and data governance is almost a umbrella focus for everything we are doing at Syncsort because everything about the Cloud trend, the streaming, and the data science, and developing that next generation analytics environment for our customers depends on the data governance. It is, in fact, a business imperative, and the regulatory compliance use cases drives more importance today than governance. For example, General Data Protection Regulation in Europe, GDPR. >> Lisa: Just a few months away. >> Just a few months, May 2018, it is in the mind of every C-level executive. It's not just for European companies, but every enterprise has European data sourced in their environments. So compliance is a big driver of governance, and we look at governance in multiple aspects. Security and issuing data is available in a secure way is one aspect, and delivering the high quality data, cleansing, matching, the example Hilary Mason this morning gave in the keynote about half of what the context matters in terms of searches of her name was very interesting because you really want to deliver that high quality data in the enterprise, trust of data set, preparing that. Our Trillium Quality for big data, we launched Q4, that product is generally available now, and actually we are in production with very large deployment. So that's one area of focus. And the third area is how do you create visibility, the farm-to-table view of your data? >> Lisa: Yeah, that's the name of your talk! I love that. >> Yes, yes, thank you. So tomorrow I have a talk at 2:40, March 8th also, I'm so happy it's on the Women's Day that I'm talking-- >> Lisa: That's right, that's right! Get a farm-to-table view of your data is the name of your talk, track data lineage from source to analytics. Tell us a little bit more about that. >> It's all about creating more visibility, because for audit reasons, for understanding how many copies of my data is created, valued my data had been, and who accessed it, creating that visibility is very important. And the last couple of years, we saw everyone was focused on how do I create a data lake and make my data accessible, break the data silos, and liberate my data from multiple platforms, legacy platforms that the enterprise might have. Once that happened, everybody started worrying about how do I create consumable data set and how do I manage this data because data has been on the legacy platforms like Mainframe, IMBI series has been on relational data stores, it is in the Cloud, gravity of data originating in the Cloud is increasing, it's originating from mobile. Hadoop vendors like Hortonworks and Cloudera, they are creating visibility to what happens within the Hadoop framework. So we are deepening our integration with the Cloud Navigator, that was our announcement last week. We already have integration both with Hortonworks and Cloudera Navigator, this is one step further where we actually publish what happened to every single granular level of data at the field level with all of the transformations that data have been through outside of the cluster. So that visibility is now published to Navigator itself, we also publish it through the RESTful API, so governance is a very strong and critical initiative for all of the businesses. And we are playing into security aspect as well as data lineage and tracking aspect and the quality aspect. >> So this sounds like an extremely capable infrastructure service, so that it's trusted data. But can you sell that to an economic buyer alone, or do you go in in conjunction with anther solution like anti-money laundering for banks or, you know, what are the key things that they place enough value on that they would spend, you know, budget on it? >> Yes, absolutely. Usually the use cases might originate like anti-money laundering, which is very common, fraud detection, and it ties to getting a single view of an entity. Because in anti-money laundering, you want to understand the single view of your customer ultimately. So there is usually another solution that might be in the picture. We are providing the visibility of the data, as well as that single view of the entity, whether it's the customer view in this case or the product view in some of the use cases by delivering the matching capabilities and the cleansing capabilities, the duplication capabilities in addition to the accessing and integrating the data. >> When you go into a customer and, you know, recognizing that we still have tons of silos and we're realizing it's a lot harder to put everything in one repository, how do customers tell you they want to prioritize what they're bringing into the repository or even what do they want to work on that's continuously flowing in? >> So it depends on the business use case. And usually at the time that we are working with the customer, they selected that top priority use case. The risk here, and the anti-money laundering, or for insurance companies, we are seeing a trend, for example, building the data marketplace, as that tantalize data marketplace concept. So depending on the business case, many of our insurance customers in US, for example, they are creating the data marketplace and they are working with near real-time and microbatches. In Europe, Europe seems to be a bit ahead of the game in some cases, like Hadoop production was slow but certainly they went right into the streaming use cases. We are seeing more directly streaming and keeping it fresh and more utilization of the Kafka and messaging frameworks and database. >> And in that case, where they're sort of skipping the batch-oriented approach, how do they keep track of history? >> It's still, in most of the cases, microbatches, and the metadata is still associated with the data. So there is an analysis of the historical what happened to that data. The tools, like ours and the vendors coming to picture, to keep track, of that basically. >> So, in other words, by knowing what happened operationally to the data, that paints a picture of a history. >> Exactly, exactly. >> Interesting. >> And for the governance we usually also partner, for example, we partner with Collibra data platform, we partnered with ASG for creating that business rules and technical metadata and providing to the business users, not just to the IT data infrastructure, and on the Hadoop side we partner with Cloudera and Hortonworks very closely to complete that picture for the customer, because nobody is just interested in what happened to the data in Hadoop or in Mainframe or in my relational data warehouse, they are really trying to see what's happening on Premise, in the Cloud, multiple clusters, traditional environments, legacy systems, and trying to get that big picture view. >> So on that, enabling a business to have that, we'll say in marketing, 360 degree view of data, knowing that there's so much potential for data to be analyzed to drive business decisions that might open up new business models, new revenue streams, increase profit, what are you seeing as a CTO of Syncsort when you go in to meet with a customer, data silos, when you're talking to a Chief Data Officer, what's the cultural, I guess, not shift but really journey that they have to go on to start opening up other organizations of the business, to have access to data so they really have that broader, 360 degree view? What's that cultural challenge that they have to, journey that they have to go on? >> Yes, Chief Data Officers are actually very good partners for us, because usually Chief Data Officers are trying to break the silos of data and make sure that the data is liberated for the business use cases. Still most of the time the infrastructure and the cluster, whether it's the deployment in the Cloud versus on Premise, it's owned by the IT infrastructure. And the lines of business are really the consumers and the clients of that. CDO, in that sense, almost mitigates and connects to those line of businesses with the IT infrastructure with the same goals for the business, right? They have to worry about the compliance, they have to worry about creating multiple copies of data, they have to worry about the security of the data and availability of the data, so CDOs actually help. So we are actually very good partners with the CDOs in that sense, and we also usually have IT infrastructure owner in the room when we are talking with our customers because they have a big stake. They are like the gatekeepers of the data to make sure that it is accessed by the right... By the right folks in the business. >> Sounds like maybe they're in the role of like, good cop bad cop or maybe mediator. Well Tendu, I wish we had more time. Thanks so much for coming back to theCUBE and, like you said, you're speaking tomorrow at Strata Conference on International Women's Day: Get a farm-to-table view of your data. Love the title. >> Thank you. >> Good luck tomorrow, and we look forward to seeing you back on theCUBE. >> Thank you, I look forward to coming back and letting you know about more exciting both organic innovations and acquisitions. >> Alright, we look forward to that. We want to thank you for watching theCUBE, I'm Lisa Martin with my co-host George Gilbert. We are live at our event Big Data SV in San Jose. Come down and visit us, stick around, and we will be right back with our next guest after a short break. >> Tendu: Thank you. (upbeat music)

Published Date : Mar 7 2018

SUMMARY :

brought to you by Silicon Angle Media and we are down the street from the Strata Data Conference. Hello Lisa, hi George, pleasure to be here. Yeah, it's our pleasure to have you back. and in all of the enterprise we are seeing data and delivering the high quality data, Lisa: Yeah, that's the name of your talk! it's on the Women's Day that I'm talking-- is the name of your talk, track data lineage and make my data accessible, break the data silos, that they place enough value on that they would and the cleansing capabilities, the duplication So it depends on the business use case. It's still, in most of the cases, operationally to the data, that paints a picture And for the governance we usually also partner, and the cluster, whether it's the deployment Love the title. to seeing you back on theCUBE. and letting you know about more exciting and we will be right back with our next guest Tendu: Thank you.

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
Lisa MartinPERSON

0.99+

GeorgePERSON

0.99+

May 2018DATE

0.99+

George GilbertPERSON

0.99+

SyncsortORGANIZATION

0.99+

LisaPERSON

0.99+

EuropeLOCATION

0.99+

HortonworksORGANIZATION

0.99+

USLOCATION

0.99+

Hilary MasonPERSON

0.99+

San JoseLOCATION

0.99+

ASGORGANIZATION

0.99+

2018DATE

0.99+

TenduPERSON

0.99+

Silicon Angle MediaORGANIZATION

0.99+

ClouderaORGANIZATION

0.99+

360 degreeQUANTITY

0.99+

tomorrowDATE

0.99+

CollibraORGANIZATION

0.99+

more than 7,000 customersQUANTITY

0.99+

last weekDATE

0.99+

last yearDATE

0.99+

tomorrow morningDATE

0.99+

one aspectQUANTITY

0.99+

third areaQUANTITY

0.99+

LinuxTITLE

0.99+

Cloud NavigatorTITLE

0.99+

2:40DATE

0.98+

Women's DayEVENT

0.98+

Tendu YogurtcuPERSON

0.98+

GDPRTITLE

0.98+

SparkTITLE

0.97+

tonightDATE

0.97+

Big Data SVEVENT

0.97+

KafkaTITLE

0.97+

International Women's DayEVENT

0.97+

bothQUANTITY

0.97+

CDCORGANIZATION

0.96+

NavigatorTITLE

0.96+

Strata Data ConferenceEVENT

0.96+

single viewQUANTITY

0.96+

HadoopTITLE

0.95+

third trendQUANTITY

0.95+

one stepQUANTITY

0.95+

single viewQUANTITY

0.95+

Dr.PERSON

0.94+

theCUBEORGANIZATION

0.94+

CUBEORGANIZATION

0.94+

this morningDATE

0.94+

CloudTITLE

0.92+

last 12 monthsDATE

0.91+

Change Data CaptureORGANIZATION

0.9+

todayDATE

0.9+

EuropeanOTHER

0.88+

last couple of yearsDATE

0.88+

General Data Protection Regulation in EuropeTITLE

0.86+

Strata ConferenceEVENT

0.84+

oneQUANTITY

0.83+

one repositoryQUANTITY

0.83+

tons of silosQUANTITY

0.82+

one areaQUANTITY

0.82+

Q4DATE

0.82+

Big Data SV 2018EVENT

0.81+

four mega trendsQUANTITY

0.76+

March 8thDATE

0.76+

Matthew Baird, AtScale | Big Data SV 2018


 

>> Announcer: Live from San Jose. It's theCUBE, presenting Big Data Silicon Valley. Brought to you by SiliconANGLE Media, and it's ecosystem partners. (techno music) >> Welcome back to theCUBE, our continuing coverage on day one of our event, Big Data SV. I'm Lisa Martin with George Gilbert. We are down the street from the Strata Data Conference. We've got a great, a lot of cool stuff going on. You can see the cool set behind me. We are at Forager Tasting Room & Eatery. Come down and join us, be in our audience today. We have a cocktail event tonight, who doesn't want to join that? And we have a nice presentation tomorrow morning of our Wikibon's 2018 Big Data Forecast and Review. Joining us next is Matthew Baird the co-founder of AtScale. Matthew, welcome to theCUBE. >> Thanks for having me. Fantastic venue, by the way. >> Isn't it cool? >> This is very cool. >> Yeah, it is. So, talking about Big Data, you know, Gardner says, "85% of Big Data projects have failed." I often say failure is not a bad F word, because it can spawn the genesis of a lot of great business opportunities. Data lakes were big a few years ago, turned into swamps. AtScale has this vision of Data Lake 2.0, what is that? >> So, you're right. There have been a lot of failures, there's no doubt about it. And you're also right that is how we evolve, and we're a Silicon Valley based company. We don't give up when faced with these things. It's just another way to not do something. So, what we've seen and what we've learned through our customers is they need to have a solution that is integrated with all the technologies that they've adopted in the enterprise. And it's really about, if you're going to make a data lake, you're going to have data on there that is the crown jewels of your business. How are you going to get that in the hands of your constituents, so that they can analyze it, and they can use it to make decisions? And how can we, furthermore, do that in a way that supplies governance and auditability on top of it, so that we aren't just sending data out into the ether and not knowing where it goes? We have a lot of customers in the insurance, health insurance space, and with financial customers that the data absolutely must be managed. I think one of the biggest changes is around that integration with the current technologies. There's a lot of movement into the Cloud. The new data lake is kind of focused more on these large data stores, where it was HDFS with Hadoop. Now it's S3, Google's object storage, and Azure ADLS. Those are the sorts of things that are backing the new data lake I believe. >> So if we take these, where the Data Lake Store didn't have to be something that's a open source HDFS implementation, it could even be through just through a HDSF API. >> Matthew: Yeah, absolutely. >> What are some of the, how should we think about the data sources and feeds, for this repository, and then what is it on top that we need to put to make the data more consumable? >> Yeah, that's a good point. S3, Google Object Storage, and Azure, they all have a characteristic of, they are large stores. You can store as much as you want. They generally on the Clouds, and in the open source on-prem software for landing the data exists, for streaming the data and landing it, but the important thing there is it's cost-effective. S3 is a cost-effective storage system. HDFS is a mostly cost-effective storage system. You have to manage it, so it has a slightly higher cost, but the advice has been, get it to the place you're going to store it. Store it in a unified format. You get a halo effect when you have a unified format, and I think the industry is coalescing around... I'd probably say ParK's in the lead right now, but once ParK can be read by, let's take Amazon for instance, can be read by Athena, can be read by Redshift Spectrum, it can be read by their EMR, now you have this halo effect where your data's always there, always available to be consumed by a tool or a technology that can then deliver it to your end users. >> So when we talk about ParK, we're talking about columnar serialization format, >> Matthew: Yes. but there's more on top of that that needs to be layered, so that you can, as we were talking about earlier, combine the experience of a data warehouse, and the curated >> Absolutely data access where there's guard rails, >> Matthew: Yes >> and it's simple, versus sort of the wild west, but where I capture everything in a data lake. How do you bring those two together? >> Well, specifically for AtScale, we allow you to integrate multiple data access tools in AtScale, and then we use the appropriate tool to access the data for the use case. So let me give you an example, in the Amazon case, Redshift is wonderful for accessing interactive data, which BI users want, right? They want fast queries, sub-second queries. They don't want to pay to have all the raw data necessarily stored in Redshift 'cause that's pretty expensive. So they have this Redshift spectrum, it's sitting in S3, that's cost effective. So when we go and we read raw data to build these summary tables, to deliver the data fast, we can read from Spectrum, we can put it all together, drop it into Redshift, a much smaller volume of data, so it has faster characteristics for being accessed. And it delivers it to the user that way. We do that in Hadoop when we access via Hive for building aggregate tables, but Spark or Impala, is a much faster interactive engine, so we use those. As I step back and look at this, I think the Data Lake 2.0, from a technical perspective is about abstraction, and abstraction's sort of what separates us from the animals, right? It's a concept where we can pack a lot of sophistication and complexity behind an interface that allows people to just do what they want to do. You don't know how, or maybe you do know how a car engine works, I don't really, kind of, a little bit, but I do know how to press the gas pedal and steer. >> Right. >> I don't need to know these things, and I think the Data Lake 2.0 is about, well I don't need to know how Century, or Ranger, or Atlas, or any of these technologies work. I need to know that they're there, and when I access data, they're going to be applied to that data, and they're going to deliver me the stuff that I have access to and that I can see. >> So a couple things, it sounded like I was hearing abstraction, and you said really that's kind of the key, that sounds like a differentiator for AtScale, is giving customers that abstraction they need. But I'm also curious from a data value perspective, you talked about in Redshift from an expense perspective. Do you also help customers gain abstraction by helping them evaluate value of data and where they ought to keep it, and then you give them access to it? Or is that something that they need to do, kind of bring to the table? >> We don't really care, necessarily, about the source of the data, as long as it can be expressed in a way that can be accessed by whatever engine it is. Lift and shift is an example. There's a big move to move from Teradata or from Netezza into a Cloud-based offering. People want to lift it and shift it. It's the easiest way to do this. Same table definitions, but that's not optimized necessarily for the underlying data store. Take BigQuery for example, BigQuery's an amazing piece of technology. I think there's nothing like it out there in the market today, but if you really want BigQuery to be cost-effective, and perform and scale up to concurrency of... one of our customers is going to roll out about 8,000 users on this. You have to do things in BigQuery that are BigQuery-friendly. The data structures, the way that you store the data, repeated values, those sorts of things need to be taken into consideration when you build your schema out for consumption. With AtScale they don't need to think about that, they don't need to worry about it, we do it for them. They drop the schema in the same way that it exists on their current technology, and then behind the scenes, what we're doing is we're looking at signals, we're looking at queries, we're looking at all the different ways that people access the data naturally, and then we restructure those summary tables using algorithms and statistics, and I think people would broadly call it ML type approaches, to build out something that answers those questions, and adapts over time to new questions, and new use cases. So it's really about, imagine you had the best data engineering team in the world, in a box, they're never tired, they never stop, and they're always interacting with what the customers really want, which is "Now I want to look at the data this way". >> It's sounds actually like what your talking about is you have a whole set of sources, and targets, and you understand how they operate, but why I say you, I mean your software. And so that you can take data from wherever it's coming in, and then you apply, if it's machine learning or whatever other capabilities to learn from the access methods, how to optimize that data for that engine. >> Matthew: Exactly. >> And then the end users have an optimal experience and it's almost like the data migration service that Amazon has, it's like, you give us your Postgres or Oracle database, and we'll migrate it to the cloud. It sounds like you add a lot of intelligence to that process for decision support workloads. >> Yes. >> And figure out, so now you're going to... It's not Postgres to Postgres, but it might be Teradata to Redshift, or S3, that's going to be accessed by Athena or Redshift, and then let's put that in the right format. >> I think you sort of hit something that we've noticed is very powerful, which is if you can set up, and we've done this with a number of customers, if you can set up at the abstraction layer that is AtScale, on your on-prem data, literally in, say hours, you can move it into the Cloud, obviously you have to write the detail to move it into the Cloud, but once it's in the Cloud you take the same AtScale instance, you re-point it at that new data source, and it works. We've done that with multiple customers, and it's fast and effective, and it let's you actually try out things that you may not have the agility to do before because there's differences in how the SQL dialects work, there's differences in, potentially, how the schema might be built. >> So a couple things I'm interested in, I'm hearing two A-words, that abstraction that we've talked about a number of times, you also mention adaptability. So when you're talking with customers, what are some of the key business outcomes they need to drive, where adaptability and abstraction are concerned, in terms of like cost reduction, revenue generation. What are some of those see-swee business objectives that AtScale can help companies achieve? >> So looking at, say, a customer, a large retailer on the East Coast, everybody knows the stores, they're everywhere, they sell hardware. they have a 20-terabyte cube that they use for day-to-day revenue analytics. So they do period over period analysis. When they're looking at stores, they're looking at things like, we just tried out a new marketing approach... I was talking to somebody there last week about how they have these special stores where they completely redo one area and just see how that works. They have to be able to look at those analytics, and they run those for a short amount of time. So if you're window for getting data, refreshing data, building cubes, which in the old world could take a week, you know my co-founder at Yahoo, he had a week and a half build time. That data is now two weeks old, maybe three weeks old. There might be bugs in it-- >> And the relevance might be, pshh... >> And the relevance goes down, or you can't react as fast. I've been at companies where... Speed is so important these days, and the new companies that are grasping data aggressively, putting it somewhere where they can make decisions on it on a day-to-day basis, they're winning. And they're spending... I was at a company that was spending three million dollars on pay-per-click data, a month. If you can't get data everyday, you're on the wrong campaigns, and everything goes off the rails, and you only learn about it a week later, that's 25% of your spend, right there, gone. >> So the biggest thing, sorry George, it really sounds to me like what AtScale can facilitate for probably customers in any industry is the ability to truly make data-driven business decisions that can really directly affect revenue and profit. >> Yes, and in an agile format. So, you can build-- >> That's the third A; agile, adaptability, abstraction. >> There ya go, the three A's. (Lisa laughs) We had the three V's, now we have the three A's. >> Yes. >> The fact that you're building a curated model, so in retail the calendars are complex. I'm sure everybody that uses Tableau is good at analyzing data, but they might not know what your rules are around your financial calendar, or around the hierarchies of your product. There's a lot of things that happen where you want an enterprise group of data modelers to build it, bless it, and roll it out, but then you're a user, and you say, wait, you forgot x, y, and z, I don't want to wait a week, I don't want to wait two weeks, three weeks, a month, maybe more. I want that data to be available in the model an hour later 'cause that's what I get with Tableau today. And that's where we've taken the two approaches of enterprise analytics and self-service, and tried to create a scenario where you get the best of both worlds. >> So, we know that an implication of what you're telling us is that insights are perishable, and latency is becoming more and more critical. How do you plan to work with streaming data where you've got a historical archive, but you've got fresh data coming in? But fresh could mean a variety of things. Tell us what some of those scenarios look like. >> Absolutely, I think there's two approaches to this problem, and I'm seeing both used in practice, and I'm not exactly sure, although I have some theories on which one's going to win. In one case, you are streaming everything into, sort of a... like I talked about, this data lake, S3, and you're putting it in a format like ParK, and then people are accessing it. The other way is access the data where it is. Maybe it's already in, this is a common BI scenario, you have a big data store, and then you have a dimensional data store, like Oracle has your customers, Hadoop has machine data about those customers accessing on their mobile devices or something. If there was some way to access those data without having to move the Oracle stuff into the big data store, that's a Federation story that I think we've talked about in the Bay Area for a long time, or around the world for a long time. I think we're getting closer to understanding how we can do that in practice, and have it be tenable. You don't move the big data around, you move the small data around. For data coming in from outside sources it's probably a little bit more difficult, but it is kind of a degenerate version of the same story. I would say that streaming is gaining a lot of momentum, and with what we do, we're always mapping, because of the governance piece that we've built into the product, we're always mapping where did the data come from, where did it land, and how did we use it to build summary tables. So if we build five summary tables, 'cause we're answering different types of questions, we still need to know that it goes back to this piece of data, which has these security constraints, and these audit requirements, and we always track it back to that, and we always apply those to our derived data. So when you're accessing this automatically ETLed summary tables, it just works the way it is. So I think that there are two ways that this is going to expand and I'm excited about Federation because I think the time has come. I'm also excited about streaming. I think they can serve two different use cases, and I don't actually know what the answer will be, because I've seen both in customers, it's some of the biggest customers we have. >> Well Matthew thank you so much for stopping by, and four A's, AtScale can facilitate abstraction, adaptability, and agility. >> Yes. Hashtag four A's. >> There we go. I don't even want credit for that. (laughs) >> Oh wow, I'm going to get five more followers, I know it! (George laughs) >> There ya go! >> We want to thank you for watching theCUBE, I am Lisa Martin, we are live in San Jose, at our event Big Data SV, I'm with George Gilbert. Stick around, we'll be back with our next guest after a short break. (techno music)

Published Date : Mar 7 2018

SUMMARY :

Brought to you by SiliconANGLE Media, We are down the street from the Strata Data Conference. Thanks for having me. because it can spawn the genesis that is the crown jewels of your business. So if we take these, that can then deliver it to your end users. and the curated and it's simple, versus sort of the wild west, And it delivers it to the user that way. and they're going to deliver me the stuff and then you give them access to it? The data structures, the way that you store the data, And so that you can take data and it's almost like the data migration service but it might be Teradata to Redshift, and it let's you actually try out things they need to drive, and just see how that works. And the relevance goes down, or you can't react as fast. is the ability to truly make data-driven business decisions Yes, and in an agile format. We had the three V's, now we have the three A's. where you get the best of both worlds. How do you plan to work with streaming data and then you have a dimensional data store, and four A's, AtScale can facilitate abstraction, Yes. I don't even want credit for that. We want to thank you for watching theCUBE,

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
MatthewPERSON

0.99+

George GilbertPERSON

0.99+

Lisa MartinPERSON

0.99+

Matthew BairdPERSON

0.99+

GeorgePERSON

0.99+

San JoseLOCATION

0.99+

YahooORGANIZATION

0.99+

three weeksQUANTITY

0.99+

AmazonORGANIZATION

0.99+

25%QUANTITY

0.99+

GardnerPERSON

0.99+

two approachesQUANTITY

0.99+

OracleORGANIZATION

0.99+

two weeksQUANTITY

0.99+

RedshiftTITLE

0.99+

S3TITLE

0.99+

three million dollarsQUANTITY

0.99+

two waysQUANTITY

0.99+

Silicon ValleyLOCATION

0.99+

one caseQUANTITY

0.99+

85%QUANTITY

0.99+

last weekDATE

0.99+

a monthQUANTITY

0.99+

CenturyORGANIZATION

0.99+

SiliconANGLE MediaORGANIZATION

0.99+

a weekQUANTITY

0.99+

BigQueryTITLE

0.99+

bothQUANTITY

0.99+

20-terabyteQUANTITY

0.99+

GoogleORGANIZATION

0.99+

a week and a halfQUANTITY

0.99+

a week laterDATE

0.99+

Data Lake 2.0COMMERCIAL_ITEM

0.99+

twoQUANTITY

0.99+

tomorrow morningDATE

0.99+

AtScaleORGANIZATION

0.99+

AtlasORGANIZATION

0.99+

Bay AreaLOCATION

0.98+

LisaPERSON

0.98+

ParKTITLE

0.98+

TableauTITLE

0.98+

five more followersQUANTITY

0.98+

an hour laterDATE

0.98+

RangerORGANIZATION

0.98+

NetezzaORGANIZATION

0.98+

tonightDATE

0.97+

todayDATE

0.97+

both worldsQUANTITY

0.97+

about 8,000 usersQUANTITY

0.97+

theCUBEORGANIZATION

0.97+

Strata Data ConferenceEVENT

0.97+

oneQUANTITY

0.97+

Big Data SV 2018EVENT

0.97+

TeradataORGANIZATION

0.96+

AtScaleTITLE

0.96+

Big Data SVEVENT

0.93+

East CoastLOCATION

0.93+

HadoopTITLE

0.92+

two different use casesQUANTITY

0.92+

day oneQUANTITY

0.91+

one areaQUANTITY

0.91+

Scott Gnau, Hortonworks | Big Data SV 2018


 

>> Narrator: Live from San Jose, it's the Cube. Presenting Big Data Silicon Valley. Brought to you by SiliconANGLE Media and its ecosystem partners. >> Welcome back to the Cube's continuing coverage of Big Data SV. >> This is out tenth Big Data event, our fifth year in San Jose. We are down the street from the Strata Data Conference. We invite you to come down and join us, come on down! We are at Forager Tasting Room & Eatery, super cool place. We've got a cocktail event tonight, and a endless briefing tomorrow morning. We are excited to welcome back to the Cube, Scott Gnau, the CTO of Hortonworks. Hey, Scott, welcome back. >> Thanks for having me, and I really love what you've done with the place. I think there's as much energy here as I've seen in the entire show. So, thanks for having me over. >> Yeah! >> We have done a pretty good thing to this place that we're renting for the day. So, thanks for stopping by and talking with George and I. So, February, Hortonworks announced some news about Hortonworks DataFlow. What was in that announcement? What does that do to help customers simplify data in motion? What industries is it going to be most impactful for? I'm thinking, you know, GDPR is a couple months away, kind of what's new there? >> Well, yeah, and there are a couple of topics in there, right? So, obviously, we're very committed to, which I think is one of our unique value propositions, is we're committed to really creating an easy to use data management platform, as it were, for the entire lifecycle of data, from one data created at the edge and as data are streaming from one place to another place, and, at rest, analytics get run, analytics get pushed back out to the edge. So, that entire lifecycle is really the footprint that we're looking at, and when you dig a level into that, obviously, the data in motion piece is usually important, and So I think one a the things that we've looked at is we don't want to be just a streaming engine or just a tool for creating pipes and data flows and so on. We really want to create that entire experience around what needs to happen for data that's moving, whether it be acquisition at the edge in a protected way with provenance and encryption, whether it be applying streaming analytics as the data are flowing and everywhere kind of in between, and so that's what HDF represents, and what we released in our latest release, which, to your point, was just a few weeks ago, is a way for our customers to go build their data in motion applications using a very simple drag and drop GUI interface. So, they don't have to understand all of the different animals in the zoo, and the different technologies that are in play. It's like, "I want to do this." Okay, here's a GUI tool, you can have all of the different operators that are represented by the different underlying technologies that we provide as Hortonworks DataFlow, and you can stream them together, and then, you can make those applications and test those applications. One of the biggest enhancements that we did, is we made it very easy then for once those things are built in a laptop environment or in a dev environment, to be published out to production or to be published out to other developers who might want to enhance them and so on. So, the idea is to make it consumable inside of an enterprise, and when you think about data in motion and IOT and all those use cases, it's not going to be one department, one organization, or one person that's doing it. It's going to be a team of people that are distributed just like the data and the sensors, and, so, being able to have that sharing capability is what we've enhanced in the experience. >> So, you were just saying, before we went live, that you're here having speed dates with customers. What are some of the things... >> It's a little bit more sincere than that, but yeah. >> (laughs) Isn't speed dating sincere? It's 2018, I'm not sure. (Scott laughs) What are some of the things that you're hearing from customers, and how is that helping to drive what's coming out from Hortonworks? >> So, the two things that I'm hearing right, number one, certainly, is that they really appreciate our approach to the entire lifecycle of data, because customers are really experiencing huge data volume increases and data just from everywhere, and it's no longer just from the ERP system inside the firewall. It's from third party, it's from Sensors, it's from mobile devices, and, so, they really do appreciate kind of the territory that we cover with the tools and technologies we bring to market, and, so, that's been very rewarding. Clearly, customers who are now well into this path, they're starting to think about, in this new world, data governance, and data governance, I just took all of the energy out of the room, governance, it sounds like, you know, hard. What I mean by data governance, really, is customers need to understand, with all of this diverse, connected data everywhere, in the cloud, on PRIM, then Sensors, third party, partners, is, frankly, they need a trail of breadcrumbs that say what is it, where'd it come from, who had access to it, and then, what did they do with it? If you start to piece that together, that's what they really need to understand, the data estate that belongs to them, so they can turn that into refined product, and, so, when you then segway in one of your earlier questions, that GDPR is, certainly, a triggering point where if it's like, okay, the penalties are huge, oh my God, it's a whole new set of regulations that I have to comply with, and when you think about that trail of breadcrumbs that I just described, that actually becomes a roadmap for compliance under regulations like GDPR, where if a European customer calls up and says, "Forget my data.", the only way that you can guarantee that you forgot that person's data, is to actually understand where it all is, and that requires proper governance, tools, and techniques, and, so, when I say governance, it's, really, not like, you know, the governor and the government, and all that. That's an aspect, but the real, important part is how do I keep all of that connectivity so that I can understand the landscape of data that I've got access to, and I'm hearing a lot of energy around that, and when you think about an IOT kind of world, distributed processing, multiple hybrid cloud footprints, data is just everywhere, and, so, the perimeter is no longer fixed, it's kind of variable, and being able to keep track of that is a very important thing for our customers. >> So, continuing on that theme, Scott. Data lakes seem to be the first major new repository we added after we had data warehouses and data marts, and it looked like the governance solutions were sort of around that perimeter of the data lake. Tell us, you were alluding to, sort of, how many more repositories, whether at rest or in motion, there are for data. Do we have to solve the governance problem end-to-end before we can build meaningful applications? >> So, I would argue personally, that governance is one of the most strategic things for us as an industry, collectively, to go solve in a universal way, and what I mean by that, is throughout my career, which is probably longer than I'd like to admit, in an EDW centric world, where things are somewhat easier in terms of the perimeter and where the data came from, data sources were much more controlled, typically ERP systems, owned wholly by a company. Even in that era, true data governance, meta data management, and that provenance was never really solved adequately. There were 300 different solutions, none of which really won. They were all different, non-compatible, and the problem was easier. In this new world, with connected data, the problem is infinitely more difficult to go solve, and, so, that same kind of approach of 300 different proprietary solutions I don't think is going to work. >> So, tell us, how does that approach have to change and who can make that change? >> So, one of the things, obviously, that we're driving is we're leveraging our position in the open community to try to use the community to create that common infrastructure, common set of APIs for meta data management, and, of course, we call that Apache Atlas, and we work with a lot of partners, some of whom are customers, some of whom are other vendors, even some of whom could be considered competitors, to try to drive an Apache open source kind of project to become that standard layer that's common into which vendors can bring their applications. So, now, if I have a common API for tracking meta data in that trail of breadcrumbs that's commonly understood, I can bring in an application that helps customers go develop the taxonomy of the rules that they want to implement, and, then, that helps visualize all of the other functionality, which is also extremely important, and that's where I think specialization comes into play, but having that common infrastructure, I think, is a really important thing, because that's going to enable data, data lakes, IOT to be trusted, and if it's not trusted, it's not going to be successful. >> Okay, there's a chicken and an egg there it sounds like, potentially. >> Am I the chicken or the egg? >> Well, you're the CTO. (Lisa laughs) >> Okay. >> The thing I was thinking of was, the broader the scope of trust that you're trying to achieve at first, the more difficult the problem, do you see customers wanting to pick off one high value application, not necessarily that's about managing what's in Atlas, in the meta data, so much as they want to do an IOT app and they'll implement some amount of governance to solve that app. In other words, which comes first? Do they have to do the end-to-end meta data management and governance, or do they pick a problem off first? >> In this case, I think it's chicken or egg. I mean, you could start from either point. I see customers who are implementing applications in the IOT space, and they're saying, "Hey, this requires a new way to think of governance, "so, I'm going to go and build that out, but I'm going to "think about it being pluggable into the next app." I also see a lot of customers, especially in highly regulated industries, and especially in highly regulated jurisdictions, who are stepping back and saying, "Forget the applications, this is a data opportunity, "and, so, I want to go solve my data fabric, "and I want to have some consistency across "that data fabric into which I can publish data "for specific applications and guarantee "that, wholistically, I am compliant "and that I'm sitting inside of our corporate mission "and all of those things." >> George: Okay. >> So, one of the things you mention, and we talk about this a lot, is the proliferation of data. It's so many, so many different sources, and companies have an opportunity, you had mentioned the phrase data opportunity, there is massive opportunity there, but you said, you know, from even a GDR perspective alone, I can't remove the data if I don't know where it is to the breadcrumbs. As a marketer, we use terms like get a 360 degree view of your customer. Is that actually really something that customers can achieve leveraging a data. Can they actually really get, say a retailer, a 360, a complete view of their customer? >> Alright, 358. >> That's pretty good! >> And we're getting there. (Lisa laughs) Yeah, I mean, obviously, the idea is to get a much broader view, and 360 is a marketing term. I'm not a marketing person, >> Yes. But it, certainly, creates a much broader view of highly personalized information that help you interact with your customer better, and, yes, we're seeing customers do that today and have great success with it and actually change and build new business models based on that capability, for sure. The folks who've done that have realized that in this new world, the way that that works is you have to have a lot of people have access to a lot of data, and that's scary, because that's not the way it used to be, right? >> Right. >> It used to be you go to the DBA and you ask for access, and then, your boss has to sign off and say it's what you asked for. In this world, you need to have access to all of it. So, when you think about this new governance capability where as part of the governance integrated with security, personalized information can be encrypted, it can be blurred out, but you still have access to the data to look at the relationships to be found in the data to build out those sophisticated models. So, that's where not only is it a new opportunity for governance just because the sources, the variety at the different landscape, but it's, ultimately, very much required, because if you're the CSO, you're not going to give access to the marketing team all of its customer data unless you understand that, right, but it has to be, "I'm just giving it to you, "and I know that it's automatically protected." versus, "I'm going to let you ask for it." to be successful. >> Right. >> I guess, following up on that, it sounds like what we were talking about, chicken or egg. Are you seeing an accelerating shift from where data is sort of collected, centrally, from applications, or, what we hear on Amazon, is the amount coming off the edge is accelerating. >> It is, and I think that that is a big drive to, frankly, faster clouded option, you know, the analytic space, particularly, has been a laggard in clouded option for many reasons, and we've talked about it previously, but one of the biggest reasons, obviously, is that data has gravity, data movement is expensive, and, so, now, when you think about where data is being created, where it lives, being further out on the edge, and may live its entire lifecycle in the cloud, you're seeing a reversal of gravity more towards cloud, and that, again, creates more opportunities in terms of driving a more varied perimeter and just keeping track of where all the assets are. Finally, I think it also leads to this notion of managing entire lifecycle of data. One of the implications of that is if data is not going to be centralized, it's going to live in different places, applications have to be portable to move to where the data exists. So, when I think about that landscape of creating ubiquitous data management within Hortonworks' portfolio, that's one of the big values that we can create for our customers. Not only can we be an on-ramp to their hybrid architecture, but as we become that on-ramp, we can also guarantee the portability of the applications that they've built out to those cloud footprints and, ultimately, even out to the edge. >> So, a quick question, then, to clarify on that, or drill down, would that mean you could see scenarios where Hortonworks is managing the distribution of models that do the inferencing on the edge, and you're collecting, bringing back the relevant data, however that's defined, to do the retraining of any models or recreation of new models. >> Absolutely, absolutely. That's one of the key things about the NiFi project in general and Hortonworks DataFlow, specifically, is the ability to selectively move data, and the selectivity can be based on analytic models as well. So, the easiest case to think about is self-driving cars. We all understand how that works, right? A self-driving car has cameras, and it's looking at things going on. It's making decisions, locally, based on models that have been delivered, and they have to be done locally, because of latency, right, but, selectively, hey, here's something that I saw as an image I didn't recognize. I need to send that up, so that it can be added to my lexicon of what images are and what action should be taken. So, of course, that's all very futuristic, but we understand how that works, but that has application in things that are very relevant today. Think about jet engines that have diagnostics running. Do I need to send that terabyte of data an hour over an expensive thing? No, but I have a model that runs locally that says, "Wow, this thing looks interesting. "Let me send a gigabyte now for immediate action." So, that decision making capability is extremely important. >> Well, Scott, thanks so much for taking some time to come chat with us once again on the Cube. We appreciate your insights. >> Appreciate it, time flies. This is great. >> Doesn't it? When you're having fun! >> Yeah. >> Alright, we want to thank you for watching the Cube. I'm Lisa Martin with George Gilbert. We are live at Forager Tasting Room in downtown San Jose at our own event, Big Data SV. We'd love for you to come on down and join us tonight, today, tonight, and tomorrow. Stick around, we'll be right back with our next guest after a short break. (techno music) >> Narrator: Since the dawn of the cloud, the Cube

Published Date : Mar 7 2018

SUMMARY :

Brought to you by SiliconANGLE Media Welcome back to the Cube's We are down the street from the Strata Data Conference. as I've seen in the entire show. What does that do to help customers simplify data in motion? So, the idea is to make it consumable What are some of the things... It's a little bit more from customers, and how is that helping to drive what's that I have to comply with, and when you think and it looked like the governance solutions the problem is infinitely more difficult to go solve, So, one of the things, obviously, Okay, there's a chicken and an egg there it sounds like, Well, you're the CTO. of governance to solve that app. "so, I'm going to go and build that out, but I'm going to So, one of the things you mention, is to get a much broader view, that help you interact with your customer better, in the data to build out those sophisticated models. off the edge is accelerating. if data is not going to be centralized, of models that do the inferencing on the edge, is the ability to selectively move data, to come chat with us once again on the Cube. This is great. Alright, we want to thank you for watching the Cube.

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
GeorgePERSON

0.99+

ScottPERSON

0.99+

HortonworksORGANIZATION

0.99+

AmazonORGANIZATION

0.99+

George GilbertPERSON

0.99+

Scott GnauPERSON

0.99+

Lisa MartinPERSON

0.99+

San JoseLOCATION

0.99+

FebruaryDATE

0.99+

360 degreeQUANTITY

0.99+

2018DATE

0.99+

tomorrowDATE

0.99+

358OTHER

0.99+

GDPRTITLE

0.99+

todayDATE

0.99+

tomorrow morningDATE

0.99+

fifth yearQUANTITY

0.99+

tonightDATE

0.99+

LisaPERSON

0.99+

SiliconANGLE MediaORGANIZATION

0.99+

firstQUANTITY

0.99+

Hortonworks'ORGANIZATION

0.99+

one departmentQUANTITY

0.99+

one organizationQUANTITY

0.99+

two thingsQUANTITY

0.99+

360QUANTITY

0.98+

one personQUANTITY

0.98+

oneQUANTITY

0.98+

CubeORGANIZATION

0.97+

Strata Data ConferenceEVENT

0.96+

300 different solutionsQUANTITY

0.96+

an hourQUANTITY

0.95+

OneQUANTITY

0.95+

tenthQUANTITY

0.95+

300 different proprietary solutionsQUANTITY

0.95+

Big Data SV 2018EVENT

0.93+

few weeks agoDATE

0.92+

one dataQUANTITY

0.87+

AtlasTITLE

0.86+

Hortonworks DataFlowORGANIZATION

0.85+

Big DataEVENT

0.85+

CubeCOMMERCIAL_ITEM

0.84+

Silicon ValleyLOCATION

0.83+

EuropeanOTHER

0.82+

DBAORGANIZATION

0.82+

ApacheTITLE

0.79+

TastingORGANIZATION

0.76+

ApacheORGANIZATION

0.73+

CTOPERSON

0.72+

SensorsORGANIZATION

0.71+

downtown San JoseLOCATION

0.7+

Forager Tasting RoomLOCATION

0.67+

SVEVENT

0.66+

terabyte of dataQUANTITY

0.66+

NiFiORGANIZATION

0.64+

ForagerLOCATION

0.62+

Narrator:TITLE

0.6+

Big DataORGANIZATION

0.55+

RoomLOCATION

0.52+

EateryORGANIZATION

0.45+

Jacque Istok, Pivotal | Big Data SV 2018


 

>> Announcer: Live from San Jose, it's The Cube. Presenting Big Data, Silicon Valley. Brought to you by SiliconANGLE Media and its ecosystem partners. >> Welcome back to The Cube, we are live in San Jose at Forager Eatery, a really cool place down the street from the Strata Data Conference. This is our 10th big data event, we call this BigData SV, we've done five here, five in New York, and this is our day one of coverage, I'm Lisa Martin with my co-host George Gilbert, and we're joined by a Cube alumni, Jacque Istok, the head of data from Pivotal. Welcome back to the cube, Jacque. >> Thank you, it's great to be here. >> So, just recently you guys announced, Pivotal announced, the GA of your Kubernetes-based Pivotal container service, PKS following this initial beta that you guys released last year, tell us about that, what's the main idea behind PKS? >> So, as we were talking about earlier, we've had this opinionated platform as a service for the last couple of years, it's taken off, but it really requires a very specific methodology for deploying microservices and kind of next gen applications, and what we've seen with the ground swell behind Kubernetes is a very seamless way where we can not just do our opinionated applications, we can do any applications leveraging Kubernetes. In addition, it actually allows us to again, kind of have an opinionated way to work with stateful, stateful data, if you will. And so, what you'll see is two of the main things we have going on, again, if you look at both of those products they're all managed by a thing we call Bosch and Bosch allows for not just the ease of installation, but also the actual operation of the entire platform. And so, what we're seeing is the ability to do day two operations not just around just the apps, not just the platform, but also the data products that run within it. And you'll see later this year as we continue to evolve our data products running on top of either the PKS product or the PCF product. >> Quick question before you jump in George, so you talk about some of the technology benefits and reasoning for that, from a customer perspective, what are some of the key benefits that you've designed this for, or challenges to solve? >> I'd say the key benefits, one is convenience and ease of installation, and operationalization. Kubernetes seems to have basically become the standard for being able to deploy containers, whether its on Pram or off Pram, and having an enterprise solution to do that is something that customers are actually really looking towards, in fact, we had sold about a dozen of these products even before it was GA there was so much excitement around it. But, beyond that, I think we've been really focused on this idea of digital transformation. So Pivotal's whole talk track really is changing how companies build software. And I think the introduction of PKS really takes us to the next level, which is that there's no digital transformation without data, and basically Kubernetes and PKS allow us to implement that and perform for our customers. >> This is really a facilitator of a company's digital transformation journey. >> Correct. In a very easy and convenient way, and I think, you know, whether it's our generation, or, you know, what's going on in just technology, but everybody is so focused on convenience, push button, I just want it to work. I don't want to have to dig into the details. >> So this picks up on a theme we've been pounding on for a couple of years on our side, which is the infrastructure was too hard to stand up and operate >> Male Speaker: Yeah. >> But now that we're beginning to solve some of those problems, talk about some of the use case. Let's pick GE because that's a flagship customer, start with some of the big outcomes, some of the big business outcomes they're shooting for and then how some of the pivotal products map into that. >> Sure, so there's a lot of use cases. Obviously, GE is both a large organization, as well as an investor inside of Pivotal. A lot of different things we can talk about one that comes to mind out of the gate is we've got a data suite we sell in addition to PKS and PCF, and within that data suite there are a couple of products, green plum being one of them. Green plum is this open source MPP data platform. Probably one of the most successful implementations within GE is this ability to actually consolidate a bunch of different ERP data and have people be able to querey it, again, cheaply, easily, effectively and there are a lot of different ways you can implement a solution like that. I think what's attractive to these guys specifically around green plum is that it leverages, you know, standard ANSI SQL, it scales to pedobytes of data, we have this ability to do on pram and off pram I was actually at the Gartner Conference earlier this week and walking around the show it was actually somewhat eye opening to me to be able to see that if you look at just that one product, there really isn't a competitive product that was being showcased that was open source, multi cloud, analytical in nature, et cetera. And so I think, again, to get back to the GE scenario, what was attractive to them was everything they're doing on pram can move to the cloud, whether it's Google, Azure, Amazon they can literally run the exact same product and the exact same queries. If you extend it beyond that particular use case, there are other use cases that are more real time, and again, inside of the data suite, we've got another product called gem fire, which is an in-memory data grid that allows for this rapid ingest, so you can kind of think and imagine whether it's jet engines, or whether it's wind turbines data is constantly being generated, and our ability to take that data in real time, ingest it, actually perform analytics on it as it comes in, so, again, kind of a loose example would be if you know the heat tolerance of a wind turbine is between this temperature and this temperature, do something: send an alarm, shut it down, et cetera. If you can do that in real time, you can actually save millions of dollars by not letting that turbine fail. >> Okay, it sounds here like the gem fire product and the green plum DBMS are very complimentary. You know, one is speed, and one is sort of throughput. And we've seen almost like with Hadupen overreaction in turning a coherent platform into a bunch of building blocks. >> Male Speaker: Yes. >> And with green plum you have everything packaged together. Would it be proper to think of green plum as combining the best of the data link and the data warehouse where you've got the data scientists and data engineers with what would have been another product and the business analysts and the BI crowd satisfied with the same product, but what would have been another? >> Male Speaker: So, I'd say you're spot on. What is super interesting to me is, one, I've been doing data warehousing now for, I don't know, 20 years, and for the last five, I've kind of felt like data warehouse, just the term, was equivalent to the mainframe. So, I actually kind of relegated it the I'm not going to use that term anymore, but with the advent of the cloud and with other products that are out there we're seeing this resurgence where the data warehouse is cool again, and I think part of it is because we had this shift where we had really expensive products doing the classic EDW and it was too rigid, and it was too expensive, and Haduke sort of came on and everyone was like hey this is really easy, this is really cheap, we can store whatever we want, we can do any kind of analytics, and I think, I was saying before, the love affair with piecing all of that together is kind of over and I also think, it's funny, it was really hard for organizations to successfully stand up a Haduke platform, and I think the metric we hear is fifty percent of them fail, right, so part of that, I believe is because there just aren't enough people to be able to do what needed to be done. So, interestingly enough, because of those failures, because the Haduke ecosystem didn't quite integrate into the classic enterprise, products like green plum are suddenly very popular. I was just seeing our downloads for the open source part of green plum, and we're literally, at this juncture seeing 1500 distinct customers leveraging the open source product, so I feel like we're on kind of this upswing of getting everybody to understand that you don't have to go to Haduke to be able to do structured to unstructured data at scale. You can actually use some of these other products. >> Female Speaker: Sorry George, quickly, being in the industry for 20 years, we talk about, you know, culture a lot, and we say cultural shift. People started embracing Haduke, we can dump everything that data lake turned into swamps. I'm curious though, what is that, maybe it's not a cultural shift, maybe it's a cultural roller coaster, like, mainframes are cool again. Give us your perspective on how you've helped companies like GE sort of as technology waves come really kind of help design and maybe drive a culture that embraces the velocity of this change. >> Sure, so one of the things we do a lot is help our customers better leverage technology, and really kind of train it. So, we have a couple different aspects to pivotal. One of them is our labs aspect, and effectively that is our ability to teach people how to better build applications, how to better do data science, how to better do data engineering. Now, when we come in, we have an opinionated way to do all those things, and when a customer embraces it it actually opens up a lot of doors. So we're somewhat technology agnostic, which aids in your question, right, so we can come in, we're not trying to push a specific technology, we're trying to push a methodology and an end goal and solution. And I think, you know, often times of course that end goal and solution is best met by our products, but to your point about the roller coaster, it seems as though as we have evolved there is a notion that data will, from an organization, will all come together in a common object store, and then the ability to quickly be able to spin up an analytical or a programmmatic interface within that data is super important and that's where we're kind of leaning, and that's where I think this idea of convenience being able to push button instantiate a green plum cluster, push button instantiate a gem fire grid so that you can do analytics or you can take actions on it is so super important. >> Male Speaker: You said something that sounds really important which is we want to get it sounded like you were alluding to a single source of truth, and then you spin up whatever compute, you bring it to the data. But there's an emerging, still early school of thought which is maybe the single source of truth should be a hub centered around real time streams. >> Male Speaker: Sure. Yeah. >> How does Pivotal play in that role? >> So, there are a lot of products that can help facilitate that including our own. I would say that there is a broad ecosystem that kind of says, if I was going to start an organization today there are a number of vertical products I would need in order to be successful with data. One of the would be just a standard relational database. And if I pause there for a second, if you look at it, there is definitely a move toward building microservices so that you can glue all those pieces together. Those microservices require smaller, simpler relational type databases, or you know, SQL type databases on the front end, but they become simpler and simpler where I think if I was Oracle or some of the more stalwart on the relational side, it's not about how many widgets you can put into the database, it's really about it's simplicity and performance. From there, having some kind of message queue or system to be able to take the changes and the updates of the data down the line so that, not so much IT providing it to an end user, but more self service, being able to subscribe to the data that I care about. And again, going back to the simplicity, me as an end user being able to take control of my destiny and use whatever product or technology makes the most sense to me and if I sort of dovetail on the side of that, we've focused so much this year on convenience and flexibility that I think it is now at a spot where all of the innovations that we're doing in the Amazon marketplace on green plum, all of those innovations are actually leading us to the same types of innovations in data deployments on top of Kubernetes. And so two of them that come to mind, I felt like, I was in front of a group last week and we were presenting some of the things we had done, and one of them was self-healing of green plum and so it's often been said that these big analytical solutions are really hard to operate and through our innovations we're able to have, if a segment goes down or a host goes down, or network problems, through the implementation the system will actually self heal itself, so all of a sudden the operational needs become quite a bit less. In addition, we've also created this automatic snapshotting capability which allows, I think our last benchmark we did about a pedobyte of data in less than three minutes, so suddenly you've got this operational stalwart, almost a database as a service without really being a service really just this living breathing thing. And that kind of dovetails back to where we're trying to make all of our products perform in a way that customers can just use them and not worry about the nuts and bolts of it. >> Female Speaker: So last question, we've got about 30 seconds left. You mentioned a lot of technologies but you mentioned methodology. Is that approach from Pivotal one of the defining competitive advantages that you deliver to the market? >> Male Speaker: It is 100 per cent one of our defining our defining things. Our methodology is what is enabling our customers to be successful and it actually allows me to say we've partnered with postcrestkampf and green plum summit this year is next month in April and the theme of that is hashtag data tells the story. And so, from our standpoint, green plum is continuing to take off, gem fire is continuing to take off, Kubernetes is continuing to take off, PCF is continuing to take off, but we believe that digital transformation doesn't happen without data. We think data tells a story. I'm here to encourage everyone to come to green plum summit, I'm also here to encourage everyone to share their stories with us on twitter, hashtag data tells a story, so that we can continue to broaden this ecosystem. >> Female Speaker: Hahtag data tells a story. Jacque, thanks so much for carving out some time this week to come back to the cube and share what's new and differentiating at Pivotal. >> Thank you. >> We want to thank you for watching The Cube. I'm Lisa Martin with my co-host George Gilbert. We are live at Big Data SV, our tenth big data event come down here, see us, we're in San Jose at Forrager eatery, we've got a great party tonight and also tomorrow morning at eight am we've got a breakfast briefing you wont' want to miss. Stick around, we'll be back with our next guest after a short break.

Published Date : Mar 7 2018

SUMMARY :

Brought to you by SiliconANGLE Media Welcome back to The Cube, we are live in San Jose and Bosch allows for not just the ease of installation, and having an enterprise solution to do that This is really a facilitator of a company's you know, whether it's our generation, But now that we're beginning to solve and again, inside of the data suite, we've got and the green plum DBMS are very complimentary. and the business analysts and the BI crowd of getting everybody to understand a culture that embraces the velocity of this change. and then the ability to quickly be able to Male Speaker: You said something that And that kind of dovetails back to where we're competitive advantages that you deliver to the market? and it actually allows me to say and share what's new and differentiating at Pivotal. we've got a breakfast briefing you wont' want to miss.

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
George GilbertPERSON

0.99+

Lisa MartinPERSON

0.99+

BoschORGANIZATION

0.99+

GeorgePERSON

0.99+

twoQUANTITY

0.99+

GEORGANIZATION

0.99+

San JoseLOCATION

0.99+

fifty percentQUANTITY

0.99+

AmazonORGANIZATION

0.99+

JacquePERSON

0.99+

New YorkLOCATION

0.99+

Jacque IstokPERSON

0.99+

20 yearsQUANTITY

0.99+

last yearDATE

0.99+

PivotalORGANIZATION

0.99+

oneQUANTITY

0.99+

100 per centQUANTITY

0.99+

GoogleORGANIZATION

0.99+

last weekDATE

0.99+

Silicon ValleyLOCATION

0.99+

less than three minutesQUANTITY

0.99+

fiveQUANTITY

0.99+

The CubeTITLE

0.99+

HadukeORGANIZATION

0.99+

next monthDATE

0.99+

SiliconANGLE MediaORGANIZATION

0.99+

OracleORGANIZATION

0.99+

bothQUANTITY

0.99+

PKSORGANIZATION

0.98+

CubeORGANIZATION

0.98+

Strata Data ConferenceEVENT

0.98+

OneQUANTITY

0.97+

tonightDATE

0.97+

millions of dollarsQUANTITY

0.97+

this weekDATE

0.97+

tomorrow morning atDATE

0.97+

about a dozenQUANTITY

0.96+

AzureORGANIZATION

0.96+

earlier this weekDATE

0.96+

todayDATE

0.96+

SQLTITLE

0.96+

1500 distinct customersQUANTITY

0.96+

this yearDATE

0.95+

later this yearDATE

0.95+

last couple of yearsDATE

0.95+

Gartner ConferenceEVENT

0.95+

single sourceQUANTITY

0.94+

KubernetesORGANIZATION

0.94+

one productQUANTITY

0.93+

PramORGANIZATION

0.93+

10thQUANTITY

0.93+

PCFORGANIZATION

0.92+

tenthQUANTITY

0.92+

KubernetesTITLE

0.88+

about 30 secondsQUANTITY

0.86+

AprilDATE

0.86+

HahtagORGANIZATION

0.85+

about a pedobyte of dataQUANTITY

0.84+

2018DATE

0.84+

twitterORGANIZATION

0.84+

One of themQUANTITY

0.84+

gem fireORGANIZATION

0.83+

The CubeORGANIZATION

0.82+

Murthy Mathiprakasam, Informatica | Big Data SV 2018


 

>> Narrator: Live from San Jose, it's theCUBE. Presenting big data silicon valley, brought to you be Siliconangle Media and its ecosystem partner. >> Welcome back to theCUBE we are live in San Jose, at Forger Eatery, super cool place. Our first day of our two days of coverage at our event called Big Data SV. Down the street is the Strata Data Conference, and we've got some great guests today that are going to share a lot insight and different perspectives on Big Data. This is our 10th big data event on theCUBE, our fifth in San Jose. We invite you to come on down to Forger Eatery and we also invite you to come down this evening. We've got a party going on and we've got a really cool breakfast presentation on the analysis site in the morning. Our first guest is, needs no introduction to theCUBE, he's a Cube Alumni, Murthy Mathiprakasam, did I get that right? >> Murthy: Absolutely. >> Murthy, awesome, as we're going to call him. The director of product marketing for Informatica, welcome back to theCUBE, it's great to have you back. >> Thanks for having me back, and congratulations on the 10 year anniversary. >> Yeah! So, interesting, exciting news from Informatica in the last two days, tell us about a couple of those big announcements that you guys just released. >> Absolutely, yes. So this has been very exciting year for us lots of, you know product, innovations and announcements, so just this week alone, actually there's one announcement that's probably going out right now as we speak, around API management, so one of the things, probably taking about before we started interviews you know around the trend toward cloud, lots of people doing a lot more data integration and application integration in the cloud space. But they face all the challenges that we've always seen in the data management space. Around developer productivity, and hand coding, just a lot of complexity that organizations have around maintenance. So one of the things at Informatica always brought to every domain that we cover is this ability to kind of abstract the underlying complexity, use a graphical user interface, make things at the logical level instead of the physical level. So we're bringing that entire kind of paradigm to the API management space. That's going to be very exciting, very game changing on the kind of app-to-app integration side of things. Back on the data world of course, which is what we're, you know, mainly talking about here today. We're doing a lot there as well. So we announced kind of a next generation of our data management platforms for the big data world, part of that is also a lot of cloud capabilities. 'Cause again, one of the bigger trends. >> Peter: Have you made a big bet there? >> Absolutely, and I mean this is the investment, return on investments over like 10 years, right? We were started in a kind of cloud game about 10 years ago with our platform as a service offering. So that has been continuously innovated on and we've been engineering, re-imagining that, to now include more of the big data stuff in it too, because more and more people are building data lakes in the cloud. So it's actually quite surprising, you know the rate at which the data lake kind of projects are now either migrating or just starting in the cloud environments. So given that being the trend, we were kind of innovating against that as well. So now our platform is service offerings supports the ability to connect to data sources in the cloud natively. You can do processing and gestion in the cloud. So there's a lot of really cool capabilities, again it's kind of bringing the Informatica ease of use, and kind of acceleration that comes to platform approach to the cloud environment. And there's a whole bunch of other announcements too, I mean I could spend 20 minutes, just on different innovations, but you know bringing artificial intelligence into the platform so we can talk more about that. >> Well I want to connect what you just announced with the whole notion of the data lake, 'cause it's really Informatica strength has always been in between. And it turns out that where a lot the enterprise problems have been. So the data lake has been there, but it's been big, it's been large, it was big data and the whole notion is make this as big as you can and we'll figure out what to do with it later. >> Murthy: Right. >> And now you're doing the API which is just a indication that we're seeing further segmentation and a specificity, a targeting of how we're going to use data, the value that we create out of data and apply it to business problems. But really Informatica strength is been in between. >> Murthy: Absolutely. >> It's been in, knowing where you data is, it's been in helping to build those pipelines and managing those pipelines. How have the investments that you've made over the last few years, made it possible for you to actually deliver an API orientation, that will actually work for Enterprises? >> Yeah, absolutely, and I would actually phrase it as sort of platform orientation, but you're exactly right. So what's happening is, I view this as sort of maturation of a lot of these new technologies. You know Hadoop was a very very, as you were saying kind of experimental technology four or five years ago. And we had customers too who were kind of in that experimental phase. But what's happening now is, big data isn't just a conversation with data engineers and developers, we're talking to CDO's, and Chief Data Officers, and VP's of data infrastructures about using Hadoop for Enterprise scale projects, now the minute you start having a conversation with a Chief Data Officer, you're not just talking about simple tools for ingestion and stuff like that. You're talking about security, you're talking about compliance, you're talking about GDPR if you're in Europe. So there's a whole host of sort of data management challenges, that are now relevant for the big data world, just because the big data world has become main stream. And so this is exactly to your point, where the investments that I think Informatica has been making and bringing our kind of comprehensive platform oriented approach to this space are paying off. Because for Chief Data Officer, they can't really do big data without those features. They can't not deal with security and compliance, they can't not deal with not knowing what the data is. 'Cause they're accountable for knowing what the data is, right? And so, there's a number of things that by virtue of the maturation of the industry, I think that trends are pointing toward, the enterprises kind of going more toward that platform approach. >> On that platform approach Informatica's really one of the only vendors that's talking about that, and delivering it. So that clearly is an area of differentiation. Why do you think that's nascent. This platform approach verses a kind of fit-for-purpose approach. >> Yeah, absolutely. And we should be careful with even the phrase fit-for-purpose too, 'cause I think that word gets thrown around a lot as it's one of those buzz words in the industry. Because it's sort of the positive way of saying incomplete, you know? And so, I think there are vendors who have tried to kind of address, know you one aspect of sort of one feature of the entire problem, that a Chief Data Officer would care about. They might call it fit-for-purpose, but you have to actually solve a problem at the end of the day. The Chief Data Officer's are trying to build enterprise data pipelines. You know you've got raw information from all sorts of data sources, on premise, in the cloud. You need to push that through a process, like at manufacturing process of being able to ingest it, repair it, cleanse it, govern it, secure it, master it, all the stuff has to happen in order to serve all the various communities that a Chief Data Officer has to serve. And so you're either doing all that or you're not. You know, that's the problem, that way we see the problem. And so the platform approach is a way of addressing the comprehensive set of problems that a Chief Data Officer, or these kind of of Data Executives care about, but also do it in a way, that fosters productivity and re-usability. Because the more you sort of build things in a kind of infrastructure level way, as soon as the infrastructure changes you're hosed, right? So you're seeing a lot of this in the industry now too, where somebody built something in Mapreduce three years ago, as soon as Spark came out, they're throwing all that stuff away. And it's not just, you know, major changes like that, even versions of Spark, or versions of Hadoop, can sometimes trigger a need to recode and throw away stuff. And organization can't afford this. When you're talking about 40 to 50% growth in the data overall. The last thing you want to do is make an investment that you're going to end up throwing away. And so, the platform approach to go back to your question, is the sort of most efficient pathway from an investment stand point, that an enterprise can take, to build something now that they can actually reuse and maintain and kind of scale in a very very pragmatic way. >> Well, let me push you on that a little bit. >> Murthy: Yeah. >> 'Cause what we would say is that, the fit-to-purpose is okay so long as you're true about the purpose, and you understand what it means to fit. What a lot of the open source, a lot of companies have done, is they've got a fit-to-purpose but then they do make promises that they say, oh this is fit-to-purpose, but it's really a platform. And as a consequence you get a whole bunch of, you know, duck-like solutions, (laughing) That are, you know, are they swimming, or are they flying, kind of problems. So, I think that what we see clients asking for, and this is one of my questions, what we see clients asking for is, I want to invest in technologies that allow me to sustain my investments, including perhaps some of my mistakes, if they are generating business value. >> Murthy: Right. >> So it's not a rip and replace, that's not what you're suggesting, what you're suggesting I think is, you know, use what you got, if it's creating value continue to use it, and then over time, invest the platform approach that's able to generate additional returns on top of it. Have I got that right? >> Absolutely. So it goes back to flexibility, that's the key word, I think that's kind of on the minds of a lot of Chief Data Officers. I don't want to build something today, that I know I'm going to throw away a year from now. >> Peter: I want to create options for the future. >> Create options. >> When I build them today. >> Exactly. So even the cloud, you're bringing up earlier on, right? Not everybody knows exactly what their cloud strategy is. And it's changing extremely rapidly, right? We had almost, we were seeing very few big data customers in the cloud maybe even a year or two ago? Now we're close to almost 50% of our big data business is people deploying off premise, I mean that's amazing, you know in a period of just a year or two. So Chief Data Officers are having to operate in these extreme kind of high velocity environments. The last thing you want to do is make a bet today, with the knowledge that you're going to end up having to throw away that bet in six months or a year. So the platform approach is sort of like your insurance policy because it enables you to design for today's requirements, but then very very quickly migrate or modify for new requirements that maybe be six months, a year or two down the line. >> On that front, I'd love for you to give us an example of a customer that has maybe in the last year, since you've seen so much velocity, come to you. But also had other technologies and their environment that from a cost perspective, I mean but at Peter's point there's still generating value, business value. How do you help customers that have multiple different products maybe exploring different multi-calibers, how to they come and start working with Informatica and not have to rip out other stuff, but be able to move forward and achieve ROI? >> So, it's really interesting kind of how people think about the whole rip and replace concept. So we actually had a customer dinner last night and I'm sitting next to a guy, and I was kind of asking very similar question. Tell me about your technology landscape, you know where are things going, where have things gone in the past, and he basically said there's a whole portfolio of technologies that they plan to obsolete. 'Cause they just know that, like they're probably, they don't even bother thinking about sustainability, to your point. They just want to use something just to kind of try it out. It's basically like a series of like three month trails of different technologies. And that's probably why we such proliferation of different technologies, 'cause people are just kind of trying stuff out, but it's like, I know I'm going to throw this stuff out. >> Yeah but that's, I mean, let me make sure I got that. 'Cause I want to reconcile a point. That's if they're in pilot and the pilot doesn't work. But the minute it goes into production and values being created they want to be able to sustain that stream of value. >> This is production environment. I'm glad you asked that question. So this is a customer that, and I'll tell you where I'm going to the point. So they've been using Informatica for over four years, for big data which is essentially almost the entire time big data's been around. So the reason this customers making the point is, Informatica's the only technology that is actually sustained precisely for the point that you're bringing up, because their requirements have changed wildly during this time. Even the internal politics of who needs access to data, all of that has changed radically over these four years. But the platform has enabled them to actually make those changes, and it's you know, been able to give them that flexibly. Everything else as far as, you know, developer tools, you know, visualization tools, like every year there's some kind of new thing that sort of comes out. And I don't want to be terribly harsh, there's probably one or two kind of vendors that have also persisted in those other areas. But, the point that they were trying to make to your original point is, is the point about sustainability. Like, at some point to avoid complete and utter chaos, you got to have like some foundation in the data environment. Something actually has to be something you can invest in today, knowing that as these changes internally externally are happening, you can kind of count on it and you can go to cloud you can be on Premise, you can have structured data, unstructured data, you know, for any type of data, any type of user, any type of deployment environment. I need something that I can count on, that's actually existing for four or more years. And that's where Informatica fits in. And meanwhile there's going to be a lot of other tools that, like this guy was saying, they're going to try out for three month or six months and that's great, but they're almost using it with the idea that they're going to throw it away. >> Couple questions here; What are some of the business values that you were, stating like this gentlemen, that you ere talking to last night. What's the industry that's he in and also, are there any like stats or ranges you can give us. Like, reduction in TCO, or new business models opening up. What's the business impact that Informatica is helping these customers achieve. >> Yeah, absolutely, I'll use this example, he's, I can't mention the name of the company but it's an insurance company. >> Lisa: Lot's of data. >> Lots of data, right. Not only do they have a lot of data, but there's a lot of sensitivity around the data. Because basically the only way they grow is by identifying patterns in consumers and they want to look at it if somebody's using car insurance in, maybe it for so long they're ready to get married, they need home insurance, they have these like really really sophisticated models around human behavior. So they know when to go and position new forms of insurance. There's also obviously security government types of issues that are at play as well. So the sensitivity around data is very very important. So for them, the business value is increased revenue, and you know ability to meet kind of regulatory pressure. I think that's generally, I mean every industry has some variant of that. >> Right. >> Cost production, increase revenue, you know meeting regulatory pressures. And so Informatica facilitates that, because instead of having to hire armies of people, and having to change them out maybe every three months or six months 'cause the underlying infrastructures changing, there's this one team, the Informatica team that's actually existed for this entire journey. They just keep changing, used cases, and projects, and new data sets, new deployment models, but the platform is sort of fixed and it's something that they can count on it's robust, it enables that kind of. >> Peter: It's an asset. >> It's an asset that delivers that sustainable value that you were taking about. >> Last question, we've got about a minute left, in terms of delivering value, Informatica not the only game in town, your competitors are kind of going with this MNA partnership approach. What makes Informatica stand out, why should companies consider Informatica? >> So they say like, what there's a quote about it. Imitation is the most sincere from of flattery. Yeah! (laughing) I guess we should feel as a little bit flattered, you know, by what we're seeing in the industry, but why from a customers stand point should they, you know continue to rely on Informatica. I mean we keep pushing the envelope on innovations, right? So, one the other areas that we innovated on is machine learning within the platform, because ultimately if one of the goals of the platform is to eliminate manual labor, a great way to do that is to just not have people doing it in the first place. Have machines doing it. So we can automatically understand the structure of data without any human intervention, right? We can understand if there's a file and it's got costumer names and you know, cost and skews, it must be an order. You don't actually have to say that it's an order. We can infer all this, because of the machine learning them we have. We can give recommendations to people as they're using our platform, if you're using a data set and you work with another person, we can go to you and say hey, maybe this is a data set that you would be interesting in. So those types of recommendations, predictions, discovery, totally changes the economic game for an organization. 'Cause the last thing you want is to have 40 to 50% growth in data translate into 40 to 50% of labor. Like you just can't afford it. It's not sustainable, again, to go back to your original point. The only sustainable approach to managing data for the future, is to have a machine learning based approach and so that's why, to your question, I think just gluing a bunch of stuff together still doesn't actually get to nut of sustainability. You actually have to have, the glue has to have something in it, you know? And in our case it's the machine learning approach that ties everything together that brings a data organization together, so they can actually deliver the maximum business value. >> Literally creates a network of data that delivers business value. >> You got it. >> Well Murthy, Murthy Awesome, thank you so much for coming back to theCUBE. >> Thank you! >> And sharing what's going on the Informatica and what's differentiating you guys. We wish you a great rest of the Strata Conference. >> Awesome, you as well. Thank you. >> Absolutely, we want to thank you for watching theCUBE. I'm Lisa Martin with Peter Burris, we are live in San Jose at the Forger Eatery, come down here and join us, we've got a really cool space, we've got a part-tay tonight, so come join us. And we've got a really interesting breakfast presentation tomorrow morning, stick around and we'll be right back, with our next guest for this short break. (fun upbeat music)

Published Date : Mar 7 2018

SUMMARY :

brought to you be Siliconangle Media and we also invite you to come down this evening. welcome back to theCUBE, it's great to have you back. and congratulations on the 10 year anniversary. big announcements that you guys just released. of our data management platforms for the big data world, and kind of acceleration that comes to platform approach So the data lake has been there, and apply it to business problems. for you to actually deliver an API orientation, now the minute you start having a conversation Informatica's really one of the only vendors And so, the platform approach to go back to your question, about the purpose, and you understand what it means to fit. you know, use what you got, that I know I'm going to throw away a year from now. So even the cloud, you're bringing up earlier on, right? that has maybe in the last year, of technologies that they plan to obsolete. But the minute it goes into production But the platform has enabled them to actually make What are some of the business values that you were, he's, I can't mention the name of the company and you know ability to meet kind of regulatory pressure. and it's something that they can count on it's robust, that you were taking about. Informatica not the only game in town, the glue has to have something in it, you know? that delivers business value. thank you so much for coming back to theCUBE. and what's differentiating you guys. Awesome, you as well. Absolutely, we want to thank you for watching theCUBE.

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
40QUANTITY

0.99+

Lisa MartinPERSON

0.99+

Peter BurrisPERSON

0.99+

EuropeLOCATION

0.99+

LisaPERSON

0.99+

PeterPERSON

0.99+

six monthsQUANTITY

0.99+

San JoseLOCATION

0.99+

Murthy MathiprakasamPERSON

0.99+

three monthQUANTITY

0.99+

InformaticaORGANIZATION

0.99+

MurthyPERSON

0.99+

20 minutesQUANTITY

0.99+

oneQUANTITY

0.99+

two daysQUANTITY

0.99+

fourQUANTITY

0.99+

twoQUANTITY

0.99+

first dayQUANTITY

0.99+

a yearQUANTITY

0.99+

tomorrow morningDATE

0.99+

Siliconangle MediaORGANIZATION

0.99+

fourDATE

0.99+

one teamQUANTITY

0.99+

MNAORGANIZATION

0.99+

last yearDATE

0.99+

last nightDATE

0.99+

todayDATE

0.99+

fifthQUANTITY

0.99+

50%QUANTITY

0.98+

first guestQUANTITY

0.98+

four yearsQUANTITY

0.98+

tonightDATE

0.98+

CubeORGANIZATION

0.98+

three years agoDATE

0.98+

Big Data SVEVENT

0.97+

five years agoDATE

0.97+

firstQUANTITY

0.96+

Couple questionsQUANTITY

0.96+

Strata Data ConferenceEVENT

0.95+

almost 50%QUANTITY

0.95+

10 year anniversaryQUANTITY

0.94+

over four yearsQUANTITY

0.94+

two kindQUANTITY

0.94+

one announcementQUANTITY

0.94+

GDPRTITLE

0.94+

two agoDATE

0.92+