David Abercrombie, Sharethrough & Michael Nixon, Snowflake | Big Data SV 2018
>> Narrator: Live from San Jose, it's theCUBE. Presenting Big Data, Silicon Valley. Brought to you by SiliconANGLE Media and its ecosystem partners. >> Hi, I'm George Gilbert, and we are broadcasting from the Strata Data Conference, we're right around the corner at the Forager Tasting Room & Eatery. We have this wonderful location here, and we are very lucky to have with us Michael Nixon, from Snowflake, which is a leading cloud data warehouse. And David Abercrombie from Sharethrough, which is a leading ad tech company. And between the two of them, they're going to tell us some of the most advanced use cases we have now for cloud-native data warehousing. Michael, why don't you start with giving us some context for how on a cloud platform one might rethink a data warehouse? >> Yeah, thank you. That's a great question because let me first answer it from the end-user, business value perspective. When you run a workload on a cloud, there's a certain level of expectation you want out of the cloud. You want scalability, you want unlimited scalability, you want to be able to support all your users, you want to be able to support the data types, whatever they may be, that come into your organization. So, there's a level of expectation that one should expect from a service point of view once you're in a cloud. So, a lot of the technologies that were built up to this point have been optimized for on-premises types of data warehousing, where perhaps that level of service and concurrency and unlimited scalability was not really expected but, guess what? Once it comes to the cloud, it's expected. So those on-premises technologies aren't suitable in the cloud, so for enterprises and, I mean, companies, organizations of all types from finance, banking, manufacturing, ad tech as we'll have today, they want that level of service in the cloud. And so, those technologies will not work, and so it requires a rethinking of how those architectures are built. 
And it requires being built for the cloud. >> And just to, alright, to break this down and be really concrete, some of the rethinking. We separate compute from storage, which is a familiar pattern that we've learned in the cloud, but we also then have to have this sort of independent elasticity between-- >> Yes. >> Storage and the compute, and then Snowflake's taken it even a step further where you can spin out multiple compute clusters. >> Right. >> Tell us how that works and why that's so difficult and unique. >> Yeah, you know, that's taking us under the covers a little bit, but what makes our infrastructure unique is that we have a three-layer architecture. We separate, just as you said, storage from the compute layer, from the services layer. And that's really important because, as I mentioned before, you want unlimited capacity, unlimited resources. So, if you scale compute in today's world of on-premises MPP, what that really means is that you have to bring the storage along with the compute, because compute is tied to the storage, so when you scale the storage along with the compute, usually that involves a lot of burden on the data warehouse manager, because now they have to redistribute the data, and that means redistributing keys, managing keys if you will. And that's a burden. And in reverse, if all you wanted to do was increase storage but not the compute, because compute was tied to storage, you'd have to buy these additional compute nodes, and that might add to the cost when, in fact, all you really wanted to pay for was additional storage. So, by separating those, you keep them independent, and so you can scale storage apart from compute. And then, once you have your compute resources in place, the virtual warehouses that you're talking about: you spun them up, they've done their job, and you take them down, guess what? 
You can release those resources, and of course, in releasing those resources, basically you can cut your cost as well because, for us, it's pure usage-based pricing. You only pay for what you use, and that's really fantastic. >> Very different from the on-prem model where, as you were saying, tied compute and storage together, so. >> Yeah, let's think about what that means architecturally, right? So if you have an on-premises data warehouse, and you want to scale your capacity, chances are you'll have to have that hardware in place already. And having that hardware in place already means you're paying that expense, and so you may pay for that expense six months prior to needing it. Let's take a retailer example. >> Yeah. >> You're gearing up for a peak season, which might be Christmas, and so you put that hardware in place sometime in June. You'll always put it in in advance because why? You have to bring up the environment, so you have to allow time for implementation or, if you will, deployment to make sure everything is operational. >> Okay. >> And then what happens is when that peak period comes, you can expand into that capacity. But what happens once that peak period is over? You paid for that hardware, but you don't really need it. So, our vision is, or the vision we believe you should have when you move workloads to the cloud is, you pay for those resources when you need them. >> Okay, so now, David, help us understand, first, what was the business problem you were trying to solve? And why was Snowflake, you know, sort of uniquely suited for that? >> Well, let me talk a little bit about Sharethrough. We're ad tech; at the core of our business we run an ad exchange, where we're doing programmatic trading with the bids, with the real-time bidding spec. The data is very high in volume, with 12 billion impressions a month. That's a lot of bids that we have to process, a lot of bid requests. 
The way it operates, the bids and the bid responses in programmatic trading are encoded in JSONs, so our ad exchange is basically exchanging messages in JSON with our business partners. And the JSONs are very complicated, there's a lot of richness and detail, such that the advertisers can decide whether or not they want to bid. Well, this data is very complicated, very high-volume. And in advertising, like any business, we really need to have good analytics to understand how our business is operating, how our publishers are doing, how our advertisers are doing. And it all depends upon this very high-volume, very complex JSON event data stream. So, Snowflake was able to ingest our high-volume data very gracefully. The JSON parsing techniques of Snowflake allow me to expose the complicated data structure in a way that's very transparent and usable to our analysts. Our use of Snowflake has replaced clunkier tools where the analysts basically had to be programmers, writing programs in Scala or something to do an analysis. And now, because we've transparently and easily exposed the complicated structures within Snowflake in a relational database, they can use good old-fashioned SQL to run their queries. Literally, an afternoon's analysis is now a five-minute query. >> So, let me, as I'm listening to you describe this. We've had various vendors telling us about these workflows in the sort of data prep and data science tool chain. It almost sounds to me like Snowflake is taking semi-structured or complex data and it's sort of unraveling it and, normalizing is kind of an overloaded term, but it's making it business-ready, so you don't need as much of that manual data prep. >> Yeah, exactly, you don't need as much manual data prep, or you don't need as much expertise. 
For instance, Snowflake's JSON capabilities, in terms of drilling down the JSON tree with dot path notation, or expanding nested objects is very expressive, very powerful, but still your typical analyst or your BI tool certainly wouldn't know how to do that. So, in Snowflake, we sort of have our cake and eat it too. We can have our JSONs with their full richness in our database, but yet we can simplify and expose the data elements that are needed for analysis, so that an analyst, their first day on the job, they can get right to work and start writing queries. >> So let me ask you about, a little more about the programmatic ad use case. So if you have billions of impressions per month, I'm guessing that means you have quite a few times more, in terms of bids, and then there's the, you know once you have, I guess a successful one, you want to track what happens. >> Correct. >> So tell us a little more about that, what that workload looks like, in terms of, what analytics you're trying to perform, what's your tracking? >> Yeah, well, you're right. There's different steps in our funnel. The impression request expands out by a factor of a dozen as we send it to all the different potential bidders. We track all that data, the responses come back, we track that, we track our decisions and why we selected the bidder. And then, once the ad is shown, of course there's various beacons and tracking things that fire. We'd have to track all of that data, and the only way we could make sense out of our business is by bringing all that data together. And in a way that is reliable, transparent, and visible, and also has data integrity, that's another thing I like about the Snowflake database is that it's a good old-fashioned SQL database that I can declare my primary keys, I can run QC checks, I can ensure high data integrity that is demanded by BI and other sorts of analytics. 
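A rough sense of the JSON drill-down and flattening described above can be sketched with SQLite's built-in JSON functions, driven from Python. This is only an analogy: Snowflake's actual syntax is different (dot paths on VARIANT columns, e.g. `doc:device.geo.country::string`, and `LATERAL FLATTEN` for nested arrays), and the event fields below are invented for illustration, not taken from Sharethrough's real bid-request schema.

```python
import json
import sqlite3

# Hypothetical, heavily simplified bid-request event.
# Real RTB JSONs are far richer; all field names here are invented.
event = {
    "id": "req-1",
    "device": {"os": "ios", "geo": {"country": "US"}},
    "bids": [{"bidder": "a", "price": 1.2}, {"bidder": "b", "price": 0.9}],
}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (doc TEXT)")
conn.execute("INSERT INTO events VALUES (?)", (json.dumps(event),))

# Drill into the nested structure with a JSON path, the way an analyst
# would use dot notation in Snowflake.
country = conn.execute(
    "SELECT json_extract(doc, '$.device.geo.country') FROM events"
).fetchone()[0]

# Expand the nested bids array into relational rows, analogous to
# Snowflake's LATERAL FLATTEN over a VARIANT array.
rows = conn.execute(
    "SELECT json_extract(b.value, '$.bidder'), json_extract(b.value, '$.price') "
    "FROM events, json_each(events.doc, '$.bids') AS b"
).fetchall()
```

The point of the analogy is the one made in the interview: once the nested structure is exposed as ordinary columns and rows, an analyst can query it with plain SQL instead of writing a Scala program.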
>> What would be, as you continue to push the boundaries of the ad tech service, what's some functionality that you're looking to add, and Snowflake as your partner, either that's in there now that you still need to take advantage of, or things that you're looking to in the future? >> Well, moving forward, of course, it's very important for us to be able to quickly gauge the effectiveness of new products. The ad tech market is fast-changing, there's always new ways of bidding, new products that are being developed, new ways for the ad ecosystem to work. And so, as we roll those out, we need to be able to quickly analyze, you know, "Is this thing working or not?" You know, kind of an agile environment, pivot or prove it. Does this feature work or not? So, having all the data in one place makes that possible for that very quick assessment of the viability of a new feature, new product. >> And, dropping down a little under the covers for how that works, does that mean, like, you still have the base JSON data that you've absorbed, but you're going to expose it with different schemas or access patterns? >> Yeah, indeed. For instance, we make use of the SQL schemas, roles, and permissions internally, where we can have the different teams have their own domain of data that they can expose internally. And looking forward, there's the Sharehouse feature of Snowflake that we're looking to implement with our partners, where, rather than sending them data, like a daily dump of data, we can give them access to their data in our database through this top layer that Michael mentioned, the services layer, which essentially allows me to create a view and grant select to another customer. So I no longer have to send daily data dumps to partners or have some sort of API for getting data. They can simply query the data themselves, so we'll be implementing that feature with our major partners. 
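The view-plus-grant flow described here can be sketched with Snowflake's secure data sharing primitives (shares and secure views). This is a hedged sketch, not Sharethrough's actual setup; every database, schema, view, and account name below is hypothetical:

```sql
-- A secure view scoped to one partner's slice of the exchange data.
CREATE SECURE VIEW adtech.public.partner_metrics AS
    SELECT day, impressions, wins
    FROM adtech.public.exchange_events
    WHERE partner_id = 'acme';

-- A share exposes that view to the partner's Snowflake account,
-- replacing the daily data dump described in the interview.
CREATE SHARE acme_share;
GRANT USAGE ON DATABASE adtech TO SHARE acme_share;
GRANT USAGE ON SCHEMA adtech.public TO SHARE acme_share;
GRANT SELECT ON VIEW adtech.public.partner_metrics TO SHARE acme_share;
ALTER SHARE acme_share ADD ACCOUNTS = acme_account;
```

The partner then queries the shared view directly from their own account, so no data is copied or transmitted, which is the point being made about retiring daily dumps and one-off APIs.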
>> I would be remiss in not asking at a data conference like this, now that there's the tie-in with CuBOL and Spark Integration and Machine Learning, is there anything along that front that you're planning to exploit in the near future? >> Well, yeah, Sharethrough, we're very experimental, playful, we're always examining new data technologies and new ways of doing things but now with Snowflake as sort of our data warehouse of curated data. I've got two petabytes of referential integrity data, and that is reliable. We can move forward into our other analyses and other uses of data knowing that we have captured every event exactly once, and we know exactly where it fits in a business context, in a relational manner. It's clean, good data integrity, reliable, accessible, visible, and it's just plain old SQL. (chuckles) >> That's actually a nice way to sum it up. We've got the integrity that we've come to expect and love from relational databases. We've got the flexibility of machine-oriented data, or JSON. But we don't have to give up the query engine, and then now you have more advanced features, analytic features that you can take advantage of coming down the pipe. >> Yeah, again we're a modern platform for the modern age, that's basically cloud-based computing. With a platform like Snowflake in the backend, you can now move those workloads that you're accustomed to to the cloud and have in the environment that you're familiar with, and it saves you a lot of time and effort. You can focus on more strategic projects. >> Okay, well, with that, we're going to take a short break. This has been George Gilbert, we're with Michael Nixon of Snowflake, and David Abercrombie of Sharethrough listening to how the most modern ad tech companies are taking advantage of the most modern cloud data warehouses. And we'll be back after a short break here at the Strata Data Conference, thanks. (quirky music)
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
David | PERSON | 0.99+ |
George Gilbert | PERSON | 0.99+ |
David Abercrombie | PERSON | 0.99+ |
Michael Nixon | PERSON | 0.99+ |
Michael | PERSON | 0.99+ |
June | DATE | 0.99+ |
two | QUANTITY | 0.99+ |
SiliconANGLE Media | ORGANIZATION | 0.99+ |
San Jose | LOCATION | 0.99+ |
Scala | TITLE | 0.99+ |
first | QUANTITY | 0.99+ |
Silicon Valley | LOCATION | 0.99+ |
five-minute | QUANTITY | 0.99+ |
Snowflake | TITLE | 0.99+ |
Christmas | EVENT | 0.98+ |
Strata Data Conference | EVENT | 0.98+ |
three-layer | QUANTITY | 0.98+ |
first day | QUANTITY | 0.98+ |
a dozen | QUANTITY | 0.98+ |
two petabytes | QUANTITY | 0.97+ |
Sharethrough | ORGANIZATION | 0.97+ |
JSON | TITLE | 0.97+ |
SQL | TITLE | 0.96+ |
one place | QUANTITY | 0.95+ |
six months | QUANTITY | 0.94+ |
Forager Tasting Room & Eatery | ORGANIZATION | 0.91+ |
today | DATE | 0.89+ |
Snowflake | ORGANIZATION | 0.87+ |
Spark | TITLE | 0.87+ |
12 billion impressions a month | QUANTITY | 0.87+ |
Machine Learning | TITLE | 0.84+ |
Big Data | ORGANIZATION | 0.84+ |
billions of impressions | QUANTITY | 0.8+ |
CuBOL | TITLE | 0.79+ |
Big Data SV 2018 | EVENT | 0.77+ |
once | QUANTITY | 0.72+ |
theCUBE | ORGANIZATION | 0.63+ |
JSONs | TITLE | 0.61+ |
times | QUANTITY | 0.55+ |
Octavian Tanase, NetApp | Big Data SV 2018
>> Announcer: Live from San Jose it's The Cube presenting Big Data, Silicon Valley, brought to you by SiliconANGLE Media and its ecosystem partners. >> Good morning. Welcome to The Cube. We are on day two of our coverage of our event, Big Data SV. I'm Lisa Martin with my cohost Dave Vellante. We're down the street from the Strata Data Conference. This is The Cube's tenth big data event, and we had a great day yesterday, learning a lot from myriad guests on very different nuances of the big data journey and where things are going. We're excited to welcome back to The Cube an alumni, Octavian Tanase, the Senior Vice President of Data ONTAP from NetApp. Octavian, welcome back to The Cube. >> Glad to be here. >> So you've been at the Strata Data Conference for the last couple of days. From a big data perspective, what are some of the things that you're hearing, in terms of, from a customer's perspective, what's working, what challenges, opportunities? >> I'm very excited to be here and learn about the innovation of our partners in the industry, and share with our partners and our customers what we're doing to enable them to drive more value out of that data. The reality is that data has become the 21st Century gold or oil that powers the business, and everybody's looking to apply new techniques, a lot of times machine learning, deep learning, to draw more value out of the data, make better decisions, and compete in the marketplace. >> Octavian, you've been at NetApp now eight years, and I've been watching NetApp, as we were talking about offline, for decades, and I've seen the ebb and flow, and this company has transformed many, many times. The latest, obviously cloud came in, flash came into play, and then you're also going through a major transition in the customer base to clustered ONTAP. You seemed to negotiate that. NetApp is back, thriving, stock's up. What's happening at NetApp? What's the culture like these days? Give us the update. 
>> I think we've been very fortunate to have a CEO like George Kurian, who has been really focused on helping us do basically fewer things better, really focus on our core business, simplify our operations, and continue to innovate, and this is probably the area that I'm most excited about. It's always good to make sure that you accelerate the business, make it simpler for your customers and your partners to do business with you, but what you have to do is innovate. We are a product company. We are passionate about innovation. I believe that we are innovating with more pace than many of the startups in the space, so that's probably the most exciting thing that has been part of our transformation. >> So let's talk about big data. Back in the day, if you had a big data problem, you would buy a big Unix box, maybe buy some Oracle licenses, try to put all your data into that box, and that became your data warehouse. The brilliance of Hadoop was, hey, we can leave the data where it is. There's too much data to put into the box, so we're going to bring five megabytes of code to a petabyte of data. And the other piece of it is CFOs loved it, because we're going to reduce the cost of our expensive data warehouse and we're going to buy off-the-shelf components: white box servers and off-the-shelf disk drives. We're going to put that together and life will be good. Well, as things matured, like the old client-server days, it got very expensive; you needed enterprise grade. So where does NetApp fit into that equation, because originally big storage companies like NetApp, they weren't part of the equation? Has that changed? >> Absolutely. One of the things that has enabled that transformation, that change, is we made a deliberate decision to focus on software defined and making sure that the ONTAP operating system is available wherever data is being created: on the edge in an IoT device, in the traditional data center, or in the cloud. 
So we are in the unique position to enable analytics, big data, wherever those applications reside. One of the things that we've recently done is we've partnered with IDC, and what the study, what the analysis has shown, is that deploying an analytics, Hadoop, or NoSQL type of solution on top of NetApp is half the cost of DAS. So when you consider the cost of servers, the licenses that you're going to have to pay for these commercial implementations of Hadoop, as well as the storage and the data infrastructure, you are much better off choosing NetApp than a white box type of solution. >> Let's unpack that a little bit, because if I infer correctly from what you said, normally you would say the operational costs are going to be dramatically lower, it's easier to manage a professional system like a NetApp ONTAP, it's integrated, great software, but am I hearing you correctly, you're saying the acquisition costs are actually less than if I'm buying white box? A lot of people are going to be skeptical about that, say Octavian, no way, it's cheaper to buy white box stuff. Defend that statement. >> Absolutely. If you're looking at the whole solution that includes the server and the storage, what NetApp enables you to do is, if you're running the solution on top of ONTAP, you reduce the need for so many servers. If you reduce that number, you also reduce the licensing cost. Moreover, if you actually look at the core value proposition of the storage layer there, DAS typically makes three copies of the data. We don't. We are very greedy, and we're making sure that you're using shared storage, and we are applying a bunch of storage efficiency techniques to further compress and compact that data for world-class storage efficiency. >> So cost efficiency is obviously a great benefit for any company, especially when they're evolving from a digital perspective. What are some of the business-level benefits? You mentioned speed a minute ago. 
What is Data ONTAP, and even ONTAP in the cloud, enabling your enterprise customers to achieve at the business level, maybe from faster time to market, identifying with machine learning and AI new products? Give me an example of maybe a customer that you think really articulates the value that ONTAP in the cloud can deliver. >> One of the things that's really important is to have your data management capability wherever the data is being produced, so ONTAP can be consumed either as a VM or a service ... I don't know if you've seen some of the partnerships that we have with AWS and Azure. We're able to offer the same rich data management capabilities, not only in the traditional data center, but in the cloud. What that really enables customers to do is to simplify and have the same operating system, the same data management platform, both for the second-platform traditional applications as well as for the third-platform applications. I've seen a company like Adobe be very successful in deploying their infrastructure, their services, not only on prem in their traditional data center, but using ONTAP Cloud. So we have more than about 1,500 customers right now that have adopted ONTAP in the AWS cloud. >> What are you seeing in terms of the adoption of flash, and I'm particularly interested in the intersection of flash adoption and the developer angle, because we've seen, in certain instances, certain organizations are able to share data off of flash much more efficiently than you would be, for instance, off a spinning disk? Have you seen a developer impact in your customer base? >> Absolutely. I think most customers initially adopted flash because of high throughput and low latency. I think over time customers really understood and identified with the overall value proposition and cost of ownership of flash, that it enables them to consolidate multiple workloads in a smaller footprint. 
So that enables you to then reduce the cost to operate that infrastructure, and it really gives you a range of applications that you can deploy that you were never able to deploy before. Everybody's looking to do in-place, in-line analytics that now are possible because of this fast media. Folks are looking to accelerate old applications in which they cannot invest anymore, but they just want to run faster. Flash also tends to be more reliable than traditional storage, so customers definitely appreciate that fewer things could go wrong, so overall the value proposition of flash is all-encompassing, and we believe that in the near future flash will be the de facto standard in everybody's data center, whether it's on prem or in the cloud. >> How about backup and recovery in big data? We obviously, in the enterprise, are very concerned about data protection. What's similar in big data? What's different, and what's NetApp's angle on that? >> I think data protection and data security will never stop being important to our customers. Security's top of mind for everybody in the industry, and it's a source of resume-changing events, if you would, and they're typically not promotions. So we have invested a tremendous deal in certifications, for HIPAA, for FIPS, and we are enabling encryption, both at rest and in flight. We've done a lot of work to make sure that the encryption can happen in the software layer, to make sure that we give the customers best-in-class storage efficiency, and what we're also leveraging is the innovation that ONTAP has done over many years to protect the data: replication, snapshots, tiering the data to the cloud. These are techniques that we're commonly using to reduce the cost of ownership and also protect the data the customers deploy. >> So security's still a hot topic and, like you said, it probably always will be, but it's a shared responsibility, right? So customers leveraging NetApp, say, on-prem or hybrid, also using Azure or AWS, who's your target audience? 
If you're talking to the guys and gals that are still managing storage, are you also having the CSO or the security guys come in, the gals, to understand, we've got this deployment in Azure or AWS, so we're going to bring in ONTAP to facilitate this? There's a shared responsibility for security. Who's at the table, from your perspective, in your customers, that you need to help understand how they facilitate true security? >> It's definitely been a transformative event, where more and more people in IT organizations are involved in the decisions that are required to deploy the applications. There was a time when we would talk only to the storage admin. After a while we started talking to the application admin, the virtualization admin, and now you're talking to the line of business, who has that vested interest to make sure that they can harness the power of the data in their environment. So you have the CSO, you have the traditional infrastructure people, you have the app administration, and you have the app owner, the business owner, that are all at the table, coming and looking to choose the best-of-breed solution for their data management. 
We've helped, over time, customers time and again adopt disruptive technologies in nondisruptive ways. We're looking to adopt these technologies and trends on behalf of our customers and then help them use them in a seamless safe way. >> And continue their evolution to identify new revenue streams, new products, new opportunities and even probably give other lines of business access to this data that they need to understand is there value here, how can we harness it faster than our competitors, right? >> Absolutely. It's all about deriving value out of the data. I think earlier I called it the gold of the 21st Century. This is a trend that will continue. I believe there will be no enterprise or center that won't focus on using machine learning, deep learning, analytics to derive more value out of the data to find more customer touch points, to optimize their business to really compete in the marketplace. >> Data plus AI plus cloud economics are the new innovation drivers of the next 10, 20 years. >> Completely agree. >> Well Octavian thanks so much for spending time with us this morning sharing what's new at NetApp, some of the visions that you guys have and also some of the impact that you're making with customers. We look forward to having you back on the program in the near future. >> Thank you. Appreciate having the time. >> And for my cohost Dave Vellante I'm Lisa Martin. You're watching The Cube live on day two of coverage of our event, Big Data SV. We're at this really cool venue, Forager Tasting Room. Come down here, join us, get to hear all these great conversations. Stick around and we'll be right back with our next guest after a short break. (electronic music)
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Dave Vellante | PERSON | 0.99+ |
George Kurian | PERSON | 0.99+ |
Lisa Martin | PERSON | 0.99+ |
Octavian Tanase | PERSON | 0.99+ |
Adobe | ORGANIZATION | 0.99+ |
Octavian | PERSON | 0.99+ |
AWS | ORGANIZATION | 0.99+ |
eight years | QUANTITY | 0.99+ |
San Jose | LOCATION | 0.99+ |
SiliconANGLE Media | ORGANIZATION | 0.99+ |
NetApp | TITLE | 0.99+ |
Hadoop | TITLE | 0.99+ |
five megabytes | QUANTITY | 0.99+ |
Oracle | ORGANIZATION | 0.99+ |
second platform | QUANTITY | 0.99+ |
21st Century | DATE | 0.99+ |
HIPAA | TITLE | 0.99+ |
Strata Data Conference | EVENT | 0.99+ |
yesterday | DATE | 0.99+ |
ONTAP | TITLE | 0.99+ |
The Cube | TITLE | 0.99+ |
IDC | ORGANIZATION | 0.98+ |
both | QUANTITY | 0.98+ |
One | QUANTITY | 0.98+ |
Unix | COMMERCIAL_ITEM | 0.98+ |
NetApp | ORGANIZATION | 0.97+ |
The Cube | ORGANIZATION | 0.97+ |
Silicon Valley | LOCATION | 0.96+ |
ONTAP Cloud | TITLE | 0.95+ |
more than about 1,500 customers | QUANTITY | 0.95+ |
NetApps | TITLE | 0.93+ |
Big Data SV | EVENT | 0.93+ |
Big Data SV 2018 | EVENT | 0.93+ |
day two | QUANTITY | 0.93+ |
Forager Tasting Room | LOCATION | 0.88+ |
NoSQL | TITLE | 0.87+ |
Azure | ORGANIZATION | 0.86+ |
third platform applications | QUANTITY | 0.81+ |
a minute ago | DATE | 0.81+ |
Number two | QUANTITY | 0.8+ |
Senior Vice President | PERSON | 0.79+ |
three tenants | QUANTITY | 0.78+ |
decades | QUANTITY | 0.74+ |
a petabyte of data | QUANTITY | 0.73+ |
tenth big | QUANTITY | 0.71+ |
Number one | QUANTITY | 0.71+ |
three copies | QUANTITY | 0.7+ |
this morning | DATE | 0.69+ |
number three | QUANTITY | 0.68+ |
ONTAP | ORGANIZATION | 0.67+ |
Data ONTAP | ORGANIZATION | 0.64+ |
event | QUANTITY | 0.64+ |
Net App | TITLE | 0.64+ |
10 | QUANTITY | 0.64+ |
half | QUANTITY | 0.6+ |
flash | TITLE | 0.58+ |
much | QUANTITY | 0.58+ |
Big Data | EVENT | 0.57+ |
years | QUANTITY | 0.55+ |
Daniel Raskin, Kinetica | Big Data SV 2018
>> Narrator: Live, from San Jose, it's theCUBE. Presenting Big Data Silicon Valley. Brought to you by SiliconANGLE Media and its ecosystem partners (mellow electronic music) >> Welcome back to theCUBE, on day two of our coverage of our event, Big Data SV. I'm Lisa Martin, my co-host is Peter Burris. We are down the street from the Strata Data Conference, we've had a great day yesterday, and a great morning already, really learning and peeling back the layers of big data, challenges, opportunities, next generation, and we're welcoming back to theCUBE an alumnus, the CMO of Kinetica, Dan Raskin. Hey Dan, welcome back to theCUBE. >> Thank you, thank you for having me. >> So, I'm a messaging girl, look at your website, the insight engine for the extreme data economy. Tell us about the extreme data economy, and what is that, what does it mean for your customers? >> Yeah, so it's a great question, and, from our perspective, we sit, we're here at Strata, and you see all the different vendors kind of talking about what's going on, and there's a little bit of word spaghetti out there that makes it really hard for customers to think about how big data is affecting them today, right? And so, what we're actually looking at is the idea of, the world's changed. That big data from five years ago doesn't necessarily address all the use cases today. If you think about what customers are going through, you have more users, devices, and things coming on, there's more data coming back than ever before, and it's not just about creating the data-driven business, and building these massive data lakes that turn into data swamps, it's really about how do you create the data-powered business.
So when we're using that term, we're really trying to call out that the world's changed, that, in order for businesses to compete in this new world, they have to think about how to take data and create CoreIP that differentiates, how do I use it to affect the omnichannel, how do I use it to deal with new things in the realm of banking and Fintech, how do I use it to protect myself against disruption in telco, and so, the extreme data economy is really this idea that you have business in motion, more things coming online than ever before, how do I create a data strategy, where data is infused in my business, and creates CoreIP that helps me maintain category leadership or grow. >> So as you think about that challenge, there are a number of technologies that come into play. Not least of which is the industry, while it's always to a degree been driven by what hardware can do, that's moderated a bit over time, but today, in many respects, a lot of what is possible is made possible, by what hardware can do, and what hardware's going to be able to do. We've been using similar AI algorithms for a long time. But we didn't have the power to use them! We had access to data, but we didn't have the power to acquire and bring it in. So how is the relationship between your software, and your platform, and some of the new hardware that's becoming available, starting to play out in a way of creating value for customers? >> Right, so, if you think about this in terms of this extreme data concept, and you think about it in terms of a couple of things, one, streaming data, just massive amounts of streaming data coming in. Billions of rows that people want to take and translate into value. >> And that data coming from-- >> It's coming from users, devices, things, interacting with all the different assets, more edge devices that are coming online, and the Wild West essentially.
You look at the world of IoT and it's absolutely insane, with the number of protocols, and device data that's coming back to a company, and then you think about how do you actually translate this into real-time insight. Not near real-time, where it's taking seconds, but true millisecond response times where you can infuse this into your business, and one of our whole premises about Kinetica is the idea of this massively parallel compute. So the idea of not using CPUs anymore, to actually drive the powering behind your intelligence, but leveraging GPUs, and if you think about this, a CPU has 64 cores, 64 parallel things that you can do at a time, a GPU can have up to 6,000 cores, 6,000 parallel things, so it's kind of like lizard brain versus modern brain. How do you actually create this next generation brain that has all these neural networks, for processing the data, in a way that you couldn't. And then on top of that, you're using not just the technology of GPUs, you're trying to operationalize it. So how do you actually bring the data scientist, the BI folks, the business folks all together to actually create a unified operational process, and the underlying piece is the Kinetica engine and the GPUs used to do this, but the power is really in the use cases of what you can do with it, and how you actually affect different industries. >> So can you elaborate a little bit more on the use cases, in this kind of game changing environment? >> Yeah, so there's a couple of common use cases that we're seeing, one that affects every enterprise is the idea of breaking down silos of business units, and creating the customer 360 view. How do I actually take all these disparate data feeds, bring them into an engine where I can visualize concepts about my customer and the environment that they're living in, and provide more insight?
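As a side note, the core-count arithmetic Dan walks through above, 64 parallel lanes on a CPU versus up to 6,000 cores on a GPU, can be sketched in a few lines. The `batches_needed` helper and the row count below are illustrative only, a toy model of ideal parallel scaling, not Kinetica's actual scheduler.

```python
import math

def batches_needed(num_tasks: int, parallel_lanes: int) -> int:
    """How many sequential waves it takes to finish num_tasks when the
    hardware can execute parallel_lanes tasks at once (ideal scaling)."""
    return math.ceil(num_tasks / parallel_lanes)

# Illustrative figures from the conversation: 64 parallel lanes for a
# CPU versus up to 6,000 cores for a GPU. One task per row chunk is a
# made-up unit of work.
rows = 1_000_000
print(batches_needed(rows, 64))    # 15625 waves, CPU-style
print(batches_needed(rows, 6000))  # 167 waves, GPU-style
```

Under this idealized model the GPU-style engine finishes in roughly two orders of magnitude fewer waves; real speedups depend on memory bandwidth and how well the workload vectorizes.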
So if you think about things like Whole Foods and Amazon merging together, you now have this power of, how do I actually bridge the digital and physical world to create a better omnichannel experience for the user, how do I think about things in terms of what preferences they have, personalization, how to actually pair that with sensor data to affect how they actually navigate in a Whole Foods store more efficiently, and that's affecting every industry, you could take that to banking as well and think about the banking omnichannel, and ATMs, and the digital bank, and all these Fintech upstarts that are working to disrupt them. A great example for us is the United States Postal Service, where we're actually looking at all the data, the environmental data, around the US Postal Service, we're able to visualize it in real-time, we're able to affect the logistics of how they actually navigate through their routes, we're able to look at things like postal workers separating out of their zones, and potentially kicking off alerts around that, so effectively making the business more efficient. But, we've moved into this world where we always used to talk about brick and mortar going to cloud, we're now in this world where the true value is how you bridge the digital and physical world, and create more transformative experiences, and that's what we want to do with data. So it could be logistics, it could be omnichannel, it could be security, you name it. It affects every single industry that we're talking about. >> So I got two questions, what is Kinetica's contribution to that, and then, very importantly, as a CMO, how are you thinking about making sure that the value that people are creating, or can create with Kinetica, gets more broadly diffused into an ecosystem.
>> Yeah, so the power that we're bringing is the idea of how to operationalize this in a way where again, you're using your data to create value, so, having a single engine where you're collecting all of this data, massive volumes of data, terabytes upon terabytes of data, enabling it where you can query the data, with millisecond response times, and visualize it, with millisecond response times, run machine learning algorithms against it to augment it, you still have that human ability to look at massive sets of data, and do ad hoc discovery, but can run machine learning algorithms against that and complement it with machine learning. And then the operational piece of bringing the data scientists into the same platform that the business is using, so you don't have data recency issues, is a really powerful mix. The other piece I would just add is the whole piece around data discovery, you can't really call it big data if, in order to analyze the data, you have to downsize and downsample to look at a subset of data. It's all about looking at the entire set. So that's where we really bring value. >> So, to summarize very quickly, you are providing a platform that can run very, very fast, in a parallel system, and memories in these parallel systems, so that large amounts of data can be acted upon. >> That's right. >> Now, so, the next question is, there's not going to be a billion people that are going to use your tool to do things, how are you going to work with an ecosystem and partners to get the value that you're able to create with this data, out into the entire enterprise. >> It's a great question, and probably the biggest challenge that I have, which is, how do you get above the word spaghetti, and just get into education around this. And so I think the key is getting into examples, of how it's affecting the industry.
So don't talk about the technology, and streaming from Kafka into a GPU-powered engine, talk about the impact to the business in terms of what it brings in terms of the omnichannel. You look at something like Japan in the 2020 Olympics, and you think about that in terms of telco, and how are the mobile providers going to be able to take all the data of what people are doing, and relate that to ad-tech, to relate that to customer insight, to relate that to new business models of how they could sell the data, that's the world of education we have to focus on, is talk about the transformative value it brings from the customer perspective, the outside-in as opposed to the inside-out. >> On that educational perspective, as a CMO, I'm sure you meet with a lot of customers, do you find that you might be in this role of trying to help bridge the gaps between different roles in an organization, where there's data silos, and there's probably still some territorial culture going on? What are you finding in terms of Kinetica's ability to really help educate and maybe bring more stakeholders, not just to the table, but kind of build a foundation of collaboration? >> Yeah, it's a really interesting question because I think it means that not just Kinetica, but all vendors in the space, have to get out of their comfort zone, and just stop talking speeds and feeds and scale, and in fact, when we were looking at how to tell our story, we did an analysis of where most companies were talking, and they were focusing a lot more on the technical aspirations that developers sell, which is important, you still need to court the developer, you have community products that they can download, and kick the tires with, but we need to extend our dialogue, get out of our customer comfort zone, and start talking more to CIOs, CTOs, CDOs, and that's just reaching out to different avenues of communication, different ways of engaging.
And so, I think that's kind of a core piece that I'm taking away from Strata, is we do a wonderful job of speaking to developers, we all need to get out of our comfort zone and talk to a broader set of folks, so business folks. >> Right, 'cause that opens up so many new potential products, new revenue streams, on the marketing side being able to really target your customer base audience, with relevant, timely offers, to be able to be more connected. >> Yeah, the worst scenario is talking to an enterprise around the wonders of a technology that they're super excited about, but they don't know the use case that they're trying to solve, start with the use case they're trying to solve, start with thinking about how this could affect their position in the market, and work on that, in partnership. We have to do that in collaboration with the customers. We can't just do that alone, it's about building a partnership and learning together around how you use data in a different way. >> So as you imagine, the investments that Kinetica is going to make over the next few years, with partners, with customers, what do you hope Kinetica will be in 2020? >> So, we want it to be that transformative engine for enterprises, we think we are delivering something that's quite unique in the world, and, you want to see this on a global basis, affecting our customers' value. I almost want to take us out of the story, and if I'm successful, you're going to hear wonderful enterprise companies across telco, banking, and other areas just telling their story, and we happen to be the engine behind it. >> So you're an ingredient in their success. >> Yes, a core ingredient in their success. >> So if we think about, over the course of the next set of technology waves, are there any particular applications that you think you're going to be stronger in?
So I'll give you an example, do you envision that Kinetica can have a major play in how automation happens inside infrastructure, or how developers start seeing patterns in data, imagine how those assets get created. Where are some of the practical, but rarely talked about, applications where you might find yourselves becoming more of an ingredient, because they themselves become ingredients to some of these other big use cases? >> There are a lot of commonalities that we're starting to see, and the interesting piece is the architecture that you implement tends to be the same, but the context of how you talk about it, and the impact it has tends to be different, so, I already mentioned the customer 360 view? First and foremost, break down silos across your organization, figure out how do you get your data into one place where you can run queries against it, you can visualize it, you can do machine learning analysis, that's a foundational element, and, I have a company in Asia called Lippo that is doing that in their space, where all of a sudden they're starting to glean things they didn't know about their customer before, doing that ad hoc discovery, so that's one area. The other piece is this use case of how do you actually operationalize data scientists, and machine learning, into your core business? So, that's another area that we focus on. There are simple entry points, things like Tableau Acceleration, where you put us underneath the existing BI infrastructure, and all of a sudden, you're a hundred times faster, and now your business folks can sit at the table, and make real-time business decisions, where in the past, if they clicked on certain things, they'd have to wait to get those results. Geospatial visualization's a no-brainer, the idea of taking environmental data, pairing it with your customer data, for example, and now learning about interactions.
And I'd say the other piece is more innovation driven, where we would love to sit down with different innovation groups in different verticals and talk with them about, how are you looking to monetize your data in the future, what are the new business models, how do things like voice interaction affect your data strategy, what are the different ways you want to engage with your data, so there's a lot of different realms we can go to. >> One of the things you said as we wrap up here, that I couldn't agree with more, is, the best value articulation I think a brand can have, period, is through the voice of their customer. And I think that's one of the things that Paul said yesterday: defining Kinetica's success based on the success of your customers across industry. And it really doesn't get more objective than a customer who has, not just from a developer perspective, maybe improved productivity, or workforce productivity, but actually moved the business forward, to a point where you're maybe bridging the gaps between the digital and physical, and actually enabling that business to be more profitable, open up new revenue streams because this foundation of collaboration has been established. >> I think that's a great way to think about it-- >> Which is good, 'cause he's your CEO. >> (laughs) Yes, that sustains my job. But the other piece is, I almost get embarrassed talking about Kinetica, I don't want to be the car salesman, or the vacuum salesman, that sprinkles dirt on the floor and then vacuums it up, I'd rather us kind of fade to the behind-the-scenes power where our customers are out there telling wonderful stories that have an impact on how people live in this world. To me, that's the best marketing you can do, is real stories, real value. >> Couldn't agree more.
Well Dan, thanks so much for stopping by, sharing the things that Kinetica is doing, some of the things you're hearing, and how you're working to really build this foundation of collaboration and enablement within your customers across industries. We look forward to hearing the kind of cool stuff that happens with Kinetica, throughout the rest of the year, and again, thanks for stopping by and sharing your insights. >> Thank you for having me. >> I want to thank you for watching theCUBE, I'm Lisa Martin with my co-host Peter Burris, we are at Big Data SV, our second day of coverage, at a cool place called the Forager Tasting Room, in downtown San Jose, stop by, check us out, and have a chance to talk with some of our amazing analysts on all things big data. Stick around though, we'll be right back with our next guest after a short break. (mellow electronic music)
SUMMARY :
Brought to you by SiliconANGLE Media We are down the street from the Strata Data Conference, and what is that, what does it mean for your customers? and it's not just about creating the data driven business, So how is the relationship between your software, if you think about this in terms of this is really in the use cases of what you can do with it, and the digital bank, and all these Fintech upstarts making sure that the value that people are creating, is the idea of how to operationalize this in a way you are providing a platform that are going to use your tool to do things, and how are the mobile providers going to be able and kick the tires with, but we need to extend our dialogue, on the marketing side being able to really target We have to do that in collaboration with the customers. the engine behind it. that you think you're going to be stronger in? and the impact it has tends to be different, so, One of the things you said as we wrap up here, To me, that's the best marketing you can do, some of the things you're hearing, and have a chance to talk with some of our amazing analysts
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Peter Burris | PERSON | 0.99+ |
Lisa Martin | PERSON | 0.99+ |
Paul | PERSON | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
Dan Raskin | PERSON | 0.99+ |
Whole Foods | ORGANIZATION | 0.99+ |
Daniel Raskin | PERSON | 0.99+ |
64 cores | QUANTITY | 0.99+ |
Asia | LOCATION | 0.99+ |
Dan | PERSON | 0.99+ |
2020 | DATE | 0.99+ |
San Jose | LOCATION | 0.99+ |
two questions | QUANTITY | 0.99+ |
Kinetica | ORGANIZATION | 0.99+ |
Lippo | ORGANIZATION | 0.99+ |
SiliconANGLE Media | ORGANIZATION | 0.99+ |
second day | QUANTITY | 0.99+ |
yesterday | DATE | 0.99+ |
6,000 parallel | QUANTITY | 0.99+ |
64 parallel | QUANTITY | 0.99+ |
2020 Olympics | EVENT | 0.99+ |
Strata Data Conference | EVENT | 0.99+ |
telco | ORGANIZATION | 0.98+ |
theCUBE | ORGANIZATION | 0.98+ |
one | QUANTITY | 0.98+ |
single engine | QUANTITY | 0.97+ |
First | QUANTITY | 0.97+ |
Wild West | LOCATION | 0.97+ |
today | DATE | 0.97+ |
five years ago | DATE | 0.96+ |
Big Data SV | ORGANIZATION | 0.96+ |
one area | QUANTITY | 0.95+ |
Strata | ORGANIZATION | 0.95+ |
United States Postal Service | ORGANIZATION | 0.94+ |
day two | QUANTITY | 0.93+ |
Narrator: Live | TITLE | 0.93+ |
One | QUANTITY | 0.93+ |
one place | QUANTITY | 0.9+ |
Fintech | ORGANIZATION | 0.88+ |
up to 6,000 cores | QUANTITY | 0.88+ |
years | DATE | 0.88+ |
US Postal Service | ORGANIZATION | 0.88+ |
Billions of rows | QUANTITY | 0.87+ |
terabytes | QUANTITY | 0.85+ |
Japan | LOCATION | 0.82+ |
hundred times | QUANTITY | 0.82+ |
terabytes of data | QUANTITY | 0.81+ |
Strata | TITLE | 0.8+ |
Tableau Acceleration | TITLE | 0.78+ |
single industry | QUANTITY | 0.78+ |
CoreIP | TITLE | 0.76+ |
360 view | QUANTITY | 0.75+ |
Silicon Valley | LOCATION | 0.73+ |
billion people | QUANTITY | 0.73+ |
2018 | DATE | 0.73+ |
Data SV | EVENT | 0.72+ |
Kinetica | COMMERCIAL_ITEM | 0.72+ |
Forager Tasting Room | ORGANIZATION | 0.68+ |
Big | EVENT | 0.67+ |
millisecond | QUANTITY | 0.66+ |
Kafka | PERSON | 0.6+ |
Big Data | ORGANIZATION | 0.59+ |
Data SV | ORGANIZATION | 0.58+ |
big data | ORGANIZATION | 0.56+ |
next | DATE | 0.55+ |
lot | QUANTITY | 0.54+ |
Big | ORGANIZATION | 0.47+ |
Kunal Agarwal, Unravel Data | Big Data SV 2018
>> Announcer: Live from San Jose, it's theCube! Presenting Big Data: Silicon Valley Brought to you by SiliconANGLE Media and its ecosystem partners. (techno music) >> Welcome back to theCube. We are live on our first day of coverage at our event BigDataSV. I am Lisa Martin with my co-host George Gilbert. We are at this really cool venue in downtown San Jose. We invite you to come by today, tonight for our cocktail party. It's called Forager Tasting Room and Eatery. Tasty stuff, really, really good. We are down the street from the Strata Data Conference, and we're excited to welcome to theCube a first-time guest, Kunal Agarwal, the CEO of Unravel Data. Kunal, welcome to theCube. >> Thank you so much for having me. >> So, I'm a marketing girl. I love the name Unravel Data. (Kunal laughs) >> Thank you. >> Two-year-old company. Tell us a bit about what you guys do and why that name... What's the implication there with respect to big data? >> Yeah, we are an application performance management company. And big data applications are just very complex. And the name Unravel is all about unraveling the mysteries of big data and understanding why things are not performing well and not really needing a PhD to do so. We're simplifying application performance management for the big data stack. >> Lisa: Excellent. >> So, so, um, you know, one of the things that a lot of people are talking about with Hadoop, originally it was this cauldron of innovation. Because we had the "let a thousand flowers bloom" in terms of all the Apache projects. But then once we tried to get it into operation, we discovered there's a... >> Kunal: There's a lot of problems. (Kunal laughs) >> There's an overhead, there's a downside to it. >> Maybe tell us why you need to know how people have done this many, many times. >> Yeah. >> How you need to learn from experience and then how you can apply that even in an environment where someone hasn't been doing it for that long.
>> Right. So, if I back up a little bit. Big data is powerful, right? It's giving companies an advantage that they never had, and data's an asset to all of these different companies. Now they're running everything from BI, machine learning, artificial intelligence, IOT, streaming applications on top of it for various reasons. Maybe it is to create a new product to understand the customers better, etc. But as you rightly pointed out, when you start to implement all of these different applications and jobs, it's very, very hard. It's because big data is very complex. With that great power comes a lot of complexity, and what we started to see is a lot of companies, while they want to create these applications and provide that differentiation to their company, they just don't have enough expertise in house to go and write good applications, maintain these applications, and even manage the underlying infrastructure and cluster that all these applications are running on. So we took it upon ourselves where we thought, Hey, if we simplify application performance management and if we simplify ongoing management challenges, then these companies would run more big data applications, they would be able to expand their use cases, and not really be fearful of, Hey, we don't know how to go and solve these problems. Can we actually rely on our system that is so complex and new? And that's the gap that Unravel fills, which is we monitor and manage not only one component of the big data ecosystem, but like you pointed out, it's a full zoo of all of these systems. You have Hadoop, and you have Spark, and you have Kafka for data ingestion. You may have some NoSQL systems and newer MPP platforms as well. So the vision of Unravel is really to be that one place where you can come in and understand what's happening with your applications and your system overall and be able to resolve those problems in an automatic, simple way.
>> So, all right, let's start at the concrete level of what a developer might get out of >> Kunal: Right. >> something that's wrapped in Unravel and then tell us what the administrator experiences. >> Kunal: Absolutely. So if you are a big data developer you've got a business requirement that says, Hey, go and make this application that understands our customers better, right? They may choose a tool of their liking, maybe Hive, maybe Spark, maybe Kafka for data ingestion. And what they'll do is they'll write an app first in dev, in their dev environment or the QA environment. And they'll say, Hey, maybe this application is failing, or maybe this application is not performing as fast as I want it to, or even worse that this application is starting to hog a lot of resources, which may slow down my other applications. Now to understand what's causing these kinds of problems today developers really need a PhD to go and decipher them. They have to look at tons of raw logs, metrics, configuration settings and then try to stitch the story up in their head, trying to figure out what is the effect, what is the cause? Maybe it's this problem, maybe it's some other problem. And then do trial and error to try, you know, to solve that particular issue. Now what we've seen is big data developers come in a variety of flavors. You have the hardcore developers who truly understand Spark and Hadoop and everything, but then 80% of the people submitting these applications are data scientists or business analysts, who may understand SQL, who may know Python, but don't necessarily know what distributed computing and parallel processing and all of these things really are, and where inefficiencies and problems can really lie.
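The "stitch the story up in their head" step Kunal describes, correlating raw logs and metrics to find the cause, can be caricatured in a few lines of Python. The task records, field names, and skew threshold below are all invented for illustration; this is a toy sketch of the idea, not Unravel's actual analysis.

```python
# Hypothetical task records of the kind an APM tool might stitch
# together from raw logs and metrics (invented data, for illustration).
tasks = [
    {"stage": "shuffle", "duration_s": 420},
    {"stage": "shuffle", "duration_s": 35},
    {"stage": "map", "duration_s": 30},
    {"stage": "map", "duration_s": 28},
]

def slowest_stage(tasks):
    """Total time per stage, then name the worst offender."""
    totals = {}
    for t in tasks:
        totals[t["stage"]] = totals.get(t["stage"], 0) + t["duration_s"]
    return max(totals, key=totals.get)

def skewed_tasks(tasks, factor=3.0):
    """Flag tasks running far longer than the mean, a classic data-skew
    symptom a developer would otherwise dig out of raw logs by hand."""
    mean = sum(t["duration_s"] for t in tasks) / len(tasks)
    return [t for t in tasks if t["duration_s"] > factor * mean]

print(slowest_stage(tasks))  # shuffle
print(skewed_tasks(tasks))   # the lone 420 s shuffle task
```

Turning that kind of aggregation into a plain-English diagnosis ("this stage is skewed, here is why") is the gap between raw logs and the one-view experience described next.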
So we give them this one view, which will connect all of these different data sources and then tell them in plain English, this is the problem, this is why this problem happened, and this is how you can go and resolve it, thereby getting them unstuck and making it very simple for them to go in and get the performance that they're expecting. >> So, these, these, um, they're the developers up front and you're giving them a whole new, sort of, toolchain or environment to solve the operational issues. >> Kunal: Right. >> So that, if it's DevOps, it's really dev that's much more self-sufficient. >> Yes, yes, I mean, all companies want to run fast. They don't want to be slowed down. If you have a problem today, they'll file a ticket, it'll go to the operations team, you wait a couple of days to get some more information back. That just means your business has slowed down. If things are simple enough where the application developers themselves can resolve a lot of these issues, that'll get the business unstuck and get them moving on further. Now, to the other point which you were asking, which is what about the operations and the app support people? So, Unravel's a great tool for them too because that helps them see what's happening holistically in the cluster. How are other applications behaving with each other? It's usually a multitenant, multiapplication environment that these big data jobs are running on. So, are my apps slowing down George's apps? Am I stealing resources from your applications? More so, not just about an individual application issue itself. So Unravel will give you visibility into each app, as well as the overall cluster to help you understand cluster-wide problems. >> Love to get at, maybe peel apart your target audience a little bit. You talked about DevOps. But also the business analysts, data scientists, and we talk about big data.
Data has such tremendous power to fuel a company and, you know, like you said, use it to create and deliver new products. Are you talking with multiple audiences within a company? Do you start at DevOps and they bring in their peers? Or do you actually start, maybe, at the Chief Data Officer level? What's that kind of entrance for Unravel? >> So the word I use to describe this is DataOps, instead of DevOps, right? So in the older world you had developers, and you had operations people. Over here you have a data team and operations people, and that data team can comprise the developers, the data scientists, the business analysts, etc., as well. But you're right. Although we first target the operations role because they have to manage and monitor the system and make sure everything is running like a well-oiled machine, they are now spreading it out to the end-users, meaning the developers themselves saying, "Don't come to me for every problem. Look at Unravel, try to solve it here, and if you cannot, then come to me." This is all, again, improving agility within the company, making sure that people have the necessary tools and insights to carry on with their day. >> Sounds like an enabler, >> Yeah, absolutely. >> That operations would push down to the developers themselves. >> And even the managers and the CDOs, for example, they want to see the ROI that they're getting from their big data investments. They want to see, they have put in these millions of dollars, have got an infrastructure and these services set up, but how are we actually moving the needle forward? Are there any applications that we're actually putting in business, and is that driving any business value? So we will be able to give them a very nice dashboard helping them understand what kind of throughput are you getting from your system, how many applications were you able to develop last week and onboard to your production environment?
And what's the rate of innovation that's really happening inside your company on those big data ecosystems? >> It sort of brings up an interesting question on two prongs. One is the well-known, but inexact number about how many big data projects, >> Kunal: Yeah, yeah. >> I don't know whether they fail or didn't pay off. So there's going in and saying, "Hey, we can help you manage this because it was too complicated." But then there's also the, all the folks who decided, "Well, we really don't want to run it all on-prem. We're not going to throw away everything we did there, but we're going to also put a lot of new investment >> Kunal: Exactly, exactly. >> in the cloud." Now, Wikibon has a term for that, which is true private cloud, which is when you have the operational processes that you use in the public cloud and you can apply them on-prem. >> Right. >> George: But there aren't many products that help you do that. How can Unravel work...? >> Kunal: That's a very good question, George. We're seeing the world move more and more to a cloud environment, or I should say an on-demand environment where you're not so bothered about the infrastructure and the services, but you want Spark as a dial tone. You want Kafka as a dial tone. You want a machine-learning platform as a dial tone. You want to come in there, you want to put in your data, and you want to just start running it. Unravel has been designed from the ground up to monitor and manage any of these environments. So, Unravel can solve problems for your applications running on-premise and similarly all the applications that are running in the cloud. Now, on the cloud there are other levels of problems as well so, of course, you'd have applications that are slow, applications that are failing; we can solve those problems.
But if you look at a cloud environment, a lot of these now provide you an autoscaling capability, meaning, hey, if this app doesn't run in the amount of time that we were hoping it to run, let's add extra hardware and run this application. Well, if you just keep throwing machines at the problem, it's not going to solve your issue. The time it takes doesn't decrease linearly with how many servers you're actually throwing in there, so what we can help companies understand is, what is the resource requirement of a particular application? How should we be intelligently allocating resources to make sure that you're able to meet your time SLAs, your constraints of, here I need to finish this within x number of minutes, but at the same time be intelligent about how much cost you're spending over there? Do you actually need 500 containers to go and run this app? Well, you may have needed 200. How do you know that? So, Unravel will also help you get efficient with your run, not just faster, but also, can it be a good multitenant citizen? Can it use limited resources to actually run these applications as well? >> So, Kunal, some of the things I'm hearing from a customer's standpoint that are potential positive business outcomes are internal: performance boost. >> Kunal: Yeah. >> It also sounds like, sort of... productivity improvements internally. >> And then also the opportunity to have the insight to deliver new products, but even, I'm thinking of, you know, helping a retailer, for example, be able to do more targeted marketing, so >> the business outcomes and the impact that Unravel can make really seem to have pretty strong internal and external benefits. >> Kunal: Yes. >> Is there a favorite customer story, (Kunal laughs) you don't have to mention names, that you really think speaks to your capabilities? >> So, 100%. Improving performance is a very big factor of what Unravel can do.
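The container-sizing trade-off Kunal describes (do you really need 500 containers, or would 200 do?) can be sketched with a toy model. The numbers and the Amdahl's-law-style cost function below are invented for illustration; this is not Unravel's model, but it shows why simply throwing machines at a job stops paying off:

```python
# Toy model of the autoscaling trade-off: only the parallel part of a job
# speeds up as containers are added, so past a point extra machines buy
# almost nothing. All numbers are invented for illustration.

def runtime_minutes(containers, serial_fraction=0.1, single_container_minutes=100):
    """Amdahl's-law-style estimate of job runtime on `containers` machines."""
    serial = single_container_minutes * serial_fraction
    parallel = single_container_minutes * (1 - serial_fraction) / containers
    return serial + parallel

def smallest_fleet_meeting_sla(sla_minutes, max_containers=500):
    """Fewest containers whose estimated runtime meets the SLA, else None."""
    for n in range(1, max_containers + 1):
        if runtime_minutes(n) <= sla_minutes:
            return n
    return None  # SLA unreachable: scaling out alone cannot fix it

print(smallest_fleet_meeting_sla(15))               # 18, nowhere near 500
print(runtime_minutes(200) - runtime_minutes(500))  # ~0.27 min gained by 300 extra machines
```

Under these assumed numbers, 18 containers already meet a 15-minute SLA, while going from 200 to 500 machines saves well under a minute.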
Decreasing costs by improving productivity, by limiting the amount of resources that you're using, is a very, very big factor. Now, amongst all of these companies that we work with, one key factor is improving reliability, which means, hey, it's fine that you can speed up this application, but sometimes I know the latency that I expect from an app, maybe it's a second, maybe it's a minute, depending on the type of application. But what businesses cannot tolerate is this app taking 5x more time today. If it's going to finish in a minute, tell me it'll finish in a minute and make sure it finishes in a minute. And this is a big use case for all of the big data vendors, because a lot of the customers are moving from Teradata, or from Vertica, or from other relational databases, on to Hortonworks or Cloudera or Amazon EMR. Why? Because it's one tenth the cost for running these workloads. But all the customers get frustrated and say, "I don't mind paying 10x more money, "because over there it used to work. "Over here, there are just so many complications, "and I don't have reliability with these applications." So that's a big, big factor of, you know, how we actually help these customers get value out of the Unravel product. >> Okay, so, um... A question I'm, sort of... why aren't there so many other Unravels? >> Kunal: Yeah. (Kunal laughs) >> From what I understood from past conversations, >> Kunal: Yeah. >> you can only really build the models that are at the heart of your capabilities based on tons and tons of telemetry >> Kunal: Yeah. >> that cloud providers or, sort of, internet-scale service providers have accumulated, because they all have a well-known set of configurations and a well-known kind of topology. In other words, there aren't a million degrees of freedom on any particular side; you have a well-scoped problem, and you have tons of data. So it's easier to build the models.
So who, who else could do this? >> Yeah, so the difference between Unravel and other monitoring products is that Unravel is not a monitoring product. It's an intelligent performance management suite. What that means is we don't just give you graphs and metrics and say, "Here's all the raw information, "you go figure it out." Instead, we take it a step further and actually give people answers. In order to develop something like that, you need full-stack information; that's number one. Meaning information from applications all the way down to infrastructure and everything in between. Why? Because problems can lie anywhere. And if you don't have that full-stack info, you're blinding yourself, or limiting the scope of the problems that you can actually search for. Secondly, like you were rightly pointing out, how do I create answers from all this raw data? So you have to think like an expert in big data would think, which is, if there is a problem, what are the kinds of checks, balances, and places that that person would look into, and how would that person establish that this is indeed the root cause of the problem today? And then, how would that person actually resolve this particular problem? So, we have a big team of scientists and researchers. In fact, my co-founder is a professor of computer science at Duke University who has been researching database optimization techniques for the last decade. We have about 80-plus publications in this area, Starfish being one of them. We have a bunch of other publications which talk about how you automate problem discovery, root cause analysis, as well as resolution, to get the best performance out of these different databases. And you're right. A lot of work has gone on on the research side, but a lot of work has also gone into understanding the needs of the customers.
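The expert-checklist approach Kunal outlines, running the checks a specialist would run and reporting which root cause fires, can be illustrated with a small rule table. The metric names, thresholds, and fixes below are invented stand-ins, not Unravel's actual rules:

```python
# Invented rule-table sketch of the expert checklist: encode the checks a
# big data specialist would run over full-stack metrics, then report which
# root causes fire. Metric names, thresholds, and fixes are made up here.

CHECKS = [
    ("data skew", lambda m: m["max_task_s"] > 5 * m["median_task_s"],
     "repartition the skewed key"),
    ("memory pressure", lambda m: m["gc_time_ratio"] > 0.3,
     "increase executor memory"),
    ("under-parallelized", lambda m: m["pending_tasks"] > 10 * m["slots"],
     "raise the parallelism setting"),
]

def diagnose(metrics):
    """Return (root cause, recommended fix) pairs whose rule fires."""
    return [(name, fix) for name, rule, fix in CHECKS if rule(metrics)]

run = {"max_task_s": 900, "median_task_s": 60,
       "gc_time_ratio": 0.05, "pending_tasks": 40, "slots": 32}
print(diagnose(run))  # [('data skew', 'repartition the skewed key')]
```

The design point is that each check is both a detector and a recommendation, which is what separates "giving answers" from a wall of raw graphs.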
So we worked with some of the biggest companies out there, which have some of the biggest big data clusters, to learn from them: what are some of the everyday, ongoing management challenges that you face? And then we took those problems to our datasets and figured out, how can we automate problem discovery? How can we proactively spot a lot of these errors? I joke around and I tell people that we're big data for big data. Right? All these companies that we serve, they are gathering all of this data, and they're trying to find patterns, and they're trying to find, you know, some sort of an insight with their data. Our data is system-generated data, performance data, application data, and we're doing the exact same thing, which is figuring out inefficiencies, problems, cause and effect of things, to be able to solve it in a more intelligent, smart way. >> Well, Kunal, thank you so much for stopping by theCUBE >> Kunal: Of course. >> and sharing how Unravel Data is helping to unravel the complexities of big data. (Kunal laughs) >> Thank you so much. Really appreciate it. >> Now you're a Cube alumni. (Kunal laughs) >> Absolutely. Thanks so much for having me. >> Kunal, thanks. >> Yeah, and we want to thank you for watching theCUBE. I'm Lisa Martin with George Gilbert. We are live at our own event, Big Data SV, in downtown San Jose, California. Stick around. George and I will be right back with our next guest. (quiet crowd noise) (techno music)
Guy Churchward, DataTorrent | Big Data SV 2018
>> Announcer: Live from San Jose, it's theCUBE, presenting Big Data, Silicon Valley, brought to you by SiliconANGLE Media and its ecosystem partners. >> Welcome back to theCUBE. Our continuing coverage of our event, Big Data SV; this is our first day. We are down the street from the Strata Data Conference. Come by, we're at this really cool venue, the Forager Tasting Room. We've got a cocktail party tonight. You're going to hear some insights there, as well as tomorrow morning. I am Lisa Martin, joined by my co-host, George Gilbert, and we welcome back to theCUBE, for I think the 900 millionth time, the president and CEO of DataTorrent, Guy Churchward. Hey Guy, welcome back! >> Thank you, Lisa, I appreciate it. >> So you're one of our regular VIPs. Give us the update on DataTorrent. What's new, what's going on? >> We actually talked to you a couple of weeks ago. We did a big announcement, which was around 3.10, so it's a new release that we have. As with all small companies, and we're a small startup in the big data and analytics space, there is a plethora of features that I could reel through. But it actually speaks to something a little bit more fundamental. So in the last year... In fact, I think we chatted with you maybe six months ago. We've been looking very carefully at how customers purchase and what they want and how they execute against technology, and it's very, very different from what I expected when I came into the company about a year ago, after the EMC role that I had. And so, although the features are there, there's a huge amount of underpinning around the experience that a customer would have around big data applications. I'm reminded of, I think it's Gartner that quoted that something like 80% of big data applications fail. And this is one of the things that we really wanted to look at.
We have very large customers in production, and we did the analysis of what are we doing well with them, and why can't we do that en masse, and what are people really looking for? So that was really what the release was about. >> Let's elaborate on this a little bit. I want to drill into something where you said many projects, as we've all heard, have not succeeded. There's a huge amount of complexity. The terminology we use is, without tarring and feathering any one particular product, the open source community is kind of like, you're sort of harnessing a couple dozen animals and a zookeeper that works in triplicate... How does DataTorrent tackle that problem? >> Yeah, I mean, in fact I was desperately interested in writing a blog recently about using the word community after open source, because in some respects, there isn't a huge community around the open source movement. What we find is it's the du jour way in which we want to deliver technology, so I have a huge amount of developers that work on a thing called Apache Apex, which is a component in a solution, or in an architecture and in an outcome. And we love what we do, and we do the best we do, and it's better than anybody else's thing. But that's not an application, that's not an outcome. And what happens is, we kind of don't think about what else a customer has to put together, so then they have to go out to the zoo and pick loads of bits and pieces and then try to figure out how to stitch them all together in the best they can. And that takes an inordinately long time. And, in general, people who love this love tinkering with technologies, and their projects never get to production. And large enterprises are used to sitting down and saying, "I need a bulletproof application. "It has to be industrialized. "I need a full SLA on the back of it. "This thing has to have lights out technology. "And I need it quick." 
Because that was the other thing, as an aspect, is this market is moving so fast, and you look at things like digital economy or any other buzz term, but it really means that if you realize you need to do something, you're probably already too late. And therefore, you need it speedy, expedited. So the idea of being able to wait for 12 months, or two years for an application, also makes no sense. So the arch of this is basically deliver an outcome, don't try and change the way in which open source is currently developed, because they're in components, but embrace them. And so what we did is we sort of looked at it and said, "Well what do people really want to do?" And it's big data analytics, and I want to ingest a lot of information, I want to enrich it, I want to analyze it, and I want to take actions, and then I want to go park it. And so, we looked at it and said, "Okay, so the majority "of stuff we need is what we call a cache stack, "which is KAFKA, Apache Apex, Spark and Hadoop, "and then put complex compute on top." So you would have heard of terms like machine learning, and dimensional compute, so we have their modules. So we actually created an opinionated stack... Because otherwise you have a thousand to choose from and people get confused with choice. I equate it to going into a menu at a restaurant, there's two types of restaurants, you walk into one and you can turn pages and pages and pages and pages of stuff, and you think that's great, I got loads of choice, but the choice kind of confuses you. And also, there's only one chef at the back, and he can't cook everything well. So you know if he chooses the components and puts them together, you're probably not going to get the best meal. And then you go to restaurants that you know are really good, they generally give you one piece of paper and they say, "Here's your three entrees." And you know every single one of them. 
It's not a lot of choice, but at the end of the day, it's going to be a really good meal. >> So when you go into a customer... You're leading us to ask you the question, which is, you're selling the prix fixe tasting menu, and you're putting all the ingredients together. What are some of those solutions, and then, sort of, what happens to the platform underneath? >> Yeah, so what you don't want to do is take these flexible microdata services, which are open source projects, and hard-glue them together to create an application that then has no flexibility. Because, again, one of the myths that I used to assume is applications would last us seven to 10 years. But what we're finding in this space is this movement towards consumerization of enterprise applications. In other words, I need an app and I need it tomorrow because I'm competitively disadvantaged, but it might be wrong, so I then need to adjust it really quickly. It's this idea of continual development, continual adjustment. But that flies in the face of all of this gluing and enterprise-ilities. And I want to base it on open source, and open source, by default, doesn't glue well together. And so what we did is we said okay, not only do you have to create an opinionated stack, and you do that because you want them all to scale into all industries, and they don't need a huge amount of choice, just pick best of breed. But you need to then put a sleeve around them so they all act as though they are a single application. And so we actually announced a thing called Epoxy. It's a bit of a riff on gluing, but it's called DataTorrent Epoxy. So we have, it's like a microdata service bus, and you can then interchange the components. For instance, right now, Apache Apex is the stream-based processing engine in that component. But if there's a better unit, we're quite happy to pull it out, chuck it away, and then put another one in.
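The swappable-component idea behind Epoxy can be sketched in a few lines. This is an illustration only: the class and method names below are invented, not DataTorrent's API, but they show how giving every engine one small interface lets you pull a component out and drop another in without touching the rest of the pipeline:

```python
# Invented illustration of the swappable-engine idea: every processing
# stage meets one small interface, so the stream engine can be replaced
# without changing the surrounding pipeline code.

class Engine:
    def process(self, records):
        raise NotImplementedError

class ApexLikeEngine(Engine):
    def process(self, records):
        return [r.upper() for r in records]        # stand-in transform

class ReplacementEngine(Engine):
    def process(self, records):
        return [r[::-1].upper() for r in records]  # a "better unit"

def pipeline(engine, records):
    # ingest -> process -> act; only the engine component varies
    return engine.process(records)

print(pipeline(ApexLikeEngine(), ["abc", "de"]))     # ['ABC', 'DE']
print(pipeline(ReplacementEngine(), ["abc", "de"]))  # ['CBA', 'ED']
```

Swapping the engine is a one-argument change, which is the flexibility hard-gluing components together would destroy.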
This isn't a ubiquitous snap-on toolset, because, again, the premise is use open source, get the innovation from there. It has to be bulletproof, have enterprise-ilities, and move really fast. So those are the components I was working on. >> Guy, as CEO, I'm sure you speak with a lot of customers often. What are some of the buying patterns that you're seeing across industries, and what is some of the major business value that DataTorrent can help deliver to your customers? >> The buying patterns when we get involved, and I'm kind of breaking this down in a slightly different way, because we normally get involved when a project's in flight, one of the 80% that's failing, and in general, it's driven by a strategic business partner that has an agenda. And what you see is proprietary application vendors will say, "We can solve everything for you." So they put the tool in and realize it doesn't have the flexibility; it does have enterprise-ility, but it can't adjust fast. And then you get the other type who say, "Well, we'll go to a distro or we'll go "to a general-purpose practitioner, "and they'll build an application for us." And they'll take open source components, but they'll glue it together with proprietary mush, and then that doesn't grow past that. And then you get the other ones, which is, "Well, if I'm actually not guided by anybody, "I'll buy a bunch of developers, stick them in my company, "and I've got control of that." But they fiddle around a lot. So we arrive in and, in general, they're in this middle process of saying, "I'm at a competitive disadvantage, "I want to move forward and I want to move forward fast," and we're working on one of those three channels. The types of outcomes? Back to the expediency of this: we had a telco come to us recently, just before the iPhone X launched, and they wanted to do A/B testing on the launch on their platform. We got them up and running within three months.
Subsequent to that launch, they then repurposed the platform and some of the components, with some augmentation, and they've come out with three further applications. They've all gone into production. So the idea is then these fast cycles of microdata services being stitched together with the Epoxy resin type approach-- >> So faster time to value, lower TCO-- >> Exactly. >> Being able to meet their customers' needs faster-- >> Exactly, so it's outcome-based and time to value, and it's time to proof. Because this is, again, the thing that Gartner picked up on: Hadoop's difficult, this market's complex, and people kick the tires a lot. And I sort of joke with customers, "Hey, if you want to "obsess about components rather than the outcome, "then your successor will probably come see us "once you're out and your group's failed." And I don't mean that in an obnoxious way. It's not just DataTorrent that solves this same thing, but this is the movement, right? Deal with open source, get enterprise-ilities, get us up and running within a quarter or two, and then let us have some use and agile repurposing. >> Following on that, just to understand, going in with a solution to an economic buyer, but then having the platform be reusable, is it opinionated and focused on continuous processing applications, or does it also address both continuous processing and batch processing? >> Yeah, it's a good question. In general, and again Gatekeeper, you've got batch and you've got real time and streaming, and so we deal with data in motion, which is stream-based processing. A stream-based processing engine can deal with batch as well, but a batch engine cannot deal with streams. >> George: So you do both-- >> Yeah >> And the idea being that you can have one programming model for both. >> Exactly. >> It's just a window, batch is just a window.
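Guy's "batch is just a window" remark can be made concrete with a small sketch. This is plain Python for illustration, not Apex or Spark syntax, and the event data is invented: a tumbling-window count behaves as streaming for small windows and degenerates into a single batch when the window spans the whole dataset:

```python
# "Batch is just a window": a tumbling-window count over an event stream
# collapses into a single batch when the window covers the whole dataset.
from collections import defaultdict

def tumbling_window_counts(events, window_s):
    """events: (timestamp_s, key) pairs -> {window_start: {key: count}}."""
    windows = defaultdict(lambda: defaultdict(int))
    for ts, key in events:
        windows[ts - ts % window_s][key] += 1
    return {start: dict(counts) for start, counts in windows.items()}

events = [(1, "click"), (3, "click"), (7, "view"), (12, "click")]
print(tumbling_window_counts(events, 5))      # three 5-second windows
print(tumbling_window_counts(events, 10**9))  # one giant window: a batch
```

The same aggregation code serves both cases, which is the one-programming-model point made in the exchange above.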
>> And the other thing is, a myth bust: for the last maybe eight-plus years, companies have assumed that the first thing you do in big data analytics is collect all the data and create a data lake, and so they go in there, they ingest the information, they put it into a data lake, and then they poke the data lake posthumously. But the data in the data lake is, by default, already old. So the latency of sticking it into a data lake, and then sorting it, and then basically poking it, means that if anybody else deals with the data while it's in motion, you lose. Because they're analyzing it as it's happening, and you would be analyzing it after, at rest, right? So now the architecture of choice is ingest the information, use high-performance storage and compute, and then, in essence, ingest, normalize, enrich, analyze, and act on data in motion, in memory. And then when I've used it, then throw it off into a data lake, because then I can basically do posthumous analytics and use that for enrichment later. >> You said something also interesting, where the DataTorrent customers, the initial successful ones, sort of tended to be larger organizations. Those are typically the ones with the skillsets; if anyone's going to be able to put pieces together, it's those guys. Have you not... Well, we always expected big data applications, or sort of adaptive applications, to go mainstream when they were either packaged apps that take all the analysis and embed it, or when you had end-to-end integrated products to make it simple. Where do you think, what's going to drive this mainstream? >> Yeah, it depends on how mainstream you want mainstream. It's kind of like saying how fast is a fast car. If you want a contractor who comes into IT to create a dashboard, go buy Tableau, and that's mainstream analytics, but it's not. It's mainstream dashboarding of data. The applications that we deal with, by default, involve more complex data, so they're going to be larger organizations.
Don't misunderstand when I say, "We deal with these organizations." We don't have a professional services arm. We work very closely with people like HCL, and we do have a jumpstart team that helps people get there. But our job is to teach someone; it's like a kid with a bike and the training wheels: our job is to teach them how to ride the bike, and kick the wheels off, and step away. Because what we don't want to do is put a professional services drip feed into them and just keep sucking the money out. Our job is to get them there. Now, we've got one company who actually are going to go live next month, and it's a kid tracker, you know, like a GPS one that you put on bags and with your kids, and it'll be realtime tracking for the school and also for the individuals. And they had absolutely zero Hadoop experience when we got involved with them. And so we've brought them up, we've helped them with the application, we've kicked the wheels off, and now they're going to be sailing. I would say, in a year's time, they're going to be comfortable just ignoring us completely, and in the first year, there's still going to be some handholding and covering up a bruise as they fall off the bike every so often. But that's our job; it's IP, technology, all about outcomes and all about time to value. >> And from a differentiation standpoint, that ability to enable that self-service and kick off the training wheels, is that one of the biggest differentiators that you find DataTorrent has, versus the Tableaus and the other competitors on the market? >> I don't want to say there's no one doing what we're doing, because that will sound like we're doing something odd. But there's no one doing what we're doing. And it's almost like Tesla. Are they an electric car or are they a platform? They've spurred an industry on, and Uber did the same thing, and Lyft's done something, and AirBNB has. And what we've noticed is customers' buying patterns are very specific now.
Use open source, get enterprise-ilities on top, and have that level of agility. Nobody else is really doing that. The only way people will get that is to contract with someone like Hortonworks or a Cloudera, and actually pay them a lot of money to build the application for you. And our job is really saying, "No, instead of you paying "them for professional services, we'll give you the sleeve, "we'll make it a little bit more opinionated, "and we'll get you there really quickly, "and then we'll let you go and set you free." And so that's one. We have a thing called the Application Factory. That's the snap-on toolset where they can literally go to a GUI and say, "I'm in the financial market, "I want a fraud prevention application." And we literally then just self-assemble the stack, they can pick it up, and then put their input and output in. And then, as we move forward, we'll have partners who are building bespoke applications in verticals, and they will put them up on our website, so that customers can come in and download them. Everything is subscription software. >> Fantastic. I wish we had more time, but thanks so much for finding some time today to come by theCUBE, tell us what's new, and we look forward to seeing you on the show again very soon. >> I appreciate it, thank you very much. >> We want to thank you for watching theCUBE. Again, Lisa Martin with my co-host George Gilbert, we're live at our event, Big Data SV, in downtown San Jose, down the street from the Strata Data Conference. Stick around, George and I will be back after a short break with our next guest. (light electronic jingle)
Matthew Baird, AtScale | Big Data SV 2018
>> Announcer: Live from San Jose, it's theCUBE, presenting Big Data, Silicon Valley. Brought to you by SiliconANGLE Media and its ecosystem partners. (techno music) >> Welcome back to theCUBE, our continuing coverage on day one of our event, Big Data SV. I'm Lisa Martin with George Gilbert. We are down the street from the Strata Data Conference. We've got a lot of cool stuff going on; you can see the cool set behind me. We are at Forager Tasting Room & Eatery. Come down and join us, be in our audience today. We have a cocktail event tonight, and who doesn't want to join that? And we have a nice presentation tomorrow morning of our Wikibon 2018 Big Data Forecast and Review. Joining us next is Matthew Baird, the co-founder of AtScale. Matthew, welcome to theCUBE. >> Thanks for having me. Fantastic venue, by the way. >> Isn't it cool? >> This is very cool. >> Yeah, it is. So, talking about Big Data, you know, Gartner says, "85% of Big Data projects have failed." I often say failure is not a bad F-word, because it can spawn the genesis of a lot of great business opportunities. Data lakes were big a few years ago, then turned into swamps. AtScale has this vision of Data Lake 2.0; what is that? >> So, you're right. There have been a lot of failures, there's no doubt about it. And you're also right that that is how we evolve, and we're a Silicon Valley-based company. We don't give up when faced with these things. It's just another way to not do something. So, what we've seen and what we've learned through our customers is they need to have a solution that is integrated with all the technologies that they've adopted in the enterprise. And it's really about, if you're going to make a data lake, you're going to have data on there that is the crown jewels of your business. How are you going to get that into the hands of your constituents, so that they can analyze it, and they can use it to make decisions?
And how can we, furthermore, do that in a way that supplies governance and auditability on top of it, so that we aren't just sending data out into the ether and not knowing where it goes? We have a lot of customers in the insurance and health insurance space, and financial customers, where the data absolutely must be managed. I think one of the biggest changes is around that integration with the current technologies. There's a lot of movement into the Cloud. The new data lake is kind of focused more on these large data stores, where it was HDFS with Hadoop. Now it's S3, Google's object storage, and Azure ADLS. Those are the sorts of things that are backing the new data lake, I believe. >> So if we take these, the data lake store doesn't have to be an open source HDFS implementation, it could even be accessed just through an HDFS API. >> Matthew: Yeah, absolutely. >> What are some of the, how should we think about the data sources and feeds for this repository, and then what is it on top that we need to put to make the data more consumable? >> Yeah, that's a good point. S3, Google Object Storage, and Azure, they all have a characteristic of, they are large stores. You can store as much as you want. Generally, on the clouds and in open source on-prem software, the tooling for streaming the data and landing it exists, but the important thing there is it's cost-effective. S3 is a cost-effective storage system. HDFS is a mostly cost-effective storage system. You have to manage it, so it has a slightly higher cost, but the advice has been, get it to the place you're going to store it. Store it in a unified format. You get a halo effect when you have a unified format, and I think the industry is coalescing around...
I'd probably say Parquet's in the lead right now, but once Parquet can be read by, let's take Amazon for instance, Athena, Redshift Spectrum, and their EMR, now you have this halo effect where your data's always there, always available to be consumed by a tool or a technology that can then deliver it to your end users. >> So when we talk about Parquet, we're talking about a columnar serialization format, >> Matthew: Yes. >> but there's more on top of that that needs to be layered, so that you can, as we were talking about earlier, combine the experience of a data warehouse, and the curated >> Absolutely >> data access where there's guard rails, >> Matthew: Yes >> and it's simple, versus sort of the wild west where I capture everything in a data lake. How do you bring those two together?
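The "halo effect" of a unified format that Baird describes can be illustrated with a toy columnar store. This is a hedged sketch in plain Python, not Parquet itself: the store, column names, and "engines" are invented for illustration. The point is simply that once data sits in one shared columnar layout, independent consumers can each read it without a conversion step.

```python
# Toy columnar layout: one shared store, multiple independent readers.
# A sketch of the idea behind Parquet-on-S3, not real Parquet.
shared_store = {
    "ad_id":  [1, 2, 3, 4],      # each key is a column, stored contiguously
    "clicks": [10, 40, 25, 5],
}

def engine_sum(store, column):
    """'Engine' A: an aggregator that reads only the column it needs."""
    return sum(store[column])

def engine_filter(store, column, threshold):
    """'Engine' B: a scanner reading the same shared data, no conversion."""
    return [v for v in store[column] if v >= threshold]

print(engine_sum(shared_store, "clicks"))          # 80
print(engine_filter(shared_store, "clicks", 25))   # [40, 25]
```

Each "engine" here stands in for Athena, Spectrum, or EMR: they share the data in place rather than each keeping its own copy.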
It's a concept where we can pack a lot of sophistication and complexity behind an interface that allows people to just do what they want to do. You don't know how, or maybe you do know how a car engine works, I don't really, kind of, a little bit, but I do know how to press the gas pedal and steer. >> Right. >> I don't need to know these things, and I think the Data Lake 2.0 is about, well I don't need to know how Sentry, or Ranger, or Atlas, or any of these technologies work. I need to know that they're there, and when I access data, they're going to be applied to that data, and they're going to deliver me the stuff that I have access to and that I can see. >> So a couple things, it sounded like I was hearing abstraction, and you said really that's kind of the key, that sounds like a differentiator for AtScale, is giving customers that abstraction they need. But I'm also curious from a data value perspective, you talked about Redshift from an expense perspective. Do you also help customers gain abstraction by helping them evaluate the value of data and where they ought to keep it, and then you give them access to it? Or is that something that they need to do, kind of bring to the table? >> We don't really care, necessarily, about the source of the data, as long as it can be expressed in a way that can be accessed by whatever engine it is. Lift and shift is an example. There's a big move to move from Teradata or from Netezza into a Cloud-based offering. People want to lift it and shift it. It's the easiest way to do this. Same table definitions, but that's not optimized necessarily for the underlying data store. Take BigQuery for example, BigQuery's an amazing piece of technology. I think there's nothing like it out there in the market today, but if you really want BigQuery to be cost-effective, and perform and scale up to concurrency of... one of our customers is going to roll out about 8,000 users on this.
You have to do things in BigQuery that are BigQuery-friendly. The data structures, the way that you store the data, repeated values, those sorts of things need to be taken into consideration when you build your schema out for consumption. With AtScale they don't need to think about that, they don't need to worry about it, we do it for them. They drop the schema in the same way that it exists on their current technology, and then behind the scenes, what we're doing is we're looking at signals, we're looking at queries, we're looking at all the different ways that people access the data naturally, and then we restructure those summary tables using algorithms and statistics, and I think people would broadly call it ML type approaches, to build out something that answers those questions, and adapts over time to new questions, and new use cases. So it's really about, imagine you had the best data engineering team in the world, in a box, they're never tired, they never stop, and they're always interacting with what the customers really want, which is "Now I want to look at the data this way". >> It sounds actually like what you're talking about is you have a whole set of sources, and targets, and you understand how they operate, but when I say you, I mean your software. And so that you can take data from wherever it's coming in, and then you apply, whether it's machine learning or whatever other capabilities, to learn from the access methods how to optimize that data for that engine. >> Matthew: Exactly. >> And then the end users have an optimal experience, and it's almost like the data migration service that Amazon has, it's like, you give us your Postgres or Oracle database, and we'll migrate it to the cloud. It sounds like you add a lot of intelligence to that process for decision support workloads. >> Yes. >> And figure out, so now you're going to...
It's not Postgres to Postgres, but it might be Teradata to Redshift, or S3, that's going to be accessed by Athena or Redshift, and then let's put that in the right format. >> I think you sort of hit something that we've noticed is very powerful, which is if you can set up, and we've done this with a number of customers, if you can set up the abstraction layer that is AtScale on your on-prem data, literally in, say, hours, you can move it into the Cloud, obviously you have to write the ETL to move it into the Cloud, but once it's in the Cloud you take the same AtScale instance, you re-point it at that new data source, and it works. We've done that with multiple customers, and it's fast and effective, and it lets you actually try out things that you may not have had the agility to do before, because there's differences in how the SQL dialects work, there's differences in, potentially, how the schema might be built. >> So a couple things I'm interested in, I'm hearing two A-words, that abstraction that we've talked about a number of times, you also mentioned adaptability. So when you're talking with customers, what are some of the key business outcomes they need to drive, where adaptability and abstraction are concerned, in terms of, like, cost reduction, revenue generation? What are some of those C-suite business objectives that AtScale can help companies achieve? >> So looking at, say, a customer, a large retailer on the East Coast, everybody knows the stores, they're everywhere, they sell hardware. They have a 20-terabyte cube that they use for day-to-day revenue analytics. So they do period over period analysis. When they're looking at stores, they're looking at things like, we just tried out a new marketing approach... I was talking to somebody there last week about how they have these special stores where they completely redo one area and just see how that works. They have to be able to look at those analytics, and they run those for a short amount of time.
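The query-signal-driven summary tables described above can be sketched in miniature. This is a hedged toy, not AtScale's actual algorithm: the query log, column names, and selection rule are all assumptions made up for illustration. The idea is to watch which dimension combinations users actually group by, then pre-aggregate for the most popular one.

```python
from collections import Counter

# Hypothetical query log: the dimension sets users actually group by.
query_log = [
    ("store", "week"), ("store", "week"), ("region",),
    ("store", "week"), ("region", "week"),
]

# Raw fact rows: (store, region, week, revenue).
facts = [
    ("s1", "east", 1, 100), ("s1", "east", 2, 150),
    ("s2", "west", 1, 80),  ("s2", "west", 2, 120),
]
COLS = {"store": 0, "region": 1, "week": 2}

# Pick the most frequently requested dimension set from the signals...
top_dims, _ = Counter(query_log).most_common(1)[0]

# ...and build one summary table keyed on exactly those dimensions.
summary = {}
for row in facts:
    key = tuple(row[COLS[d]] for d in top_dims)
    summary[key] = summary.get(key, 0) + row[3]

print(top_dims)            # ('store', 'week')
print(summary[("s1", 2)])  # 150
```

A real system would re-run this loop continuously, so the summary tables adapt as the query mix changes, which is the "data engineering team in a box" idea.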
So if your window for getting data, refreshing data, building cubes, which in the old world could take a week, you know, my co-founder, at Yahoo he had a week and a half build time. That data is now two weeks old, maybe three weeks old. There might be bugs in it-- >> And the relevance might be, pshh... >> And the relevance goes down, or you can't react as fast. I've been at companies where... Speed is so important these days, and the new companies that are grasping data aggressively, putting it somewhere where they can make decisions on it on a day-to-day basis, they're winning. And they're spending... I was at a company that was spending three million dollars a month on pay-per-click data. If you can't get data every day, you're on the wrong campaigns, and everything goes off the rails, and you only learn about it a week later, that's 25% of your spend, right there, gone. >> So the biggest thing, sorry George, it really sounds to me like what AtScale can facilitate for customers in probably any industry is the ability to truly make data-driven business decisions that can really directly affect revenue and profit. >> Yes, and in an agile format. So, you can build-- >> That's the third A; agile, adaptability, abstraction. >> There ya go, the three A's. (Lisa laughs) We had the three V's, now we have the three A's. >> Yes. >> The fact that you're building a curated model, so in retail the calendars are complex. I'm sure everybody that uses Tableau is good at analyzing data, but they might not know what your rules are around your financial calendar, or around the hierarchies of your product. There's a lot of things that happen where you want an enterprise group of data modelers to build it, bless it, and roll it out, but then you're a user, and you say, wait, you forgot x, y, and z, I don't want to wait a week, I don't want to wait two weeks, three weeks, a month, maybe more.
I want that data to be available in the model an hour later 'cause that's what I get with Tableau today. And that's where we've taken the two approaches of enterprise analytics and self-service, and tried to create a scenario where you get the best of both worlds. >> So, we know that an implication of what you're telling us is that insights are perishable, and latency is becoming more and more critical. How do you plan to work with streaming data where you've got a historical archive, but you've got fresh data coming in? But fresh could mean a variety of things. Tell us what some of those scenarios look like. >> Absolutely, I think there's two approaches to this problem, and I'm seeing both used in practice, and I'm not exactly sure, although I have some theories on which one's going to win. In one case, you are streaming everything into, sort of a... like I talked about, this data lake, S3, and you're putting it in a format like ParK, and then people are accessing it. The other way is access the data where it is. Maybe it's already in, this is a common BI scenario, you have a big data store, and then you have a dimensional data store, like Oracle has your customers, Hadoop has machine data about those customers accessing on their mobile devices or something. If there was some way to access those data without having to move the Oracle stuff into the big data store, that's a Federation story that I think we've talked about in the Bay Area for a long time, or around the world for a long time. I think we're getting closer to understanding how we can do that in practice, and have it be tenable. You don't move the big data around, you move the small data around. For data coming in from outside sources it's probably a little bit more difficult, but it is kind of a degenerate version of the same story. 
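Baird's federation rule of thumb, move the small data to the big data rather than the other way around, can be sketched as a toy join. This is a hedged illustration with invented store names: the "big" fact rows stay where they are, and only the small dimension table travels to them.

```python
# Federation sketch: keep the big fact table in place and ship only
# the small dimension data to it, instead of moving the big data around.
big_fact_store = [  # imagine this living in Hadoop/S3: too big to move
    {"cust_id": 1, "clicks": 120},
    {"cust_id": 2, "clicks": 45},
    {"cust_id": 1, "clicks": 30},
]
small_dim_store = {1: "Acme Corp", 2: "Globex"}  # imagine: an Oracle table

def federated_totals(facts, dims):
    """Join on the big-data side; only `dims` (the small side) traveled."""
    totals = {}
    for row in facts:
        name = dims[row["cust_id"]]
        totals[name] = totals.get(name, 0) + row["clicks"]
    return totals

print(federated_totals(big_fact_store, small_dim_store))
# {'Acme Corp': 150, 'Globex': 45}
```

The cost asymmetry is the whole argument: shipping the dimension dict is cheap, shipping the fact table is not.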
I would say that streaming is gaining a lot of momentum, and with what we do, we're always mapping, because of the governance piece that we've built into the product, we're always mapping where did the data come from, where did it land, and how did we use it to build summary tables. So if we build five summary tables, 'cause we're answering different types of questions, we still need to know that it goes back to this piece of data, which has these security constraints and these audit requirements, and we always track it back to that, and we always apply those to our derived data. So when you're accessing these automatically ETLed summary tables, it just works the way it is. So I think that there are two ways that this is going to expand, and I'm excited about Federation because I think the time has come. I'm also excited about streaming. I think they can serve two different use cases, and I don't actually know what the answer will be, because I've seen both in customers, in some of the biggest customers we have. >> Well Matthew, thank you so much for stopping by, and four A's: AtScale can facilitate abstraction, adaptability, and agility. >> Yes. Hashtag four A's. >> There we go. I don't even want credit for that. (laughs) >> Oh wow, I'm going to get five more followers, I know it! (George laughs) >> There ya go! >> We want to thank you for watching theCUBE, I am Lisa Martin, we are live in San Jose, at our event Big Data SV, I'm with George Gilbert. Stick around, we'll be back with our next guest after a short break. (techno music)
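The lineage mapping Baird described, where derived summary tables always trace back to source data and inherit its security and audit constraints, can be sketched with a small registry. This is a hedged toy, not the product's implementation: the table names and constraint labels are invented.

```python
# Minimal lineage registry: derived tables point back at their source,
# and constraints are resolved by walking the chain to the root.
catalog = {
    "raw_events":     {"source": None,            "constraints": {"pii", "audit"}},
    "daily_summary":  {"source": "raw_events",    "constraints": set()},
    "weekly_summary": {"source": "daily_summary", "constraints": set()},
}

def effective_constraints(table):
    """Union of the table's own constraints and all of its ancestors'."""
    acc = set()
    while table is not None:
        acc |= catalog[table]["constraints"]
        table = catalog[table]["source"]
    return acc

# A twice-derived summary table still carries the raw data's obligations.
print(sorted(effective_constraints("weekly_summary")))  # ['audit', 'pii']
```

This is why the five summary tables "just work": whatever protection applies at the root applies automatically to everything derived from it.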
Scott Gnau, Hortonworks | Big Data SV 2018
>> Narrator: Live from San Jose, it's the Cube. Presenting Big Data Silicon Valley. Brought to you by SiliconANGLE Media and its ecosystem partners. >> Welcome back to the Cube's continuing coverage of Big Data SV.
This is our tenth Big Data event, our fifth year in San Jose. We are down the street from the Strata Data Conference. We invite you to come down and join us, come on down! We are at Forager Tasting Room & Eatery, super cool place. We've got a cocktail event tonight, and an analyst briefing tomorrow morning. We are excited to welcome back to the Cube, Scott Gnau, the CTO of Hortonworks. Hey, Scott, welcome back. >> Thanks for having me, and I really love what you've done with the place. I think there's as much energy here as I've seen in the entire show. So, thanks for having me over. >> Yeah! >> We have done a pretty good thing to this place that we're renting for the day. So, thanks for stopping by and talking with George and I. So, in February, Hortonworks announced some news about Hortonworks DataFlow. What was in that announcement? What does that do to help customers simplify data in motion? What industries is it going to be most impactful for? I'm thinking, you know, GDPR is a couple months away, kind of what's new there? >> Well, yeah, and there are a couple of topics in there, right? So, obviously, we're very committed to, which I think is one of our unique value propositions, we're committed to really creating an easy to use data management platform, as it were, for the entire lifecycle of data, from when data is created at the edge, as data are streaming from one place to another place, and, at rest, analytics get run and analytics get pushed back out to the edge.
So, that entire lifecycle is really the footprint that we're looking at, and when you dig a level into that, obviously, the data in motion piece is hugely important. So I think one of the things that we've looked at is we don't want to be just a streaming engine or just a tool for creating pipes and data flows and so on. We really want to create that entire experience around what needs to happen for data that's moving, whether it be acquisition at the edge in a protected way with provenance and encryption, whether it be applying streaming analytics as the data are flowing and everywhere kind of in between, and so that's what HDF represents, and what we released in our latest release, which, to your point, was just a few weeks ago, is a way for our customers to go build their data in motion applications using a very simple drag and drop GUI interface. So, they don't have to understand all of the different animals in the zoo, and the different technologies that are in play. It's like, "I want to do this." Okay, here's a GUI tool, you can have all of the different operators that are represented by the different underlying technologies that we provide as Hortonworks DataFlow, and you can string them together, and then, you can make those applications and test those applications. One of the biggest enhancements that we did is we made it very easy, once those things are built in a laptop environment or in a dev environment, for them to be published out to production or to be published out to other developers who might want to enhance them and so on. So, the idea is to make it consumable inside of an enterprise, and when you think about data in motion and IOT and all those use cases, it's not going to be one department, one organization, or one person that's doing it. It's going to be a team of people that are distributed just like the data and the sensors, and, so, being able to have that sharing capability is what we've enhanced in the experience.
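The flow-based style Gnau describes, operators strung together with provenance attached as data moves, can be sketched generically. This is a hedged toy in plain Python, not NiFi's or HDF's actual API: each stage simply appends to a record's provenance trail as it passes through.

```python
# Toy flow-based pipeline: stages chained like drag-and-drop operators,
# each one appending to the record's provenance trail as data moves.
def acquire(records):
    for r in records:
        yield {"value": r, "provenance": ["edge-acquire"]}

def enrich(flow):
    for r in flow:
        r["value"] = r["value"] * 2            # stand-in streaming analytic
        r["provenance"].append("enrich")
        yield r

def publish(flow):
    out = []
    for r in flow:
        r["provenance"].append("publish")
        out.append(r)
    return out

result = publish(enrich(acquire([1, 2])))
print(result[0]["value"])        # 2
print(result[0]["provenance"])   # ['edge-acquire', 'enrich', 'publish']
```

Because the stages are composable functions, "publishing" a flow to another developer amounts to handing over the chain, which loosely mirrors the sharing capability described above.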
>> So, you were just saying, before we went live, that you're here having speed dates with customers. What are some of the things... >> It's a little bit more sincere than that, but yeah. >> (laughs) Isn't speed dating sincere? It's 2018, I'm not sure. (Scott laughs) What are some of the things that you're hearing from customers, and how is that helping to drive what's coming out from Hortonworks? >> So, the two things that I'm hearing, right, number one, certainly, is that they really appreciate our approach to the entire lifecycle of data, because customers are really experiencing huge data volume increases and data just from everywhere, and it's no longer just from the ERP system inside the firewall. It's from third parties, it's from sensors, it's from mobile devices, and, so, they really do appreciate kind of the territory that we cover with the tools and technologies we bring to market, and, so, that's been very rewarding. Clearly, customers who are now well into this path, they're starting to think about, in this new world, data governance, and data governance, I just took all of the energy out of the room, governance, it sounds like, you know, hard. What I mean by data governance, really, is customers need to understand, with all of this diverse, connected data everywhere, in the cloud, on-prem, from sensors, third parties, partners, is, frankly, they need a trail of breadcrumbs that says what is it, where'd it come from, who had access to it, and then, what did they do with it?
If you start to piece that together, that's what they really need to understand, the data estate that belongs to them, so they can turn that into refined product, and, so, when you then segue into one of your earlier questions, GDPR is, certainly, a triggering point where it's like, okay, the penalties are huge, oh my God, it's a whole new set of regulations that I have to comply with, and when you think about that trail of breadcrumbs that I just described, that actually becomes a roadmap for compliance under regulations like GDPR, where if a European customer calls up and says, "Forget my data.", the only way that you can guarantee that you forgot that person's data is to actually understand where it all is, and that requires proper governance, tools, and techniques, and, so, when I say governance, it's, really, not like, you know, the governor and the government, and all that. That's an aspect, but the real, important part is how do I keep all of that connectivity so that I can understand the landscape of data that I've got access to, and I'm hearing a lot of energy around that, and when you think about an IOT kind of world, distributed processing, multiple hybrid cloud footprints, data is just everywhere, and, so, the perimeter is no longer fixed, it's kind of variable, and being able to keep track of that is a very important thing for our customers. >> So, continuing on that theme, Scott. Data lakes seem to be the first major new repository we added after we had data warehouses and data marts, and it looked like the governance solutions were sort of around the perimeter of the data lake. Tell us, you were alluding to, sort of, how many more repositories, whether at rest or in motion, there are for data. Do we have to solve the governance problem end-to-end before we can build meaningful applications?
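Stepping back to the GDPR point above: a "forget my data" request is only satisfiable if the breadcrumb trail tells you every place a person's data landed. This is a hedged sketch with hypothetical store names, not a real compliance tool: the index itself is the thing the regulation makes valuable.

```python
# Toy breadcrumb trail: every copy of a user's data is indexed by store.
# Without this map, "forget me" cannot be guaranteed.
stores = {
    "crm":         {"u1": {"email": "a@x.com"}, "u2": {"email": "b@x.com"}},
    "clickstream": {"u1": {"pages": 42}},
    "s3_archive":  {"u2": {"orders": 3}},
}
breadcrumbs = {"u1": ["crm", "clickstream"], "u2": ["crm", "s3_archive"]}

def forget(user_id):
    """Delete every indexed copy and return an audit trail of deletions."""
    audit = []
    for store in breadcrumbs.pop(user_id, []):
        if user_id in stores[store]:
            del stores[store][user_id]
            audit.append(store)
    return audit

print(forget("u1"))            # ['crm', 'clickstream']
print("u1" in stores["crm"])   # False
```

The returned audit list is the roadmap-for-compliance idea: proof of where the data was and that each copy was removed.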
>> So, I would argue personally, that governance is one of the most strategic things for us as an industry, collectively, to go solve in a universal way, and what I mean by that, is throughout my career, which is probably longer than I'd like to admit, in an EDW centric world, where things are somewhat easier in terms of the perimeter and where the data came from, data sources were much more controlled, typically ERP systems, owned wholly by a company. Even in that era, true data governance, meta data management, and that provenance was never really solved adequately. There were 300 different solutions, none of which really won. They were all different, non-compatible, and the problem was easier. In this new world, with connected data, the problem is infinitely more difficult to go solve, and, so, that same kind of approach of 300 different proprietary solutions I don't think is going to work. >> So, tell us, how does that approach have to change and who can make that change? >> So, one of the things, obviously, that we're driving is we're leveraging our position in the open community to try to use the community to create that common infrastructure, common set of APIs for meta data management, and, of course, we call that Apache Atlas, and we work with a lot of partners, some of whom are customers, some of whom are other vendors, even some of whom could be considered competitors, to try to drive an Apache open source kind of project to become that standard layer that's common into which vendors can bring their applications. 
So, now, if I have a common API for tracking meta data in that trail of breadcrumbs that's commonly understood, I can bring in an application that helps customers go develop the taxonomy of the rules that they want to implement, and, then, that helps visualize all of the other functionality, which is also extremely important, and that's where I think specialization comes into play, but having that common infrastructure, I think, is a really important thing, because that's going to enable data, data lakes, IOT to be trusted, and if it's not trusted, it's not going to be successful. >> Okay, there's a chicken and an egg there it sounds like, potentially. >> Am I the chicken or the egg? >> Well, you're the CTO. (Lisa laughs) >> Okay. >> The thing I was thinking of was, the broader the scope of trust that you're trying to achieve at first, the more difficult the problem, do you see customers wanting to pick off one high value application, not necessarily that's about managing what's in Atlas, in the meta data, so much as they want to do an IOT app and they'll implement some amount of governance to solve that app. In other words, which comes first? Do they have to do the end-to-end meta data management and governance, or do they pick a problem off first? >> In this case, I think it's chicken or egg. I mean, you could start from either point. I see customers who are implementing applications in the IOT space, and they're saying, "Hey, this requires a new way to think of governance, so, I'm going to go and build that out, but I'm going to think about it being pluggable into the next app."
I also see a lot of customers, especially in highly regulated industries, and especially in highly regulated jurisdictions, who are stepping back and saying, "Forget the applications, this is a data opportunity, and, so, I want to go solve my data fabric, and I want to have some consistency across that data fabric into which I can publish data for specific applications and guarantee that, holistically, I am compliant and that I'm sitting inside of our corporate mission and all of those things." >> George: Okay. >> So, one of the things you mention, and we talk about this a lot, is the proliferation of data. It's so many, so many different sources, and companies have an opportunity, you had mentioned the phrase data opportunity, there is massive opportunity there, but you said, you know, from even a GDPR perspective alone, I can't remove the data if I don't know where it is, down to the breadcrumbs. As a marketer, we use terms like get a 360 degree view of your customer. Is that actually really something that customers can achieve leveraging data? Can they actually really get, say a retailer, a 360, a complete view of their customer? >> Alright, 358. >> That's pretty good! >> And we're getting there. (Lisa laughs) Yeah, I mean, obviously, the idea is to get a much broader view, and 360 is a marketing term. I'm not a marketing person, >> Yes. >> But it, certainly, creates a much broader view of highly personalized information that helps you interact with your customer better, and, yes, we're seeing customers do that today and have great success with it and actually change and build new business models based on that capability, for sure. The folks who've done that have realized that in this new world, the way that that works is you have to have a lot of people have access to a lot of data, and that's scary, because that's not the way it used to be, right? >> Right.
>> It used to be you go to the DBA and you ask for access, and then, your boss has to sign off and say it's what you asked for. In this world, you need to have access to all of it. So, when you think about this new governance capability, where, as part of the governance integrated with security, personalized information can be encrypted, it can be blurred out, but you still have access to the data to look at the relationships to be found in the data to build out those sophisticated models. So, that's where not only is it a new opportunity for governance, just because of the sources, the variety of the different landscape, but it's, ultimately, very much required, because if you're the CSO, you're not going to give the marketing team access to all of its customer data unless you understand that, right, but it has to be, "I'm just giving it to you, and I know that it's automatically protected." versus, "I'm going to let you ask for it." to be successful. >> Right. >> I guess, following up on that, it sounds like what we were talking about, chicken or egg. Are you seeing an accelerating shift from where data is sort of collected, centrally, from applications, or, what we hear on Amazon, is the amount coming off the edge is accelerating? >> It is, and I think that that is a big driver of, frankly, faster cloud adoption. You know, the analytic space, particularly, has been a laggard in cloud adoption for many reasons, and we've talked about it previously, but one of the biggest reasons, obviously, is that data has gravity, data movement is expensive, and, so, now, when you think about where data is being created, where it lives, being further out on the edge, and may live its entire lifecycle in the cloud, you're seeing a reversal of gravity more towards cloud, and that, again, creates more opportunities in terms of driving a more varied perimeter and just keeping track of where all the assets are.
Finally, I think it also leads to this notion of managing the entire lifecycle of data. One of the implications is that if data is not going to be centralized, it's going to live in different places, and applications have to be portable to move to where the data exists. So, when I think about that landscape of creating ubiquitous data management within Hortonworks' portfolio, that's one of the big values that we can create for our customers. Not only can we be an on-ramp to their hybrid architecture, but as we become that on-ramp, we can also guarantee the portability of the applications that they've built out to those cloud footprints and, ultimately, even out to the edge.
>> So, a quick question, then, to clarify on that, or drill down: would that mean you could see scenarios where Hortonworks is managing the distribution of models that do the inferencing on the edge, and you're collecting, bringing back the relevant data, however that's defined, to do the retraining of any models or creation of new models?
>> Absolutely, absolutely. That's one of the key things about the NiFi project in general, and Hortonworks DataFlow specifically: the ability to selectively move data, and the selectivity can be based on analytic models as well. So, the easiest case to think about is self-driving cars. We all understand how that works, right? A self-driving car has cameras, and it's looking at things going on. It's making decisions locally, based on models that have been delivered, and they have to be made locally, because of latency, right? But, selectively: hey, here's something I saw, an image I didn't recognize. I need to send that up, so it can be added to my lexicon of what images are and what action should be taken. So, of course, that's all very futuristic, but we understand how that works, and it has application in things that are very relevant today. Think about jet engines that have diagnostics running.
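[Editor's note: the selective data movement Scott describes, a local model deciding which readings are worth shipping upstream, can be sketched as below. This is a hedged illustration only, not NiFi's actual API; the threshold, scoring function, and field names are invented for the example.]

```python
# Hypothetical cutoff; in practice this would come from a trained model
# delivered to the edge device.
ANOMALY_THRESHOLD = 0.9

def local_model_score(reading: dict) -> float:
    """Stand-in for an on-device model: score how 'interesting' a reading is.
    Here we simply normalize vibration against a nominal maximum of 100."""
    return min(reading["vibration"] / 100.0, 1.0)

def filter_at_edge(readings: list) -> list:
    """Send upstream only the readings the local model flags,
    instead of shipping the full telemetry stream."""
    return [r for r in readings if local_model_score(r) >= ANOMALY_THRESHOLD]

# Four sensor readings from a jet engine; only the anomalous ones go up.
readings = [{"engine_id": 1, "vibration": v} for v in (12, 95, 40, 99)]
to_send = filter_at_edge(readings)
```

The filtering happens on the device, so the decision latency stays local and only the interesting gigabyte, not the routine terabyte-an-hour, crosses the expensive link, which is the trade-off described in the jet engine example that follows.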
Do I need to send that terabyte of data an hour over an expensive link? No, but I have a model that runs locally that says, "Wow, this thing looks interesting. Let me send a gigabyte now for immediate action." So, that decision-making capability is extremely important.
>> Well, Scott, thanks so much for taking some time to come chat with us once again on theCUBE. We appreciate your insights.
>> Appreciate it, time flies. This is great.
>> Doesn't it? When you're having fun!
>> Yeah.
>> Alright, we want to thank you for watching theCUBE. I'm Lisa Martin with George Gilbert. We are live at Forager Tasting Room in downtown San Jose at our own event, Big Data SV. We'd love for you to come on down and join us today, tonight, and tomorrow. Stick around, we'll be right back with our next guest after a short break. (techno music)
>> Narrator: Since the dawn of the cloud, theCUBE