David Abercrombie, Sharethrough & Michael Nixon, Snowflake | Big Data SV 2018
>> Narrator: Live from San Jose, it's theCUBE. Presenting Big Data, Silicon Valley. Brought to you by SiliconANGLE Media and its ecosystem partners. >> Hi, I'm George Gilbert, and we are broadcasting from the Strata Data Conference, we're right around the corner at the Forager Tasting Room & Eatery. We have this wonderful location here, and we are very lucky to have with us Michael Nixon, from Snowflake, which is a leading cloud data warehouse. And David Abercrombie from Sharethrough, which is a leading ad tech company. And between the two of them, they're going to tell us some of the most advanced use cases we have now for cloud-native data warehousing. Michael, why don't you start with giving us some context for how on a cloud platform one might rethink a data warehouse? >> Yeah, thank you. That's a great question because, let me first answer it from the end-user, business value perspective, when you run a workload on a cloud, there's a certain level of expectation you want out of the cloud. You want scalability, you want unlimited scalability, you want to be able to support all your users, you want to be able to support the data types, whatever they may be, that come into your organization. So, there's a level of expectation that one should expect from a service point of view once you're in a cloud. So, a lot of the technologies that were built up to this point have been optimized for on-premises types of data warehousing where perhaps that level of service and concurrency and unlimited scalability was not really expected but, guess what? Once it comes to the cloud, it's expected. So those on-premises technologies aren't suitable in the cloud, so for enterprises and, I mean, companies, organizations of all types from finance, banking, manufacturing, ad tech as we'll have today, they want that level of service in the cloud. And so, those technologies will not work, and so it requires a rethinking of how those architectures are built. 
And it requires being built for the cloud. >> And just to, alright, to break this down and be really concrete, some of the rethinking. We separate compute from storage, which is a familiar pattern that we've learned in the cloud but we also then have to have this sort of independent elasticity between-- >> Yes. >> Storage and the compute, and then Snowflake's taken it even a step further where you can spin out multiple compute clusters. >> Right. >> Tell us how that works and why that's so difficult and unique. >> Yeah, you know, that's taking us under the covers a little bit, but what makes our infrastructure unique is that we have a three-layer architecture. We separate, just as you said, storage from the compute layer, from the services layer. And that's really important because as I mentioned before, you want unlimited capacity, unlimited resources. So, if you scale compute in today's world of on-premises MPP, what that really means is that you have to bring the storage along with the compute because compute is tied to the storage, so when you scale the storage along with the compute, usually that involves a lot of burden on the data warehouse manager because now they have to redistribute the data and that means redistributing keys, managing keys if you will. And that's a burden, and by the reverse, if all you wanted to do was increase storage but not the compute, because compute was tied to storage, why, you have to buy these additional compute nodes, and that might add to the cost when, in fact, all you really wanted to pay for was additional storage. So, by separating those, you keep them independent, and so you can scale storage apart from compute and then, once you have your compute resources in place, the virtual warehouses that you're talking about that have completed the job, you spun them up, it's done its job, and you take it down, guess what? 
You can release those resources, and of course, in releasing those resources, basically you can cut your cost as well because, for us, it's pure usage-based pricing. You only pay for what you use, and that's really fantastic. >> Very different from the on-prem model where, as you were saying, compute and storage are tied together, so. >> Yeah, let's think about what that means architecturally, right? So if you have an on-premises data warehouse, and you want to scale your capacity, chances are you'll have to have that hardware in place already. And having that hardware in place already means you're paying that expense and, so you may pay for that expense six months prior to needing it. Let's take a retailer example. >> Yeah. >> You're gearing up for a peak season, which might be Christmas, and so you put that hardware in place sometime in June, you'll always put it in in advance, because why? You have to bring up the environment, so you have to allow time for implementation or, if you will, deployment to make sure everything is operational. >> Okay. >> And then what happens is when that peak period comes, you can't expand in that capacity. But what happens once that peak period is over? You paid for that hardware, but you don't really need it. So, our vision is, or the vision we believe you should have when you move workloads to the cloud is, you pay for those when you need them. >> Okay, so now, David, help us understand, first, what was the business problem you were trying to solve? And why was Snowflake, you know, sort of uniquely suited for that? >> Well, let me talk a little bit about Sharethrough. We're ad tech, at the core of our business we run an ad exchange, where we're doing programmatic trading with the bids, with the real-time bidding spec. The data is very high in volume, with 12 billion impressions a month, that's a lot of bids that we have to process, a lot of bid requests. 
The way it operates, the bids and the bid responses in programmatic trading are encoded in JSONs, so our ad exchange is basically exchanging messages in JSON with our business partners. And the JSONs are very complicated, there's a lot of richness and detail, such that the advertisers can decide whether or not they want to bid. Well, this data is very complicated, very high-volume. And advertising, like any business, we really need to have good analytics to understand how our business is operating, how our publishers are doing, how our advertisers are doing. And it all depends upon this very high-volume, very complex JSON event data stream. So, Snowflake was able to ingest our high-volume data very gracefully. The JSON parsing techniques of Snowflake allow me to expose the complicated data structure in a way that's very transparent and usable to our analysts. Our use of Snowflake has replaced clunkier tools where the analysts basically had to be programmers, writing programs in Scala or something to do an analysis. And now, because we've transparently and easily exposed the complicated structures within Snowflake in a relational database, they can use good old-fashioned SQL to run their queries. Literally, an afternoon's analysis is now a five-minute query. >> So, let me, as I'm listening to you describe this. We've had various vendors telling us about these workflows in the sort of data prep and data science toolchain. It almost sounds to me like Snowflake is taking semi-structured or complex data and it's sort of unraveling it and, normalizing is kind of an overloaded term, but it's making it business-ready, so you don't need as much of that manual data prep. >> Yeah, exactly, you don't need as much manual data prep, or you don't need as much expertise. 
For instance, Snowflake's JSON capabilities, in terms of drilling down the JSON tree with dot path notation, or expanding nested objects, are very expressive, very powerful, but still your typical analyst or your BI tool certainly wouldn't know how to do that. So, in Snowflake, we sort of have our cake and eat it too. We can have our JSONs with their full richness in our database, but yet we can simplify and expose the data elements that are needed for analysis, so that an analyst, their first day on the job, they can get right to work and start writing queries. >> So let me ask you a little more about the programmatic ad use case. So if you have billions of impressions per month, I'm guessing that means you have quite a few times more, in terms of bids, and then there's the, you know, once you have, I guess, a successful one, you want to track what happens. >> Correct. >> So tell us a little more about that, what that workload looks like, in terms of what analytics you're trying to perform, what you're tracking? >> Yeah, well, you're right. There's different steps in our funnel. The impression request expands out by a factor of a dozen as we send it to all the different potential bidders. We track all that data, the responses come back, we track that, we track our decisions and why we selected the bidder. And then, once the ad is shown, of course there's various beacons and tracking things that fire. We have to track all of that data, and the only way we could make sense out of our business is by bringing all that data together. And in a way that is reliable, transparent, and visible, and also has data integrity. That's another thing I like about the Snowflake database, that it's a good old-fashioned SQL database where I can declare my primary keys, I can run QC checks, I can ensure the high data integrity that is demanded by BI and other sorts of analytics. 
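To make the dot-path and nested-object idea concrete, here is a minimal Python sketch. It is not Snowflake syntax and not Sharethrough's schema: the bid-request fields, the `get_path` helper, and the `flatten_impressions` helper are all invented for illustration. It only shows the general pattern David describes, keeping the full JSON while exposing a few flat, analyst-friendly columns.

```python
import json

# Hypothetical bid request; every field name here is an assumption
# made up for this example, not a real exchange message.
RAW = """
{
  "id": "req-1",
  "site": {"domain": "example.com", "publisher": {"id": "pub-42"}},
  "imp": [
    {"id": "1", "bidfloor": 0.5},
    {"id": "2", "bidfloor": 1.25}
  ]
}
"""


def get_path(doc, path):
    """Walk a nested dict using a dot path such as 'site.publisher.id'."""
    node = doc
    for key in path.split("."):
        node = node[key]
    return node


def flatten_impressions(doc):
    """Expand the nested 'imp' array into one flat row per impression."""
    return [
        {
            "request_id": doc["id"],
            "publisher_id": get_path(doc, "site.publisher.id"),
            "imp_id": imp["id"],
            "bidfloor": imp["bidfloor"],
        }
        for imp in doc["imp"]
    ]


request = json.loads(RAW)
rows = flatten_impressions(request)
for row in rows:
    print(row)
```

The original document stays intact in `request`, while `rows` is the simplified shape an analyst or BI tool would query.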
>> What would be, as you continue to push the boundaries of the ad tech service, what's some functionality that you're looking to add, with Snowflake as your partner, either that's in there now that you still need to take advantage of or things that you're looking to in the future? >> Well, moving forward, of course, it's very important for us to be able to quickly gauge the effectiveness of new products. The ad tech market is fast-changing, there's always new ways of bidding, new products that are being developed, new ways for the ad ecosystem to work. And so, as we roll those out, we need to be able to quickly analyze, you know, "Is this thing working or not?" You know, kind of an agile environment, pivot or prove it. Does this feature work or not? So, having all the data in one place makes possible that very quick assessment of the viability of a new feature, new product. >> And, dropping down a little under the covers for how that works, does that mean, like you still have the base JSON data that you've absorbed, but you're going to expose it with different schemas or access patterns? >> Yeah, indeed. For instance, we make use of the SQL schemas, roles, and permissions internally where we can have the different teams have their own domain of data that they can expose internally, and looking forward, there's the Sharehouse feature of Snowflake that we're looking to implement with our partners, where, rather than sending them data, like a daily dump of data, we can give them access to their data in our database through this top layer that Michael mentioned, the services layer, which essentially allows me to create a view and grant select on it to another customer. So I no longer have to send daily data dumps to partners or have some sort of API for getting data. They can simply query the data themselves, so we'll be implementing that feature with our major partners. 
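The view-instead-of-a-dump idea can be sketched in a few lines of Python using the standard-library `sqlite3` module as a stand-in. This is only an illustration under stated assumptions: the `events` table, the `partner_a_events` view, and the sample rows are hypothetical, and SQLite has no `GRANT` statement, so the permission step David mentions appears only as a comment rather than as executable code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE events (partner_id TEXT, event_type TEXT, revenue REAL);
    INSERT INTO events VALUES
        ('partner_a', 'impression', 0.0),
        ('partner_a', 'click', 1.5),
        ('partner_b', 'click', 2.0);

    -- The view exposes only partner_a's rows and hides partner_id.
    CREATE VIEW partner_a_events AS
        SELECT event_type, revenue
        FROM events
        WHERE partner_id = 'partner_a';
    """
)

# In a warehouse with access control, a grant of SELECT on the view to the
# partner's role would go here. SQLite has no GRANT, so that step is omitted.

# The partner queries the view directly instead of receiving a daily dump.
partner_rows = conn.execute(
    "SELECT event_type, revenue FROM partner_a_events ORDER BY event_type"
).fetchall()
print(partner_rows)
```

The design point is that the data never leaves the owner's database: the partner sees a filtered, always-current slice rather than a copy that goes stale.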
>> I would be remiss in not asking at a data conference like this, now that there's the tie-in with Qubole and Spark integration and machine learning, is there anything along that front that you're planning to exploit in the near future? >> Well, yeah, at Sharethrough, we're very experimental, playful, we're always examining new data technologies and new ways of doing things, but now with Snowflake as sort of our data warehouse of curated data. I've got two petabytes of referential integrity data, and that is reliable. We can move forward into our other analyses and other uses of data knowing that we have captured every event exactly once, and we know exactly where it fits in a business context, in a relational manner. It's clean, good data integrity, reliable, accessible, visible, and it's just plain old SQL. (chuckles) >> That's actually a nice way to sum it up. We've got the integrity that we've come to expect and love from relational databases. We've got the flexibility of machine-oriented data, or JSON. But we don't have to give up the query engine, and then now you have more advanced features, analytic features that you can take advantage of coming down the pipe. >> Yeah, again we're a modern platform for the modern age, that's basically cloud-based computing. With a platform like Snowflake in the backend, you can now move those workloads that you're accustomed to to the cloud and have the environment that you're familiar with, and it saves you a lot of time and effort. You can focus on more strategic projects. >> Okay, well, with that, we're going to take a short break. This has been George Gilbert, we're with Michael Nixon of Snowflake, and David Abercrombie of Sharethrough listening to how the most modern ad tech companies are taking advantage of the most modern cloud data warehouses. And we'll be back after a short break here at the Strata Data Conference, thanks. (quirky music)
ENTITIES
Entity | Category | Confidence |
---|---|---|
David | PERSON | 0.99+ |
George Gilbert | PERSON | 0.99+ |
David Abercrombie | PERSON | 0.99+ |
Michael Nixon | PERSON | 0.99+ |
Michael | PERSON | 0.99+ |
June | DATE | 0.99+ |
two | QUANTITY | 0.99+ |
SiliconANGLE Media | ORGANIZATION | 0.99+ |
San Jose | LOCATION | 0.99+ |
Scala | TITLE | 0.99+ |
first | QUANTITY | 0.99+ |
Silicon Valley | LOCATION | 0.99+ |
five-minute | QUANTITY | 0.99+ |
Snowflake | TITLE | 0.99+ |
Christmas | EVENT | 0.98+ |
Strata Data Conference | EVENT | 0.98+ |
three-layer | QUANTITY | 0.98+ |
first day | QUANTITY | 0.98+ |
a dozen | QUANTITY | 0.98+ |
two petabytes | QUANTITY | 0.97+ |
Sharethrough | ORGANIZATION | 0.97+ |
JSON | TITLE | 0.97+ |
SQL | TITLE | 0.96+ |
one place | QUANTITY | 0.95+ |
six months | QUANTITY | 0.94+ |
Forager Tasting Room & Eatery | ORGANIZATION | 0.91+ |
today | DATE | 0.89+ |
Snowflake | ORGANIZATION | 0.87+ |
Spark | TITLE | 0.87+ |
12 billion impressions a month | QUANTITY | 0.87+ |
Machine Learning | TITLE | 0.84+ |
Big Data | ORGANIZATION | 0.84+ |
billions of impressions | QUANTITY | 0.8+ |
CuBOL | TITLE | 0.79+ |
Big Data SV 2018 | EVENT | 0.77+ |
once | QUANTITY | 0.72+ |
theCUBE | ORGANIZATION | 0.63+ |
JSONs | TITLE | 0.61+ |
times | QUANTITY | 0.55+ |
Ziya Ma, Intel | Big Data SV 2018
>> Live from San Jose, it's theCUBE! Presenting Big Data Silicon Valley, brought to you by SiliconANGLE Media and its ecosystem partners. >> Welcome back to theCUBE. Our continuing coverage of our event, Big data SV. I'm Lisa Martin with my co-host George Gilbert. We're down the street from the Strata Data Conference, hearing a lot of interesting insights on big data. Peeling back the layers, looking at opportunities, some of the challenges, barriers to overcome but also the plethora of opportunities that enterprises alike have that they can take advantage of. Our next guest is no stranger to theCUBE, she was just on with me a couple days ago at the Women in Data Science Conference. Please welcome back to theCUBE, Ziya Ma. Vice President of Software and Services Group and the Director of Big Data Technologies from Intel. Hi Ziya! >> Hi Lisa. >> Long time, no see. >> I know, it was just really two to three days ago. >> It was, well and now I can say happy International Women's Day. >> The same to you, Lisa. >> Thank you, it's great to have you here. So as I mentioned, we are down the street from the Strata Data Conference. You've been up there over the last couple days. What are some of the things that you're hearing with respect to big data? Trends, barriers, opportunities? >> Yeah, so first it's very exciting to be back at the conference again. The one biggest trend, or one topic that's hit really hard by many presenters, is the power of bringing the big data system and data science solutions together. You know, we're definitely seeing in the last few years the advancement of big data and advancement of data science or you know, machine learning, deep learning truly pushing forward business differentiation and improve our life quality. So that's definitely one of the biggest trends. Another thing I noticed is there was a lot of discussion on big data and data science getting deployed into the cloud. What are the learnings, what are the use cases? 
So I think that's another noticeable trend. And also, there were some presentations on doing the data science or having the business intelligence on the edge devices. That's another noticeable trend. And of course, there were discussions on security and privacy for data science and big data, so that continued to be one of the topics. >> So we were talking earlier, 'cause there's so many concepts and products to get your arms around. If someone is looking at AI and machine learning on the back end, you know, we'll worry about edge intelligence some other time, but we know that Intel has the CPU with the Xeon and then this lower power one with Atom. There's the GPU, there's ASICs, FPGAs, and then there are these software layers, you know, with higher abstraction layer, higher abstraction level. Help us put some of those pieces together for people who are like saying, okay, I know I've got a lot of data, I've got to train these sophisticated models, you know, explain this to me. >> Right, so Intel is a real solution provider for data science and big data. So at the hardware level, and George, as you mentioned, we offer a wide range of products from general purpose like Xeon to targeted silicon such as FPGAs and ASIC chips like Nervana. And also we provide adjacencies like networking hardware, non-volatile memory, and mobile. You know, those are the other adjacent products that we offer. Now on top of the hardware layer, we deliver a fully optimized software solution stack from libraries and frameworks to tools and solutions, so that we can help engineers or developers to create AI solutions with greater ease and productivity. For instance, we deliver the Intel-optimized Math Kernel Library, which leverages the latest instruction sets and gives significant performance boosts when you are running your software on Intel hardware. We also deliver frameworks like BigDL for Spark and big data types of customers if they are looking for deep learning capabilities. 
We also optimize some popular open source deep learning frameworks like Caffe, like TensorFlow, MXNet, and a few others. So our goal is to provide all the necessary solutions so that at the end our customers can create the applications, the solutions that they really need to address their biggest pain points. >> Help us think about the maturity level now. Like, we know about the very most sophisticated internet service providers, who have been sort of all over this machine learning now for quite a few years. Banks, insurance companies, people who've had this. Statisticians and actuaries who have that sort of skillset are beginning to deploy some of these early production apps. Where are we in terms of getting this out to the mainstream? What are some of the things that have to happen? >> To get it to mainstream, there are so many things we could do. First I think we will continue to see the wide range of silicon products, but then there are a few things Intel is pushing. For example, we're developing this Intel Nervana graph compiler that will encapsulate the hardware integration details and present a consistent API for developers to work with. And this is one thing that we hope can eventually help the developer community. And also, we are collaborating with the end user. Like, from the enterprise segment. For example, we're working with the financial services industry, we're working with the manufacturing sector and also customers from the medical field. And online retailers, trying to help them to deliver or create the data science and analytics solutions on Intel-based hardware or Intel optimized software. So that's another thing that we do. And we're seeing actually very good progress in this area. Now we're also collaborating with many cloud service providers. For instance, we work with some of the top seven cloud service providers, both in the U.S. 
and also in China to democratize not only our hardware, but also our libraries and tools, BigDL, MKL, and other frameworks and libraries so that our customers, including individuals and businesses, can easily access those building blocks from the cloud. So definitely we're working from different factors. >> So last question in the last couple of minutes. Let's kind of vibe on this collaboration theme. Tell us a little bit about the collaboration that you're having with, you mentioned customers in some highly regulated industries, as an example. But a little bit to understand what's that symbiosis? What is Intel learning from your customers that's driving Intel's innovation of your technologies and big data? >> That's an excellent question. So Lisa, maybe I can start by sharing a couple of customer use cases, the kinds of problems we help our customers address. I think it's always wise not to start a conversation with the customer on the technology that you deliver. You want to understand the customer's needs first, so that you can provide a solution that really addresses their biggest pain point rather than simply selling technology. So for example, we have worked with an online retailer to better understand their customers' shopping behavior and to assess their customers' preferences and interests. And based upon that analysis, the online retailer made different product recommendations and maximized its customers' purchase potential. And it drove up the retailer's sales. You know, that's one type of use case that we have worked on. We also have partnered with customers from the medical field. Actually, today at the Strata Conference we had a joint presentation with UCSF where we helped the medical center to automate the diagnosis and grading of meniscus lesions. And so today actually, that's all done manually by the radiologist but now that entire process is automated. 
The result is much more accurate, much more consistent, and much more timely. Because you don't have to wait for the availability of a radiologist to read all the 3D MRI images. And that can all be done by machines. You know, so those are the areas that we work with our customers, understand their business need, and give them the solution they are looking for. >> Wow, the impact there. I wish we had more time to dive into some of those examples. But we thank you so much, Ziya, for stopping by twice in one week to theCUBE and sharing your insights. And we look forward to having you back on the show in the near future. >> Thanks, so thanks Lisa, thanks George for having me. >> And for my co-host George Gilbert, I'm Lisa Martin. We are live at Big Data SV in San Jose. Come down, join us for the rest of the afternoon. We're at this cool place called Forager Tasting and Eatery. We will be right back with our next guest after a short break. (electronic outro music)
ENTITIES
Entity | Category | Confidence |
---|---|---|
George Gilbert | PERSON | 0.99+ |
Lisa Martin | PERSON | 0.99+ |
UCSF | ORGANIZATION | 0.99+ |
George | PERSON | 0.99+ |
Lisa | PERSON | 0.99+ |
San Jose | LOCATION | 0.99+ |
China | LOCATION | 0.99+ |
Ziya Ma | PERSON | 0.99+ |
U.S. | LOCATION | 0.99+ |
International Women's Day | EVENT | 0.99+ |
SiliconANGLE Media | ORGANIZATION | 0.99+ |
Ziya | PERSON | 0.99+ |
one week | QUANTITY | 0.99+ |
today | DATE | 0.99+ |
twice | QUANTITY | 0.99+ |
First | QUANTITY | 0.99+ |
Strata Data Conference | EVENT | 0.99+ |
one topic | QUANTITY | 0.98+ |
Spark | TITLE | 0.98+ |
both | QUANTITY | 0.98+ |
Intel | ORGANIZATION | 0.98+ |
one thing | QUANTITY | 0.98+ |
three days ago | DATE | 0.98+ |
Women in Data Science Conference | EVENT | 0.97+ |
Strata Conference | EVENT | 0.96+ |
first | QUANTITY | 0.96+ |
BigDL | TITLE | 0.96+ |
TensorFlow | TITLE | 0.96+ |
one type | QUANTITY | 0.95+ |
two | DATE | 0.94+ |
MXNet | TITLE | 0.94+ |
Caffe | TITLE | 0.92+ |
theCUBE | ORGANIZATION | 0.91+ |
one | QUANTITY | 0.9+ |
Software and Services Group | ORGANIZATION | 0.9+ |
Forager Tasting and Eatery | ORGANIZATION | 0.88+ |
Vice President | PERSON | 0.86+ |
Big Data Technologies | ORGANIZATION | 0.84+ |
seven cloud service providers | QUANTITY | 0.81+ |
last couple days | DATE | 0.81+ |
Atom | COMMERCIAL_ITEM | 0.76+ |
Silicon Valley | LOCATION | 0.76+ |
Big Data SV 2018 | EVENT | 0.74+ |
a couple days ago | DATE | 0.72+ |
Big Data SV | ORGANIZATION | 0.7+ |
Xeon | COMMERCIAL_ITEM | 0.7+ |
Nervana | ORGANIZATION | 0.68+ |
Big Data | EVENT | 0.62+ |
last | DATE | 0.56+ |
data | EVENT | 0.54+ |
case | QUANTITY | 0.52+ |
3D | QUANTITY | 0.48+ |
couple | QUANTITY | 0.47+ |
years | DATE | 0.47+ |
Nervana | TITLE | 0.45+ |
Big | ORGANIZATION | 0.32+ |
Blaine Mathieu, VANTIQ | Big Data SV 2018
>> Announcer: Live from San Jose, it's theCUBE, presenting Big Data, Silicon Valley. Brought to you by SiliconANGLE Media and its ecosystem partners. >> Welcome back to theCUBE. Our continuing coverage of our event, Big Data SV, continues. I am Lisa Martin joined by Peter Burris. We're in downtown San Jose at a really cool place called Forager Tasting and Eatery. Come down, hang out with us today as we have continued conversations around all things big data, everything in between. This is our second day here and we're excited to welcome to theCUBE the CMO of VANTIQ, Blaine Mathieu. Blaine, great to meet you, great to have you on the program. >> Great to be here, thanks for inviting me. >> So, VANTIQ, you guys are up the street in Walnut Creek. What do you guys do, what are you about, what makes VANTIQ different? >> Well, in a nutshell, VANTIQ is a so-called high-productivity application development platform to allow developers to build, deploy, and manage so-called event-driven real-time applications, the kind of applications that are critical for driving many of the digital transformation initiatives that enterprises are trying to get on top of these days. >> Digital transformation, it's a term that can mean so many different things, but today, it's essential for companies to be able to compete, especially enterprise companies with newer companies that are more agile, more modern. But if we peel apart digital transformation, there's so many elements that are essential. How do you guys help companies, enterprises, say, evolve their application architectures that might currently not be able to support an actual transformation to a digital business? >> Well, I think that's a great question, thank you. I think the key to digital transformation is really a lot around the concept of real time, okay. 
The reason Uber is disrupting or has disrupted the taxi industry is the old way of doing it was somebody called a taxi and then they waited 30 minutes for a taxi to show up and then they told the taxi where to go and hopefully they got there. Whereas Uber turned that into a real-time business, right? You called, you pinged something on your phone. They knew your location. They knew the location of the driver. They matched those up, brought 'em together in real time. Already knew where to bring you to and ensured you had the right route to that location. All of this data flowing, all of these actions being taken in real time. The same thing applies to a disruptor like Netflix, okay? In the old days, Blockbuster used to send you, you know, a leaflet in the mail telling you what the new movies are. Maybe it was personalized for you. Probably not. Now, Netflix knows who you are instantly, gives you that information, again, in real time based on what you've done in the past and is able to deliver the movie also in real time, pretty well. Every disruptor you look at around digital transformation is taking a business or a process that was done slowly and impersonally and making it happen in real time. Unfortunately, enterprise applications and the architectures, as you said a second ago, that are being used in most applications today weren't designed to enable these real-time use cases. A great example is Salesforce. So, Salesforce is a pretty standard, what you'd call a request application. So, you make a request, a person, generally, makes a request of the system, the system goes into a database, queries that database, finds information and then returns it back to the user. And that whole process could take, you know, significant amounts of time, especially if the right data isn't in the database at the time and you have to go request it or find it or create it. 
A new type of application needs to be created that's not fundamentally database centric, but is able to take these real-time data streams coming in from devices, from people, from enterprise systems, process them in real time and then take an action. >> So, let's pretend I'm a CEO. >> Yeah. >> One of the key things you said, and I want you to explain it better, is event. What is an event and how does that translate into a digital business decision? >> This notion of complex event processing (CEP) has been around in technology for a long time and yet, it surprises me still, a lot of folks we talk to, CEOs, have never heard of the concept. And it's very simple really. An event is just something that happens in the context of business. That's as complex and as simple as it is. An event could be a machine increases in temperature by one degree, a car moves from one location to another location. It could be an enterprise system, like an ERP system, you know, approves a PO. It could be a person pressing a button on a mobile device. All of those, or it could be an IoT device putting off a signal about the state of a machine. Increasingly, we're getting a lot of events coming from IoT devices. So, really, any particular interesting business situation or a change in a situation that happens is an event. And increasingly, as you know, IoT, augmented reality, AI and machine learning, autonomous vehicles, all these new real-time technologies are spinning off more and more events, streams of these events coming off in rapid fashion, and we have to be able to do something about them. >> Let me take a crack at it and you tell me if I've got this right. That, historically, applications have been defined in terms of processes and so, in many respects, there was a very concrete, discrete, well established program, set of steps that were performed and then the transaction took place. 
An event, it seems to me is, yeah, we generally described it, but it changes in response to the data. >> Right, right. >> So, an event is kind of like an outside in driven by data. >> Right, right. >> System response, whereas, your traditional transaction processing is an inside out driven by a sequence of programmed steps, and that decision might have been made six years ago. So, the event is what's happening right now informed by data versus a transaction, traditional transaction is much more, what did we decide to do six years ago and it just gets sustained. Have I got that right? >> That's right. Absolutely right or six hours ago or even six minutes ago, which might seem wow, six minutes, that's pretty good, but take a use case for a field service agent trying to fix a machine or an air conditioner on top of a building. In today's world now, that air conditioner has hundreds of sensors that are putting off data about the state of that air conditioner in real time. A service tech has the ability to, while the machine is still putting off that data, be able to make repairs and changes and fixes, again, in the moment, see how that is changing the data coming off the machine, and then, continue to make the appropriate repairs in collaboration with a smart system or an application that's helping them. >> That's now identifying patterns about what the problem is, versus some of the old ways was where we had a recipe of, you know, steps that you went through in the call center. >> Right, right. And the customer is getting more and more frustrated. >> They got their clipboard out and had the 52 steps they followed to see oh that didn't work, now the next step. No, data can help us do that much more efficiently and effectively if we're able to process it in real time. 
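[Editor's note: the contrast drawn above, between a fixed sequence of programmed steps and a system that reacts to each event as it arrives, can be made concrete with a small sketch. This is an illustration only and is not VANTIQ's product or API. The names `Event`, `EventRouter`, and `make_overheat_rule`, and the 80-degree limit, are all hypothetical.]

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Event:
    """Something that happened in the context of the business (hypothetical shape)."""
    source: str   # e.g. a sensor id or the name of an enterprise system
    kind: str     # e.g. "temperature_changed" or "po_approved"
    value: float  # the reading or amount carried by the event


Handler = Callable[[Event], List[str]]


class EventRouter:
    """Routes each incoming event to the rules registered for its kind."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = {}

    def on(self, kind: str, handler: Handler) -> None:
        # Register a rule that reacts to one kind of event.
        self._handlers.setdefault(kind, []).append(handler)

    def publish(self, event: Event) -> List[str]:
        # React immediately: collect the actions every matching rule asks for.
        actions: List[str] = []
        for handler in self._handlers.get(event.kind, []):
            actions.extend(handler(event))
        return actions


def make_overheat_rule(limit: float) -> Handler:
    """Build a rule that requests a technician when a reading exceeds `limit`."""
    def rule(event: Event) -> List[str]:
        if event.value > limit:
            return [f"dispatch technician to {event.source}"]
        return []
    return rule


# Assumed wiring for illustration: one rule, one kind of event.
router = EventRouter()
router.on("temperature_changed", make_overheat_rule(limit=80.0))
```

[Publishing `Event("ac-unit-7", "temperature_changed", 81.0)` to this router returns a single dispatch action, while a reading of 79.0 returns none. The point of the design is that nothing polls a database on a schedule; the decision is made when the data arrives.]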
>> So, in many respects, what we're really talking about is an application world or a world looking forward where the applications, which historically have been very siloed, process driven, to a world where the application function is much more networked together and the application, the output of one application is having a significant impact through data on the performance of an application somewhere else. That seems like it's got the potential to be an extremely complex fabric. (laughing) So, do I wait until I figure all that out (laughing) and then I start building it? Or do I, I mean, how do I do it? Do I start small and create and grow into it? What's the best way for people to start working on this? >> Well, you're absolutely right. Building these complex, geeking out a little bit, you know, asynchronous, non-blocking, so called reactive applications, that's the concept that we've been using in computer science for some time, is very hard, frankly. Okay, it's much easier to build computing systems that process things step one, step, two, step three, in order, but if you have to build a system that is able to take real time inputs or changes at any point in the process at any time and go in a different direction, it's very complex. And, computer scientists have been writing applications like this for decades. It's possible to do, but that isn't possible to do at the speed that companies now want to transform themselves, right? By the time you spec out an application and spend two years writing it, your business competitors have already disrupted you. The requirements have already changed. You need to be much more rapid and agile. And so, the secret sauce to this whole thing is to be able to write these transformative applications or create them, not even write is actually the wrong word to use, to be able to create them. >> Generate them. 
>> Yeah, generate them in a way which is very fast, does not require a guru level developer in reactive Java or some super low level code that you'd have to use to otherwise do it, so that you can literally have business people help design the applications, conceptually build them almost in real time, get them out into the market, and then be able to modify them as you need to, you know, on the fly. >> If I can build on that for just one second. So, it used to be we had this thing called computer-assisted software engineering. >> (laughs) Right, right. >> We were going to operate this very very high level language. It's kind of-- But then, we would use code and build a code and the two of them were separated and so the minute that we deployed, somebody would go off and maintain and the whole thing would break. >> Right, right. >> Do you have that problem? >> No, well, that's exactly right. So, the old, you know, the old, the previous way of doing it was about really modeling an application, maybe visually, drag and drop, but then fundamentally, you created a bunch of code and then your job, as you said after, was to maintain and deploy and manage. >> Try to sustain some connection back up to that beautiful visual model. >> And you probably didn't because that was too much. That was too much work, so forget about the model after that. Instead, what we're able to do these days is to build the applications visually, you know, really for the most part with either super low code or, in many cases, no code because we have the ability to abstract away a lot of the complexity, a lot of the complex code that you'd have to write, we can represent that, okay, with these logical abstractions, create the applications themselves, and then continue to maintain, add to, modify the application using the exact same structure. You're not now stuck on, now you're stuck with 20,000 lines of code that you have to, that you have to edit. 
You're continuing to run and maintain the application just the way you built it, okay. We've now got to the place in computer science where we can actually do these things. We couldn't do them, you know, 20 years ago with case, but we can absolutely do them now. >> So, I'm hearing from a customer internal perspective a lot of operational efficiencies that VANTIQ can drive. Let's look now from a customer's perspective. What are the business impacts you're able to make? You mentioned the word reactive a minute ago when you were talking about applications, but do you have an example where you've, VANTIQ, has enabled a customer, a business, to be more, to be proactive and be able to identify through, you know, complex event processing, what their customers are doing to be able to deliver relevant messages and really drive revenue, drive profit? >> Right, right. So many, you know, so many great examples. And, I mentioned field service a few minutes ago. I've got a lot of clients in that doing this real time field service using these event processing applications. One that I want to bring up right now is one of the largest global shoe manufacturers, actually, that's a client of VANTIQ. I, unfortunately, can't say the name right now 'cause they want to keep what they're doing under wraps, but we all definitely know the company. And they're using this to manage the security, primarily, around their real time global supply chain. So, they've got a big challenge with companies in different countries redirecting shipments of their shoes, selling them on the gray market, at different prices than what are allowed in different regions of the world. 
And so, through both sensorizing the packages, the barcode scanning, the enterprise systems bringing all that data together in real time, they can literally tell in the moment if something is be-- If a package is redirected to the wrong region or if literally a shoe or a box of shoes is being sold where it shouldn't be sold at the wrong price. They used to get a monthly report on the activities and then they would go and investigate what happened last month. Now, their fraud detection manager is literally sitting there getting this in real time, saying, oh, Singapore sold a pallet of shoes that they should not have been able to sell five minutes ago. Call up the guy in Singapore and have him go down and see what's going on and fix that issue. That's pretty powerful when you think about it. >> Definitely, so like reduction in fraud or increase in fraud detection. Sounds like, too, there's a potential for a significant amount of cost savings to the business, not just meeting the external customer needs, but from a, from a cost perspective reduction. Not just some probably TCO, but in operational expenses. >> For sure, although, I would say most of the digital transformation initiatives, when we talk to CEOs and CIOs, they're not focused as much on cost savings, as they're focused on A, avoiding being disrupted by the next interesting startup, B, creating new lines of business, new revenue streams, finding out a way to do something differently dramatically better than they're currently doing it. It's not only about optimizing or squeezing some cost out of their current application. This thing that we are talking about, I guess you could say it's an improvement on their current process, but really, it's actually something they just weren't even really doing before. Just a total different way of doing fraud detection and managing their global supply chain that they just fundamentally weren't even doing. 
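[Editor's note: the supply chain check described above, flagging a package scanned in the wrong region or sold at a disallowed price at the moment the scan arrives, might look roughly like the sketch below. The customer's actual data model is not described in the interview, so `Shipment`, `ScanEvent`, `check_scan`, and every field name here are assumptions made for illustration.]

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Shipment:
    """What the enterprise system expects for a package (hypothetical fields)."""
    package_id: str
    allowed_region: str
    min_price: float
    max_price: float


@dataclass(frozen=True)
class ScanEvent:
    """A barcode or sensor scan; sale_price is None for a transit-only scan."""
    package_id: str
    region: str
    sale_price: Optional[float] = None


def check_scan(shipments: Dict[str, Shipment], scan: ScanEvent) -> List[str]:
    """Return alerts for one scan as soon as it arrives, not in a monthly report."""
    shipment = shipments.get(scan.package_id)
    if shipment is None:
        return [f"{scan.package_id}: unknown package"]

    alerts: List[str] = []
    # Rule 1: the package turned up outside the region it was shipped to.
    if scan.region != shipment.allowed_region:
        alerts.append(
            f"{scan.package_id}: scanned in {scan.region}, "
            f"expected {shipment.allowed_region}"
        )
    # Rule 2: the package was sold outside the allowed price range.
    if scan.sale_price is not None and not (
        shipment.min_price <= scan.sale_price <= shipment.max_price
    ):
        alerts.append(
            f"{scan.package_id}: sold at {scan.sale_price}, allowed "
            f"{shipment.min_price} to {shipment.max_price}"
        )
    return alerts
```

[With a shipment registered for region "EU" and a price range of 80 to 120, a sale scan in "SG" at 60 produces two alerts, one for the region and one for the price, while a transit scan in "EU" produces none.]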
And now, of course, they're looking at many other use cases across the company, not just in supply chain, but, you know, smart manufacturing, so many use cases. Your point about savings, though, there's, you know, what value does the application itself bring? Then, there's the question of what does it cost to build and maintain and deploy the application itself, right? And, again, with these new visual development tools, they're not modeling tools, you're literally developing the application visually. You know, I've been in so many scenarios where we talked to large enterprises. You know, we talk about what we're doing, like we talk about right now, and they say, okay, we'd love to do a POC, proof of concept. We want to allocate six months for this POC, like normally you would probably do for building most enterprise applications. And, we inevitably say, well, how about Friday? How about we have the POC done by Friday? And, you know, we get the Germans laugh, you know, laugh uncomfortably and we go away and deliver the POC by Friday because of how much different it is to build applications this way versus writing low level Java or C-sharp code and sticking together a bunch of technologies and tools 'cause we abstract all that away. And, you know, the eyes drop open and the mouth drops open and it's incredible what modern technology can do to radically change how software is being developed. >> Wow, big impact in a short period of time. That's always a nice thing to be able to deliver. >> It is, it is to-- It's great to be able to surprise people like that. >> Exactly, exactly. Well, Blaine, thank you so much for stopping by, sharing what VANTIQ is doing to help companies be disruptive and for sharing those great customer examples. We appreciate your time. >> You're welcome. Appreciate the time. >> And for my co-host, Peter Burris, I'm Lisa Martin. 
You're watching The Cube's continuing coverage of our event, Big Data SV Live from San Jose, down the street from the Strata Data Conference. Stick around, we'll be right back with our next guest after a short break. (techy music)
SUMMARY :
Brought to you by Silicon Angle Media the CMO of VANTIQ, Blaine Mathieu. So, VANTIQ, you guys are up the street in Walnut Creek. for driving many of the digital transformation that might currently not be able to support and the architectures, as you said a second ago, One of the key things you said, in the context of business. in response to the data. So, an event is kind of like an outside in So, the event is what's happening right now and changes and fixes, again, in the moment, of the old ways was where we had recipe of, you know, And the customer is getting more and more frustrated. they followed to see oh that didn't work, and the application, the output of one application And so, the secret sauce to this whole thing to modify them as you need to, you know, on the fly. So, it used to be we had this thing and so the minute that we deployed, So, the old, you know, the old, Try to sustain just the way you built it, okay. but do you have an example where you've, that they should not have been able to sell to the business, not just meeting and deliver the POC by Friday because to be able to deliver. It's great to be able to surprise people Well, Blaine, thank you so much for stopping by, Appreciate the time. down the street from the Strata Data Conference.
ENTITIES
Entity | Category | Confidence |
---|---|---|
Blaine | PERSON | 0.99+ |
Lisa Martin | PERSON | 0.99+ |
Peter Burris | PERSON | 0.99+ |
Singapore | LOCATION | 0.99+ |
Uber | ORGANIZATION | 0.99+ |
two years | QUANTITY | 0.99+ |
Netflix | ORGANIZATION | 0.99+ |
San Jose | LOCATION | 0.99+ |
VANTIQ | ORGANIZATION | 0.99+ |
Blaine Mathieu | PERSON | 0.99+ |
20,000 lines | QUANTITY | 0.99+ |
30 minutes | QUANTITY | 0.99+ |
two | QUANTITY | 0.99+ |
Silicon Angle Media | ORGANIZATION | 0.99+ |
52 steps | QUANTITY | 0.99+ |
Walnut Creek | LOCATION | 0.99+ |
six months | QUANTITY | 0.99+ |
Java | TITLE | 0.99+ |
one degree | QUANTITY | 0.99+ |
Friday | DATE | 0.99+ |
second day | QUANTITY | 0.99+ |
last month | DATE | 0.99+ |
one second | QUANTITY | 0.99+ |
Silicon Valley | LOCATION | 0.99+ |
six years ago | DATE | 0.98+ |
both | QUANTITY | 0.98+ |
Strata Data Conference | EVENT | 0.98+ |
Big Data SV Live | EVENT | 0.98+ |
One | QUANTITY | 0.98+ |
The Cube | ORGANIZATION | 0.98+ |
today | DATE | 0.98+ |
one | QUANTITY | 0.98+ |
20 years ago | DATE | 0.98+ |
Big Data SV 2018 | EVENT | 0.97+ |
six hours ago | DATE | 0.97+ |
six minutes ago | DATE | 0.97+ |
five minute ago | DATE | 0.97+ |
a minute ago | DATE | 0.96+ |
hundreds of sensors | QUANTITY | 0.95+ |
The Cube | TITLE | 0.94+ |
Blockbuster | ORGANIZATION | 0.91+ |
few minutes ago | DATE | 0.89+ |
step one | QUANTITY | 0.89+ |
step three | QUANTITY | 0.85+ |
Forager Tasting and Eatery | ORGANIZATION | 0.85+ |
decades | QUANTITY | 0.84+ |
six minutes | QUANTITY | 0.84+ |
C | TITLE | 0.83+ |
Big Data | ORGANIZATION | 0.81+ |
one location | QUANTITY | 0.78+ |
one application | QUANTITY | 0.77+ |
second ago | DATE | 0.71+ |
CEP | ORGANIZATION | 0.53+ |
big | ORGANIZATION | 0.52+ |
Germans | PERSON | 0.51+ |
techy | ORGANIZATION | 0.41+ |
Data | EVENT | 0.31+ |
Octavian Tanase, NetApp | Big Data SV 2018
>> Announcer: Live from San Jose it's The Cube presenting Big Data, Silicon Valley brought to you by SiliconANGLE Media and its ecosystem partners. >> Good morning. Welcome to The Cube. We are on day two of our coverage our event Big Data SV. I'm Lisa Martin with my cohost Dave Vellante. We're down the street from the Strata Data Conference. This is The Cube's tenth big data event and we had a great day yesterday learning a lot from myriad guests on very different nuances of big data journey where things are going. We're excited to welcome back to The Cube an alumni, Octavian Tanase, the Senior Vice President of Data ONTAP from NetApp. Octavian, welcome back to The Cube. >> Glad to be here. >> So you've been at the Strata Data Conference for the last couple of days. From a big data perspective, what are some of the things that you're hearing, in terms of from a customer's perspective on what's working, what challenges, opportunities? >> I'm very excited to be here and learn about the innovation of our partners in the industry and share with our partners and our customers what we're doing to enable them to drive more value out of that data. The reality is that data has become the 21st Century gold or oil that powers the business and everybody's looking to apply new techniques, a lot of times machine learning, deep learning, to draw more value out of the data, make better decisions and compete in the marketplace. >> Octavian, you've been at NetApp now eight years and I've been watching NetApp, as we were talking about offline, for decades and I've seen the ebb and flow and this company has transformed many, many times. The latest, obviously cloud came in, flash came into play and then you're also going through a major transition in the customer base to clustered ONTAP. You seemed to negotiate that. NetApp is back, thriving, stock's up. What's happening at NetApp? What's the culture like these days? Give us the update. 
>> I think we've been very fortunate to have a CEO like George Kurian, who has been really focused on helping us do basically fewer things better, really focus on our core business, simplify our operations and continue to innovate and this is probably the area that I'm most excited about. It's always good to make sure that you accelerate the business, make it simpler for your customers and your partners to do business with you, but what you have to do is innovate. We are a product company. We are passionate about innovation. I believe that we are innovating with more pace than many of the startups in the space so that's probably the most exciting thing that has been part of our transformation. >> So let's talk about big data. Back in the day if you had a big data problem you would buy a big Unix box, maybe buy some Oracle licenses, try to put all your data into that box and that became your data warehouse. The brilliance of Hadoop was hey we can leave the data where it is. There's too much data to put into the box so we're going to bring five megabytes of code to a petabyte of data. And the other piece of it is CFOs loved it, because we're going to reduce the cost of our expensive data warehouse and we're going to buy off the shelf components: white box, servers and off the shelf disk drives. We're going to put that together and life will be good. Well as things matured, the old client-server days, it got very expensive, you needed enterprise grade. So where does NetApp fit into that equation, because originally big storage companies like NetApp, they weren't part of the equation? Has that changed? >> Absolutely. One of the things that has enabled that transformation, that change is we made a deliberate decision to focus on software defined and making sure that the ONTAP operating system is available wherever data is being created: on the edge in an IoT device, in the traditional data center or in the cloud. 
So we are in the unique position to enable analytics, big data, wherever those applications reside. One of the things that we've recently done is we've partnered with IDC and what the study, what the analysis has shown is that deploying in analytics, a Hadoop or NoSQL type of solution on top of NetApp is half the cost of DAS. So when you consider the cost of servers, the licenses that you're going to have to pay for, these commercial implementations of Hadoop as well as the storage and the data infrastructure, you are much better off choosing NetApp than a white box type of solution. >> Let's unpack that a little bit, because if I infer correctly from what you said normally you would say the operational costs are going to be dramatically lower, it's easier to manage a professional system like a NetApp ONTAP, it's integrated, great software, but am I hearing you correctly, you're saying the acquisition costs are actually less than if I'm buying white box? A lot of people are going to be skeptical about that, say Octavian no way, it's cheaper to buy white box stuff. Defend that statement. >> Absolutely. If you're looking at the whole solution that includes the server and the storage, what NetApp enables you to do if you're running the solution on top of ONTAP you reduce the need for so many servers. If you reduce that number you also reduce the licensing cost. Moreover, if you actually look at the core value proposition of the storage layer there, DAS typically makes three copies of the data. We don't. We are very greedy and we're making sure that you're using shared storage and we are applying a bunch of storage efficiency techniques to further compress, compact that data for world class storage efficiency. >> So cost efficiency is obviously a great benefit for any company when they're especially evolving, from a digital perspective. What are some of the business level benefits? You mentioned speed a minute ago. 
What is Data ONTAP and even ONTAP in the cloud enabling your enterprise customers to achieve at the business level, maybe from faster time to market, identifying with machine learning and AI new products? Give me an example of maybe a customer that you think really articulates the value that ONTAP in the cloud can deliver. >> One of the things that's really important is to have your data management capability, wherever the data is being produced so ONTAP being consumed either as a VM or a service ... I don't know if you've seen some of the partnerships that we have with AWS and Azure. We're able to offer the same rich data management capabilities, not only in the traditional data center, but in the cloud. What that really enables customers to do is to simplify and have the same operating system, the same data management platform for both the second platform traditional applications as well as for the third platform applications. I've seen a company like Adobe be very successful in deploying their infrastructure, their services not only on prem in their traditional data center, but using ONTAP Cloud. So we have more than about 1,500 customers right now that have adopted ONTAP in the AWS cloud. >> What are you seeing in terms of the adoption of flash and I'm particularly interested in the intersection of flash adoption and the developer angle, because we've seen, in certain instances, certain organizations are able to share data off of flash much more efficiently than you would be, for instance, of a spinning disk? Have you seen a developer impact in your customer base? >> Absolutely I think most customers initially have adopted flash, because of high throughput and low latency. I think over time customers really understood and identified with the overall value proposition in cost of ownership in flash that it enables them to consolidate multiple workloads in a smaller footprint. 
So that enables you to then reduce the cost to operate that infrastructure and it really gives you a range of applications that you can deploy that you were never able to do that. Everybody's looking to do in place, in line analytics that now are possible, because of this fast media. Folks are looking to accelerate old applications in which they cannot invest anymore, but they just want to run faster. Flash also tends to be more reliable than traditional storage, so customers definitely appreciate that fewer things could go wrong so overall the value proposition of flash, it's all encompassing and we believe that in the near future flash will be the defacto standard in everybody's data center, whether it's on prem or in the cloud. >> How about backup and recovery in big data? We obviously, in the enterprise, very concerned about data protection. What's similar in big data? What's different and what's NetApp's angle on that? >> I think data protection and data security will never stop being important to our customers. Security's top of mind for everybody in the industry and it's a source of resume changing events, if you would, and they're typically not promotions. So we have invested a tremendous deal in certifications for HIPAA, for FIPS, we are enabling encryption, both at rest and in flight. We've done a lot of work to make sure that the encryption can happen in software layer, to make sure that we give the customers best storage class efficiency and what we're also leveraging is the innovation that ONTAP has done over many years to protect the data, replicate its snapshots, peering the data to the cloud. These are techniques that we're commonly using to reduce the cost of ownership, also protect the data the customers deploy. >> So security's still a hot topic and, like you said, it probably always will be, but it's a shared responsibility, right? So customers leveraging NetApps safe or on prem hybrid also using Azure or AWS, who's your target audience? 
If you're talking to the guys and gals that are still managing storage are you also having the CSO or the security guys come in, the gals, to understand we've got this deployment in Azure or AWS so we're going to bring in ONTAP to facilitate this? There's a shared responsibility of security. Who's at the table, from your perspective, in your customers that you need to help understand how they facilitate true security? >> It's definitely been a transformative event where more and more people in IT organizations are involved in the decisions that are required to deploy the applications. There was a time when we would talk only to the storage admin. After a while we started talking to the application admin, the virtualization admin and now you're talking to the line of business who has that vested interest to make sure that they can harness the power of the data in their environment. So you have the CSO, you have the traditional infrastructure people, you have the app administration and you have the app owner, the business owner that are all at the table that are coming and looking to choose the best of breed solution for their data management. >> What are the conversations like with your CXO, executives? Everybody talks about digital transformation. It's kind of an overused term, but there's real substance when you actually peel the onion. What are you seeing as NetApp's role in effecting digital transformations within your customer base? >> I think we have a vision of how we can help enterprises take advantage of the digital transformation and adopt it. I think we have three tenets of that vision. Number one is we're helping customers harness the power of the cloud. Number two, we're looking to enable them to future proof their investments and build the next generation data center. And number three, nobody starts with a fresh slate so we're looking to help customers modernize their current infrastructure through storage. We have a lot of expertise in storage. 
We've helped, over time, customers time and again adopt disruptive technologies in nondisruptive ways. We're looking to adopt these technologies and trends on behalf of our customers and then help them use them in a seamless safe way. >> And continue their evolution to identify new revenue streams, new products, new opportunities and even probably give other lines of business access to this data that they need to understand is there value here, how can we harness it faster than our competitors, right? >> Absolutely. It's all about deriving value out of the data. I think earlier I called it the gold of the 21st Century. This is a trend that will continue. I believe there will be no enterprise or center that won't focus on using machine learning, deep learning, analytics to derive more value out of the data to find more customer touch points, to optimize their business to really compete in the marketplace. >> Data plus AI plus cloud economics are the new innovation drivers of the next 10, 20 years. >> Completely agree. >> Well Octavian thanks so much for spending time with us this morning sharing what's new at NetApp, some of the visions that you guys have and also some of the impact that you're making with customers. We look forward to having you back on the program in the near future. >> Thank you. Appreciate having the time. >> And for my cohost Dave Vellante I'm Lisa Martin. You're watching The Cube live on day two of coverage of our event, Big Data SV. We're at this really cool venue, Forager Tasting Room. Come down here, join us, get to hear all these great conversations. Stick around and we'll be right back with our next guest after a short break. (electronic music)
SUMMARY :
brought to you by SiliconANGLE Media We're down the street from the Strata Data Conference. in the customer based to clustered ONTAP. that you accelerate the business, Back in the day if you had a big data problem and making sure that the ONTAP operating system A lot of people are going to be skeptical about that, that includes the server and the storage, that ONTAP in the cloud can deliver. that have adopted ONTAP in the AWS cloud. to operate that infrastructure and it really gives you We obviously, in the enterprise, peering the data to the cloud. that you need to help understand that are required to deploy the applications. What are the conversations like with your CXO, executives? and build the next generation data center. out of the data to find more customer touch points, are the new innovation drivers of the next 10, 20 years. We look forward to having you back on the program Appreciate having the time. get to hear all these great conversations.
ENTITIES
Entity | Category | Confidence |
---|---|---|
Dave Vellante | PERSON | 0.99+ |
George Kurian | PERSON | 0.99+ |
Lisa Martin | PERSON | 0.99+ |
Octavian Tanase | PERSON | 0.99+ |
Adobe | ORGANIZATION | 0.99+ |
Octavian | PERSON | 0.99+ |
AWS | ORGANIZATION | 0.99+ |
eight years | QUANTITY | 0.99+ |
San Jose | LOCATION | 0.99+ |
SiliconANGLE Media | ORGANIZATION | 0.99+ |
NetApp | TITLE | 0.99+ |
Hadoop | TITLE | 0.99+ |
five megabytes | QUANTITY | 0.99+ |
Oracle | ORGANIZATION | 0.99+ |
second platform | QUANTITY | 0.99+ |
21st Century | DATE | 0.99+ |
HIPAA | TITLE | 0.99+ |
Strata Data Conference | EVENT | 0.99+ |
yesterday | DATE | 0.99+ |
ONTAP | TITLE | 0.99+ |
The Cube | TITLE | 0.99+ |
IDC | ORGANIZATION | 0.98+ |
both | QUANTITY | 0.98+ |
One | QUANTITY | 0.98+ |
Unix | COMMERCIAL_ITEM | 0.98+ |
NetApp | ORGANIZATION | 0.97+ |
The Cube | ORGANIZATION | 0.97+ |
Silicon Valley | LOCATION | 0.96+ |
ONTAP Cloud | TITLE | 0.95+ |
more than about 1,500 customers | QUANTITY | 0.95+ |
NetApps | TITLE | 0.93+ |
Big Data SV | EVENT | 0.93+ |
Big Data SV 2018 | EVENT | 0.93+ |
day two | QUANTITY | 0.93+ |
Forager Tasting Room | LOCATION | 0.88+ |
NoSQL | TITLE | 0.87+ |
Azure | ORGANIZATION | 0.86+ |
third platform applications | QUANTITY | 0.81+ |
a minute ago | DATE | 0.81+ |
Number two | QUANTITY | 0.8+ |
Senior Vice President | PERSON | 0.79+ |
three tenants | QUANTITY | 0.78+ |
decades | QUANTITY | 0.74+ |
a petabyte of data | QUANTITY | 0.73+ |
tenth big | QUANTITY | 0.71+ |
Number one | QUANTITY | 0.71+ |
three copies | QUANTITY | 0.7+ |
this morning | DATE | 0.69+ |
number three | QUANTITY | 0.68+ |
ONTAP | ORGANIZATION | 0.67+ |
Data ONTAP | ORGANIZATION | 0.64+ |
event | QUANTITY | 0.64+ |
Net App | TITLE | 0.64+ |
10 | QUANTITY | 0.64+ |
half | QUANTITY | 0.6+ |
flash | TITLE | 0.58+ |
much | QUANTITY | 0.58+ |
Big Data | EVENT | 0.57+ |
years | QUANTITY | 0.55+ |
Sastry Malladi, FogHorn | Big Data SV 2018
>> Announcer: Live from San Jose, it's theCUBE, presenting Big Data Silicon Valley, brought to you by SiliconANGLE Media and its ecosystem partner. (upbeat electronic music) >> Welcome back to The Cube. I'm Lisa Martin with George Gilbert. We are live at our event, Big Data SV, in downtown San Jose down the street from the Strata Data Conference. We're joined by a new guest to theCUBE, Sastry Malladi, the CTO of FogHorn. Sastry, welcome to theCUBE. >> Thank you, thank you, Lisa. >> So FogHorn, cool name, what do you guys do, who are you? Tell us all that good stuff. >> Sure. We are a startup based in Silicon Valley right here in Mountain View. We started about three years ago, three plus years ago. We provide edge computing intelligence software for edge computing or fog computing. That's how our company name, FogHorn, got started. Particularly for the industrial IoT sector. All of the industrial guys, whether it's transportation, manufacturing, oil and gas, smart cities, smart buildings, any of those different sectors, they use our software to predict failure conditions in real time, or do condition monitoring, or predictive maintenance, any of those use cases and successfully save a lot of money. Obviously in the process, you know, we get paid for what we do. >> So Sastry... GE popularized this concept of IIoT and the analytics and, sort of the new business outcomes you could build on it, like Power by the Hour instead of selling a jet engine. >> Sastry: That's right. >> But there's... Actually we keep on, and David Floyer did some pioneering research on how we're going to have to do a lot of analytics on the edge for latency and bandwidth. What's the FogHorn secret sauce that others would have difficulty with on the edge analytics? >> Okay, that's a great question. Before I directly answer the question, if you don't mind, I'll actually even describe why that's even important to do that, right? 
So a lot of these industrial customers, if you look at, because we work with a lot of them, the amount of data that's produced from all of these different machines is terabytes to petabytes of data, it's real. And it's not just the traditional digital sensors but there are video, audio, acoustic sensors out there. The amount of data is humongous, right? It's not even practical to send all of that to a Cloud environment and do data processing, for many reasons. One is obviously the connectivity, bandwidth issues, and all of that. But the two most important things are these. One is cyber security. None of these customers actually want to connect these highly expensive machines to the internet. That's one. The second is the lack of real-time decision making. What they want to know, when there is a problem, they want to know before it's too late. We want to notify them that a problem is occurring so that they have a chance to go fix it and optimize the asset that is in question. Now, existing solutions do not work in this constrained environment. That's why FogHorn had to invent that solution. >> And tell us, actually, just to be specific, how constrained an environment you can operate in. >> We can run in about less than 100 to 150 megabytes of memory, single-core to dual-core of CPU, whether it's an ARM processor, an x86 Intel-based processor, almost literally no storage because we're a real-time processing engine. Optionally, you could have some storage if you wanted to store some of the results locally there but that's the kind of environment we're talking about. Now, when I say 100 megabytes of memory, it's like a quarter of a Raspberry Pi, right? And even in that environment we have customers that run dozens of machine learning models, right? And we're not talking -- >> George: Like an ensemble. >> Like an anomaly detection, a regression, a random forest, or a clustering, or a gamut, some of those. 
Now, if we get into more deep learning models, like image processing and neural net and all of that, you obviously need a little bit more memory. But what we have shown, we could still run, one of our largest smart city buildings customers, an elevator company, runs in a Raspberry Pi on millions of elevators, right? Dozens of machine learning algorithms on top of that, right? So that's the kind of size we're talking about. >> Let me just follow up with one question on the other thing you said, with, besides we have to do the low-latency locally. You said a lot of customers don't want to connect these brownfield, I guess, operations technology machines to the internet, and physically, I mean there was physical separation for security. So it's like security, Bill Joy used to say "Security by obscurity." Here it's security by -- >> Physical separation, absolutely. Tell me about it. I was actually coming from, if you don't mind, last week I was in Saudi Arabia. One of the oil and gas plants where we deployed our software, you have to go through five levels of security even to get there. It's a multibillion dollar plant and refining the gas and all of that. Completely offline, no connectivity to the internet, and we installed, in their existing small box, our software, connected to their live video cameras that are actually measuring the stuff, doing the processing and detecting the specific conditions that we're looking for. >> That's my question, which was if they want to be monitoring. So there's like one low level, really low hardware low level, the sensor feeds. But you could actually have a richer feed, which is video and audio, but how much of that, then, are you doing the, sort of, inferencing locally? Or even retraining, and I assume that since it's not the OT device, and it's something that's looking at it, you might be more able to send it back up to the Cloud if you needed to do retraining? >> That's exactly right. 
So the way the model works is particularly for image processing because you need, it's a more complex process to train and create a model. You could create a model offline, like in a GPU box, an FPGA box and whatnot. Import and bring the model back into this small little device that's running in the plant, and now the live video data is coming in, the model is inferencing the specific thing. Now there are two ways to update and revise the model: incremental revision of the model, you could do that if you want, or you can send the results to a central location. Not internet, they do have local, in this example for example a PI DB, an OSIsoft PI DB, or some other local service out there, where you have an opportunity to gather the results from each of these different locations and then consolidate and retrain the model, put the model back again. >> Okay, the one part that I didn't follow completely is... If the model is running ultimately on the device, again and perhaps not even on a CPU, but a programmable logic controller. >> It could, even though a programmable controller also typically has some kind of CPU there as well. These days, most of the PLCs, programmable controllers, have either an ARM-based processor or an x86-based processor. We can run on either one of those too. >> So, okay, assume you've got the model deployed down there, for the, you know, local inferencing. Now, some retraining is going to go on in the Cloud, where you have, you're pulling in the richer perspective from many different devices. How does that model get back out to the device if it doesn't have the connectivity between the device and the Cloud? >> Right, so if there's strictly no connectivity, so what happens is once the model is regenerated or retrained, they put a model in a USB stick, it's a low attack. USB stick, bring it to the PLC device and upload the model. >> George: Oh, so this is sort of how we destroyed the Iranian centrifuges. >> That's exactly right, exactly right. 
But you know, some other environments, even though it's not connectivity to the Cloud environment, per se, but the devices have the ability to connect to the Cloud. Optionally, they say, "Look, I'm the device that's coming up, do you have an upgraded model for me?" Then it can pull the model. So in some of the environments it's super strict where there is absolutely no way to connect this device, you put it in a USB stick and bring the model back here. Other environments, the device can query the Cloud but the Cloud cannot connect to the device. This is a very popular model these days because, in other words imagine this, an elevator sitting in a building, somebody from the Cloud cannot reach the elevator, but an elevator can reach the Cloud when it wants to. >> George: Sort of like a jet engine, you don't want the Cloud to reach the jet engine. >> That's exactly right. The jet engine can reach the Cloud if it wants to, when it wants to, but the Cloud cannot reach the jet engine. That's how we can pull the model. >> So Sastry, as a CTO you meet with customers often. You mentioned you were in Saudi Arabia last week. I'd love to understand how you're leveraging and engaging with customers to really help drive the development of FogHorn, in terms of being differentiated in the market. What are those, kind of bi-directional, symbiotic customer relationships like? And how are they helping FogHorn? >> Right, that's actually a great question. We learn a lot from customers because we started a long time ago. We did an initial version of the product. As we begin to talk to the customers, particularly that's part of my job, where I go talk to many of these customers, they give us feedback. Well, my problem is really that I can't even do, I can't even give you connectivity to the Cloud, to upgrade the model. I can't even give you sample data. How do you do that modeling, right? 
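Editor's illustration: the pull-only update pattern described above (the device may call out to the Cloud, but the Cloud holds no route back to the device) can be sketched in a few lines of Python. This is a minimal in-memory sketch under stated assumptions, not FogHorn's implementation: `ModelRelease`, `ModelRegistry`, and `EdgeDevice` are hypothetical names, the model payload is treated as opaque bytes, and no real networking is involved.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelRelease:
    """A retrained model as published centrally (hypothetical structure)."""
    version: int
    payload: bytes  # serialized model; opaque to this sketch


class ModelRegistry:
    """Hypothetical cloud-side store. It only answers requests and
    keeps no device addresses, so it cannot push anything."""

    def __init__(self) -> None:
        self._latest: Optional[ModelRelease] = None

    def publish(self, release: ModelRelease) -> None:
        self._latest = release

    def latest_newer_than(self, version: int) -> Optional[ModelRelease]:
        if self._latest is not None and self._latest.version > version:
            return self._latest
        return None


class EdgeDevice:
    """Hypothetical device that initiates every exchange itself."""

    def __init__(self, registry: ModelRegistry) -> None:
        self._registry = registry
        self.model_version = 0
        self.model_payload = b""

    def check_for_update(self) -> bool:
        """Outbound call made by the device, for example at startup.
        Returns True if a newer model was pulled and installed."""
        release = self._registry.latest_newer_than(self.model_version)
        if release is None:
            return False
        self.model_version = release.version
        self.model_payload = release.payload
        return True


# Usage: the device asks "do you have an upgraded model for me?"
registry = ModelRegistry()
device = EdgeDevice(registry)
registry.publish(ModelRelease(version=2, payload=b"retrained"))
device.check_for_update()
```

In the fully air-gapped case from the interview, the same version check would run against a file carried in on a USB stick instead of a registry call.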
And sometimes they say, "You know what, we are not technical people, help us express the problem, the outcome, give me tools that help me express that outcome." So we created a bunch of what we call OT tools, operational technology tools. How we distinguish ourselves in this process, from the traditional Cloud-based vendor, the traditional data science and data analytics companies, is that they think in terms of computer scientists, computer programmers, and expressions. We think in terms of industrial operators, what can they express, what do they know? They don't really necessarily care about, when you tell them, "I've got an anomaly detection data science machine learning algorithm", they're going to look at you like, "What are you talking about? I don't understand what you're talking about", right? You need to tell them, "Look, this machine is failing." What are the conditions in which the machine is failing? How do you express that? And then we translate that requirement, or that outcome, into the underlying models, the underlying VEL expressions, VEL, our CEP expression language. So we learned a ton from user interface, capabilities, latency issues, connectivity issues, different protocols, a number of things that we learn from customers. >> So I'm curious with... More of the big data vendors are recognizing data in motion and data coming from devices. And some, like Hortonworks DataFlow NiFi has a MiNiFi component written in C++, really low resource footprint. But I assume that that's really just a transport. It's almost like a collector and that it doesn't have the analytics built in -- >> That's exactly right, NiFi has the transport, it has the real-time transport capability for sure. What it does not have is this notion of that CEP concept. How do you combine all of the streams, everything is a time series data for us, right, from the devices. Whether it's coming from a device or whether it's coming from another static source out there. 
How do you express a pattern, a recognition pattern definition, across these streams? That's where our CEP comes into the picture. A lot of these seemingly similar software capabilities that people talk about don't quite exactly have either the streaming capability, or the CEP capability, or the real-time, or the low footprint. What we have is a combination of all of that. >> And you talked about how everything's time series to you. Is there a need to have, sort of an equivalent time series database up in some central location? So that when you subset, when you determine what relevant subset of data to move up to the Cloud, or you know, on-prem central location, does it need to be the same database? >> No, it doesn't need to be the same database. It's optional. In fact, we do ship a local time series database at the edge itself. If you have a little bit of a local storage, you can downsample, take the results, and store it locally, and many customers actually do that. Some others, because they have their existing environment, they have some Cloud storage, whether it's Microsoft, it doesn't matter what they use, we have connectors from our software to send these results into their existing environments. >> So, you had also said something interesting about your, sort of, tool set, as being optimized for operations technology. So this is really important because back when we had the Net-Heads and the Bell-Heads, you know it was a cultural clash and they had different technologies. >> Sastry: They sure did, yeah. >> Tell us more about how selling to operations, not just selling, but supporting operations technology is different from IT technology and where does that boundary live? >> Right, so typical IT environment, right, you start with the boss who is the decision maker, you work with them and they approve the project and you go and execute that. In an industrial, in an OT environment, it doesn't quite work like that. 
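Editor's illustration: two ideas from the exchange above, expressing a recognition pattern over a time series stream and downsampling results before storing them locally, can be sketched in plain Python. This is not VEL and not FogHorn's engine; the function names, the `(timestamp, value)` reading format, and the "sustained high value" pattern are assumptions chosen for illustration. Both functions process one reading at a time and keep only a few variables of state, in the spirit of the low-footprint constraint discussed earlier.

```python
from typing import Iterable, Iterator, Optional, Tuple

Reading = Tuple[float, float]  # (timestamp in seconds, sensor value)


def sustained_high(readings: Iterable[Reading],
                   threshold: float,
                   window_s: float) -> Iterator[float]:
    """Yield the timestamp at which the value has stayed above
    `threshold` for at least `window_s` seconds. Fires once per run."""
    start: Optional[float] = None  # when the current run above threshold began
    fired = False
    for ts, value in readings:
        if value > threshold:
            if start is None:
                start, fired = ts, False
            if not fired and ts - start >= window_s:
                fired = True
                yield ts
        else:
            start = None  # run broken; wait for the next one


def downsample(readings: Iterable[Reading],
               bucket_s: float) -> Iterator[Reading]:
    """Yield (bucket_start, mean value) per fixed-width time bucket.
    Assumes timestamps arrive in non-decreasing order."""
    bucket: Optional[float] = None
    total, count = 0.0, 0
    for ts, value in readings:
        current = ts // bucket_s * bucket_s
        if bucket is not None and current != bucket:
            yield bucket, total / count
            total, count = 0.0, 0
        bucket = current
        total += value
        count += 1
    if count:
        yield bucket, total / count


# Usage with a small hypothetical temperature stream.
stream = [(0, 1.0), (1, 5.0), (2, 6.0), (3, 7.0), (4, 1.0), (5, 8.0)]
alerts = list(sustained_high(stream, threshold=4.0, window_s=2.0))
summary = list(downsample(stream, bucket_s=2))
```

A real CEP language would let an operator combine several such streams declaratively; the sketch only shows the single-stream case.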
Even if the boss says, "Go ahead and go do this project", if the operator on the floor doesn't understand what you're talking about, because that person is in charge of operating that machine, it doesn't quite work like that. So you need to work bottom up as well, to convince them that you are indeed actually solving their pain point. So the way we start, rather than trying to tell them what capabilities we have as a product, or what we're trying to do, the first thing we ask is what is their pain point? "What's your problem? What is the problem you're trying to solve?" Some customers say, "Well I've got yield, a lot of scrap. Help me reduce my scrap. Help me to operate my equipment better. Help me predict these failure conditions before it's too late." That's how the problem starts. Then we start asking them, "Okay, what kind of data do you have, what kind of sensors do you have? Typically, do you have information about under what circumstances you have seen failures versus not seeing failures out there?" So in the process of this inquiry we begin to understand how they might actually use our software and then we tell them, "Well, here, use your software, our software, to predict that." And, sorry, I want 30 more seconds on that. The other thing is that, typically in an IT environment, because I came from that too, I've been in this position for 30 plus years, IT, OT and all of that, where we don't right away talk about CEP, or expressions, or analytics, and we don't talk about that. We talk about, look, you have these bunch of sensors, we have OT tools here, drag and drop your sensors, express the outcome that you're trying to look for, what is the outcome you're trying to look for, and then we derive behind the scenes what it means. Is it analytics, is it machine learning, is it something else, and what is it? So that's kind of how we approach the problem. 
Of course, sometimes you do, surprisingly, occasionally run into very technical people. With those people we can right away talk about, "Hey, you need these analytics, you need to use machine learning, you need to use expressions" and all of that. That's kind of how we operate. >> One thing, you know, that's becoming clearer is I think this widespread recognition that there's data-intensive and low-latency work to be done near the edge. But what goes on in the Cloud is actually closer to simulation and high-performance compute, if you want to optimize a model. So not just train it, but maybe have something that's prescriptive that says, you know, here's the actionable information. As more of your data is video and audio, how do you turn that into something where you can simulate a model, that tells you the optimal answer? >> Right, so this is actually a good question. From our experience, there are models that require a lot of data, for example, video and audio. There are some other models that do not require a lot of data for training. I'll give you an example of the customer use cases that we have. There's one customer in a manufacturing domain, where they've been seeing a lot of finished goods failures, there's a lot of scrap and the problem then was, "Hey, predict the failures, reduce my scrap, save the money", right? Because they've been seeing a lot of failures every single day, we did not need a lot of data to train and create a model for that. So, in fact, we just needed one hour's worth of data. We created a model, put the thing, we have reduced, completely eliminated their scrap. There are other kinds of models, other kinds of models of video, where we can't do that in the edge, so we require, for example, some video files or simulated audio files, take it to an offline model, create the model, and see whether it's accurately predicting based on the real-time video coming in or not. So it's a mix of what we're seeing between those two. 
>> Well Sastry, thank you so much for stopping by theCUBE and sharing what it is that you guys at FogHorn are doing, what you're hearing from customers, how you're working together with them to solve some of these pretty significant challenges. >> Absolutely, it's been a pleasure. Hopefully this was helpful, and yeah. >> Definitely, very educational. We want to thank you for watching theCUBE, I'm Lisa Martin with George Gilbert. We are live at our event, Big Data SV in downtown San Jose. Come stop by Forager Tasting Room, hang out with us, learn as much as we are about all the layers of big data digital transformation and the opportunities. Stick around, we will be back after a short break. (upbeat electronic music)
Daniel Raskin, Kinetica | Big Data SV 2018
>> Narrator: Live, from San Jose, it's theCUBE. Presenting Big Data Silicon Valley. Brought to you by SiliconANGLE Media and its ecosystem partners. (mellow electronic music) >> Welcome back to theCUBE, on day two of our coverage of our event, Big Data SV. I'm Lisa Martin, my co-host is Peter Burris. We are down the street from the Strata Data Conference, we had a great day yesterday, and a great morning already, really learning and peeling back the layers of big data, challenges, opportunities, next generation, we're welcoming back to theCUBE an alumnus, the CMO of Kinetica, Dan Raskin. Hey Dan, welcome back to theCUBE. >> Thank you, thank you for having me. >> So, I'm a messaging girl, look at your website, the insight engine for the extreme data economy. Tell us about the extreme data economy, and what is that, what does it mean for your customers? >> Yeah, so it's a great question, and, from our perspective, we sit, we're here at Strata, and you see all the different vendors kind of talking about what's going on, and there's a little bit of word spaghetti out there that makes it really hard for customers to think about how big data is affecting them today, right? And so, what we're actually looking at is the idea of, the world's changed. That, big data from five years ago, doesn't necessarily address all the use cases today. If you think about what customers are going through, you have more users, devices, and things coming on, there's more data coming back than ever before, and it's not just about creating the data driven business, and building these massive data lakes that turn into data swamps, it's really about how do you create the data-powered business. 
So when we're using that term, we're really trying to call out that the world's changed, that, in order for businesses to compete in this new world, they have to think about how to take data and create core IP that differentiates, how do I use it to affect the omnichannel, how do I use it to deal with new things in the realm of banking and Fintech, how do I use it to protect myself against disruption in telco, and so, the extreme data economy is really this idea that you have business in motion, more things coming online than ever before, how do I create a data strategy, where data is infused in my business, and creates core IP that helps me maintain category leadership or grow. >> So as you think about that challenge, there's a number of technologies that come into play. Not least of which is the industry, while it's always to a degree been driven by what hardware can do, that's moderated a bit over time, but today, in many respects, a lot of what is possible is made possible, by what hardware can do, and what hardware's going to be able to do. We've been using similar AI algorithms for a long time. But we didn't have the power to use them! We had access to data, but we didn't have the power to acquire and bring it in. So how is the relationship between your software, and your platform, and some of the new hardware that's becoming available, starting to play out in a way of creating value for customers? >> Right, so, if you think about this in terms of this extreme data concept, and you think about it in terms of a couple of things, one, streaming data, just massive amounts of streaming data coming in. Billions of rows that people want to take and translate into value. >> And that data coming from-- >> It's coming from users, devices, things, interacting with all the different assets, more edge devices that are coming online, and the Wild West essentially. 
You look at the world of IoT and it's absolutely insane, with the number of protocols, and device data that's coming back to a company, and then you think about how do you actually translate this into real-time insight. Not near real-time, where it's taking seconds, but true millisecond response times where you can infuse this into your business, and one of our whole premises about Kinetica is the idea of this massive parallel compute. So the idea of not using CPUs anymore, to actually drive the powering behind your intelligence, but leveraging GPUs, and if you think about this, a CPU has 64 cores, 64 parallel things that you can do at a time, a GPU can have up to 6,000 cores, 6,000 parallel things, so it's kind of like lizard brain versus modern brain. How do you actually create this next generation brain that has all these neural networks, for processing the data, in a way that you couldn't. And then on top of that, you're using not just the technology of GPUs, you're trying to operationalize it. So how do you actually bring the data scientist, the BI folks, the business folks all together to actually create a unified operational process, and the underlying piece is the Kinetica engine and the GPU used to do this, but the power is really in the use cases of what you can do with it, and how you actually affect different industries. >> So can you elaborate a little bit more on the use cases, in this kind of game changing environment? >> Yeah, so there's a couple of common use cases that we're seeing, one that affects every enterprise is the idea of breaking down silos of business units, and creating the customer 360 view. How do I actually take all these disparate data feeds, bring them into an engine where I can visualize concepts about my customer and the environment that they're living in, and provide more insight? 
So if you think about things like Whole Foods and Amazon merging together, you now have this power of, how do I actually bridge the digital and physical world to create a better omnichannel experience for the user, how do I think about things in terms of what preferences they have, personalization, how to actually pair that with sensor data to affect how they actually navigate in a Whole Foods store more efficiently, and that's affecting every industry, you could take that to banking as well and think about the banking omnichannel, and ATMs, and the digital bank, and all these Fintech upstarts that are working to disrupt them. A great example for us is the United States Postal Service, where we're actually looking at all the data, the environmental data, around the US Postal Service, we're able to visualize it in real-time, we're able to affect the logistics of how they actually navigate through their routes, we're able to look at things like postal workers separating out of their zones, and potentially kicking off alerts around that, so effectively making the business more efficient. But, we've moved into this world where we always used to talk about brick and mortar going to cloud, we're now in this world where the true value is how you bridge the digital and physical world, and create more transformative experiences, and that's what we want to do with data. So it could be logistics, it could be omnichannel, it could be security, you name it. It affects every single industry that we're talking about. >> So I got two questions, what is Kinetica's contribution to that, and then, very importantly, as a CMO, how are you thinking about making sure that the value that people are creating, or can create with Kinetica, gets more broadly diffused into an ecosystem. 
>> Yeah, so the power that we're bringing is the idea of how to operationalize this in a way where again, you're using your data to create value, so, having a single engine where you're collecting all of this data, massive volumes of data, terabytes upon terabytes of data, enabling it where you can query the data, with millisecond response times, and visualize it, with millisecond response times, run machine learning algorithms against it to augment it, you still have that human ability to look at massive sets of data, and do ad hoc discovery, but can run machine learning algorithms against that and complement it with machine learning. And then the operational piece of bringing the data scientists into the same platform that the business is using, so you don't have data recency issues, is a really powerful mix. The other piece I would just add is the whole piece around data discovery, you can't really call it big data if, in order to analyze the data, you have to downsize and downsample to look at a subset of data. It's all about looking at the entire set. So that's where we really bring value. >> So, to summarize very quickly, you are providing a platform that can run very, very fast, in a parallel system, and memories in these parallel systems, so that large amounts of data can be acted upon. >> That's right. >> Now, so, the next question is, there's not going to be a billion people that are going to use your tool to do things, how are you going to work with an ecosystem and partners to get the value that you're able to create with this data, out into the enterprise. >> It's a great question, and probably the biggest challenge that I have, which is, how do you get above the word spaghetti, and just get into education around this. And so I think the key is getting into examples, of how it's affecting the industry. 
So don't talk about the technology, and streaming from Kafka into a GPU-powered engine, talk about the impact to the business in terms of what it brings in terms of the omnichannel. You look at something like Japan in the 2020 Olympics, and you think about that in terms of telco, and how are the mobile providers going to be able to take all the data of what people are doing, and to relate that to ad tech, to relate that to customer insight, to relate that to new business models of how they could sell the data, that's the world of education we have to focus on, is talk about the transformative value it brings from the customer perspective, the outside-in as opposed to the inside-out. >> On that educational perspective, as a CMO, I'm sure you meet with a lot of customers, do you find that you might be in this role of trying to help bridge the gaps between different roles in an organization, where there's data silos, and there's probably still some territorial culture going on? What are you finding in terms of Kinetica's ability to really help educate and maybe bring more stakeholders, not just to the table, but kind of build a foundation of collaboration? >> Yeah, it's a really interesting question because I think it means, not just for Kinetica, but all vendors in the space, have to get out of their comfort zone, and just stop talking speeds and feeds and scale, and in fact, when we were looking at how to tell our story, we did an analysis of where most companies were talking, and they were focusing a lot more on the technical aspirations that developers sell, which is important, you still need to court the developer, you have community products that they can download, and kick the tires with, but we need to extend our dialogue, get out of our customer comfort zone, and start talking more to CIOs, CTOs, CDOs, and that's just reaching out to different avenues of communication, different ways of engaging. 
And so, I think that's kind of a core piece that I'm taking away from Strata, is we do a wonderful job of speaking to developers, we all need to get out of our comfort zone and talk to a broader set of folks, so business folks. >> Right, 'cause that opens up so many new potential products, new revenue streams, on the marketing side being able to really target your customer base audience, with relevant, timely offers, to be able to be more connected. >> Yeah, the worst scenario is talking to an enterprise around the wonders of a technology that they're super excited about, but they don't know the use case that they're trying to solve, start with the use case they're trying to solve, start with thinking about how this could affect their position in the market, and work on that, in partnership. We have to do that in collaboration with the customers. We can't just do that alone, it's about building a partnership and learning together around how you use data in a different way. >> So as you imagine, the investments that Kinetica is going to make over the next few years, with partners, with customers, what do you hope Kinetica will be in 2020? >> So, we want it to be that transformative engine for enterprises, we think we are delivering something that's quite unique in the world, and, you want to see this on a global basis, affecting our customers' value. I almost want to take us out of the story, and if I'm successful, you're going to hear wonderful enterprise companies across telco, banking, and other areas just telling their story, and we happen to be the engine behind it. >> So you're an ingredient in their success. >> Yes, a core ingredient in their success. >> So if we think about over the course of the next technology, set of technology waves, are there any particular applications that you think you're going to be stronger in? 
So I'll give you an example, do you envision that Kinetica can have a major play in how automation happens inside infrastructure, or how developers start seeing patterns in data, imagine how those assets get created. Where are some of the kind of practical, but not really, or rarely talked about applications that you might find yourselves becoming more of an ingredient because they themselves become ingredients to some of these other big use cases? >> There are a lot of commonalities that we're starting to see, and the interesting piece is the architecture that you implement tends to be the same, but the context of how you talk about it, and the impact it has tends to be different, so, I already mentioned the customer 360 view? First and foremost, break down silos across your organization, figure out how do you get your data into one place where you can run queries against it, you can visualize it, you can do machine learning analysis, that's a foundational element, and, I have a company in Asia called Lippo that is doing that in their space, where all of the sudden they're starting to glean things they didn't know about their customer before to create, doing that ad hoc discovery, so that's one area. The other piece is this use case of how do you actually operationalize data scientists, and machine learning, into your core business? So, that's another area that we focus on. There are simple entry points, things like Tableau Acceleration, where you put us underneath the existing BI infrastructure, and all of the sudden, you're a hundred times faster, and now your business folks can sit at the table, and make real-time business decisions, where in the past, if they clicked on certain things, they'd have to wait to get those results. Geospatial visualization's a no-brainer, the idea of taking environmental data, pairing it with your customer data, for example, and now learning about interactions. 
And I'd say the other piece is more innovation driven, where we would love to sit down with different innovation groups in different verticals and talk with them about, how are you looking to monetize your data in the future, what are the new business models, how do things like voice interaction affect your data strategy, what are the different ways you want to engage with your data, so there's a lot of different realms we can go to. >> One of the things you said as we wrap up here, that I couldn't agree with more, is, the best value articulation I think a brand can have, period, is through the voice of their customer. And being able to be, and I think that's one of the things that Paul said yesterday is, defining Kinetica's success based on the success of your customers across industry, and I think it really doesn't get more objective than a customer who has, not just from a developer perspective, maybe improved productivity, or workforce productivity, but actually moved the business forward, to a point where you're maybe bridging the gaps between the digital and physical, and actually enabling that business to be more profitable, open up new revenue streams because this foundation of collaboration has been established. >> I think that's a great way to think about it-- >> Which is good, 'cause he's your CEO. >> (laughs) Yes, that sustains my job. But the other piece is, I almost get embarrassed talking about Kinetica, I don't want to be the car salesman, or the vacuum salesman, that sprinkles dirt on the floor and then vacuums it up, I'd rather us kind of fade to the behind-the-scenes power where our customers are out there telling wonderful stories that have an impact on how people live in this world. To me, that's the best marketing you can do, is real stories, real value. >> Couldn't agree more. 
Well Dan, thanks so much for stopping by, sharing the things that Kinetica is doing, some of the things you're hearing, and how you're working to really build this foundation of collaboration and enablement within your customers across industries. We look forward to hearing the kind of cool stuff that happens with Kinetica, throughout the rest of the year, and again, thanks for stopping by and sharing your insights. >> Thank you for having me. >> I want to thank you for watching theCUBE, I'm Lisa Martin with my co-host Peter Burris, we are at Big Data SV, our second day of coverage, at a cool place called the Forager Tasting Room, in downtown San Jose, stop by, check us out, and have a chance to talk with some of our amazing analysts on all things big data. Stick around though, we'll be right back with our next guest after a short break. (mellow electronic music)
ENTITIES
Entity | Category | Confidence |
---|---|---|
Peter Burris | PERSON | 0.99+ |
Lisa Martin | PERSON | 0.99+ |
Paul | PERSON | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
Dan Raskin | PERSON | 0.99+ |
Whole Foods | ORGANIZATION | 0.99+ |
Daniel Raskin | PERSON | 0.99+ |
64 cores | QUANTITY | 0.99+ |
Asia | LOCATION | 0.99+ |
Dan | PERSON | 0.99+ |
2020 | DATE | 0.99+ |
San Jose | LOCATION | 0.99+ |
two questions | QUANTITY | 0.99+ |
Kinetica | ORGANIZATION | 0.99+ |
Lippo | ORGANIZATION | 0.99+ |
SiliconANGLE Media | ORGANIZATION | 0.99+ |
second day | QUANTITY | 0.99+ |
yesterday | DATE | 0.99+ |
6,000 parallel | QUANTITY | 0.99+ |
64 parallel | QUANTITY | 0.99+ |
2020 Olympics | EVENT | 0.99+ |
Strata Data Conference | EVENT | 0.99+ |
telco | ORGANIZATION | 0.98+ |
theCUBE | ORGANIZATION | 0.98+ |
one | QUANTITY | 0.98+ |
single engine | QUANTITY | 0.97+ |
First | QUANTITY | 0.97+ |
Wild West | LOCATION | 0.97+ |
today | DATE | 0.97+ |
five years ago | DATE | 0.96+ |
Big Data SV | ORGANIZATION | 0.96+ |
one area | QUANTITY | 0.95+ |
Strata | ORGANIZATION | 0.95+ |
United States Postal Service | ORGANIZATION | 0.94+ |
day two | QUANTITY | 0.93+ |
Narrator: Live | TITLE | 0.93+ |
One | QUANTITY | 0.93+ |
one place | QUANTITY | 0.9+ |
Fintech | ORGANIZATION | 0.88+ |
up to 6,000 cores | QUANTITY | 0.88+ |
years | DATE | 0.88+ |
US Postal Service | ORGANIZATION | 0.88+ |
Billions of rows | QUANTITY | 0.87+ |
terabytes | QUANTITY | 0.85+ |
Japan | LOCATION | 0.82+ |
hundred times | QUANTITY | 0.82+ |
terabytes of data | QUANTITY | 0.81+ |
Strata | TITLE | 0.8+ |
Tableau Acceleration | TITLE | 0.78+ |
single industry | QUANTITY | 0.78+ |
CoreIP | TITLE | 0.76+ |
360 view | QUANTITY | 0.75+ |
Silicon Valley | LOCATION | 0.73+ |
billion people | QUANTITY | 0.73+ |
2018 | DATE | 0.73+ |
Data SV | EVENT | 0.72+ |
Kinetica | COMMERCIAL_ITEM | 0.72+ |
Forager Tasting Room | ORGANIZATION | 0.68+ |
Big | EVENT | 0.67+ |
millisecond | QUANTITY | 0.66+ |
Kafka | PERSON | 0.6+ |
Big Data | ORGANIZATION | 0.59+ |
Data SV | ORGANIZATION | 0.58+ |
big data | ORGANIZATION | 0.56+ |
next | DATE | 0.55+ |
lot | QUANTITY | 0.54+ |
Big | ORGANIZATION | 0.47+ |
Maribel Lopez, Lopez Research | Big Data SV 2018
>> Narrator: Live, from San Jose. It's theCUBE. Presenting Big Data, Silicon Valley. Brought to you by SiliconAngle Media, and its ecosystem partners. >> Welcome back to theCUBE, we are live in San Jose, at our event, Big Data SV. I'm Lisa Martin. And we are down the street from the Strata Data Conference. We've had a great day so far, talking with a lot of folks from different companies that are all involved in the big data unraveling process. I'm excited to welcome back to theCUBE one of our distinguished alumni, Maribel Lopez, the founder and principal analyst at Lopez Research. Welcome back to theCUBE. >> Thank you. I'm excited to be here. >> Yeah, so you've been, the Strata conference started a couple days ago. What are some of the trends and things that you're hearing that are really kind of top of mind for not just the customers that are attending, the companies that are creating or are trying to create solutions around this big data challenge and opportunity? >> Yeah absolutely, I mean I think we talked a lot about data in the years past. How do you gather the data? How do you store the data? How you might want to process the data? This year seems to be all about how do I make something interesting happen with the data? How do I make an intelligent insight? How do I cure prostate cancer? How do I make sure I can classify images? It's a really different show, and we've also changed some of the terminology, a lot more in machine learning now, and artificial intelligence, and frankly a lot of discussion around ethics. So it's been very interesting. >> Data ethics you mean? >> Data ethics; how do we do privacy? How do we maintain the right level of data so that we don't have bias in our data? How do we get diversity and inclusion going? Lots of really interesting powerful human topics, not just about the data. >> I love that the human topics especially where you know AI and ML come into play. You talked, data diversity. 
Or bias. We were just at that Women in Data Science conference a couple of days ago talking to a lot of female leaders in data science, computer science, both in academia as well as in industry. And one of the interesting topics about the gender disparity, is the fact that that is limiting the analyses on data in terms of, there may be a few perspectives looking on it. So there's an inherent bias there. So that's one issue, and I'd like to get your thoughts on that. Another is with that thought, lack of thought diversity, I guess I would say going into analyzing the data, companies might be potentially limiting themselves on the types of products that they can create, how to monetize the data and actually drive new revenue streams. On the kind of thought diversity, we'll start there. What are some of the things that you're hearing, and what are some of your recommendations for your clients on how to get some of that bias out of data analysis? >> Yes it's interesting. One is trying to find multiple sources of data. So there's data that you have and that you own. But there is a wide range of openly available data now. There's some challenges around making sure that that data is clean before you integrate it with your data. But basically, diversifying your data sources with third party data is one big thing that we're talking about. In previous analytical generations, I think we talked a lot about how to have a hypothesis, and you were trying to prove a hypothesis. And now I think we're trying to be a little more open and looser, and not really lead the data where per se, but try to find the right patterns and correlations in the data. And then just awareness in general. Like we don't believe we're biased. But if we have data that's biased, it gets put into the system. So we have to really be thoughtful about what we put into the system. So I think that those three things combined have really changed the way people are looking at it. 
And there's a lot of awareness now around that. Because we assume at some point, the machines might be making certain decisions for us. And we want to make sure that they have the best information to do that. And that they don't limit our opportunities as a society. >> Where are companies in terms of the clients that you see, culturally in terms of embracing the openness? 'Cause you're right! From a scientific method perspective. People go into, I'm going to hypothesize this because I think I'm going to find this. And maybe wanting the data to say this. Where are companies, we'll say enterprises, in becoming culturally more open to not leading the data somewhere and bringing up bias? >> Well, there are two interesting things here, right? I think there are some people that have gone down the data route for a while now, sort of the industry leading companies. They're in this mindset now trying to make sure they don't lead the data, they don't create biases in the data. They have ways to explain how the data and the analysis of the learning came about, not just for regulation, but so that they can make sure they've ethically done the right thing. But then I think there's the other 95 percent of companies that they're not even there yet. They don't know that this is a problem yet. So they're still dealing with the "I've got to pull in the data." "I've got to do something with it." They don't even know what they want to do with it let alone if it's biased or not. So we're not quite at the leading the witness point there with a lot of organizations. >> But that's something that you expect to see maybe down the road. >> I'm hoping we'll get ahead of it. I'm really hoping that we'll get ahead of it. >> It's a good positive outlook on it, yeah? >> I think that, I think because the real analysis of the data problem in a big machine learning, deep learning way is so new, and the people are actually out seeking guidance, that there is an opportunity to get ahead of it. 
The second thing that's happening is, people don't have data scientists, right? So they don't necessarily have the people that can code this. So what they're doing now, is they're depending on the vendor landscape to provide them with an entry level set of tools. So if you're Microsoft, if you're Google, if you're Amazon, you're trying very hard to make sure that you're giving tools that have the right ethics in them, and that can help kickstart people's Machine Learning efforts. So I think that's going to be a real win for us. And we talked a lot today at the Strata conference about how, oh you don't have enough images, you can't do that. Or you don't have enough data, you can't do that. Or you don't have enough data scientists. And some of what came back is that, some of the best and the brightest have coded some things that you can start to use to kickstart that will get you to a better place than you ever could have started with yourself. So that was pretty exciting, you know. Transfer learning as an example of taking you know, image node from Google and some algorithms, and using those to take your images and try to figure out if somebody has Alzheimer's or not. Encode things Alzheimer's or not characteristic. So, very cool stuff, very exciting and nice to see that we've got some minds working on this for us. >> Yeah, definitely. When you're meeting with clients that don't have a data scientist, or chief analytics officer? Sounds like a lot of the technologies need to or some have built in sort of enablement for a different data citizen within a company. If you're talking to clients that don't have a data scientist or data science team, who are your constituents there? Where are companies that don't maybe have that skill gap? Who do they go to in their organization to start evaluating the data that they have to get to know what and start to understand what their potential is? >> Yeah, there's a couple of places people go. 
They go to their business decision analytics people. So the people that were working with their BI dashboards, for example. The second place they go is to the cloud computing guys, cuz we're hearing a lot about cloud computing and maybe I can buy some of the stuff from the cloud. I'm just going to roll up and get all my machine learning in the cloud, right? So we're not there yet. So the biggest thing that I talk to people about right now is, what are the realities around Machine Learning and AI? We've made tremendous progress but you know you read the newspaper, and something is going to get rid of your job, and AI's going to take over the world, and we're kind of far from that reality. First of all it's very dystopian and negative. But even if it weren't that, you know what you can do today, is not that. So there's a lot of stages in between. So the first thing is just trying to get people comfortable with. No you can't just buy one product, and throw in some data, and you've got everything you need. >> Right. >> We're not there yet. But we're getting closer. You can add some components, you can get some new information, you could do some new correlations. So just getting a reality and grounding of where we are, and that we have a lot of opportunity, and that it's moving very fast. that's the other thing. >> Right. >> IT leaders are used to all evaluated once a year, evaluated once every couple of years. These things are moving in monthly increments. Like really huge changes in product categories. So you kind of have to keep on top of it to make sure you know what's available to you. >> Right. And if they don't they miss out on not only the ability to monetize data streams, but essentially going out of business. Because somebody will come in may be more nimble and agile, and be able to do it faster. >> Yeah. And we already saw those with the digital native companies that started born in the cloud companies, we used to call them. 
Well, now, everybody can be using the cloud. So the question then is like what's the next wave of that? The next wave of that is around understanding how to use your data, understanding how to get third-party data, and being able to rapidly make decisions and change models based on that. >> One of the things that's interesting about big data is you know it was a big buzzword, and it seems to be becoming less of a buzzword now. Gartner even was saying I think the number was 85 percent of big data projects and I think that's more in tested environments fail. And I often say, "Failure in a lot of cases is not a bad effort." Because it spawns genesis of new products, new ideas, et cetera. But when you're talking with clients who go, alright, we've embraced Hadoop, we've got this big data lake, now it's turning really swampy. We don't know-- >> We've got lakes, we've got oceans, we've got ponds. Yeah. >> Right. What's the conversation there where you're helping a customer clean that swamp up, get broader visibility across their datasets and enable different lines of business. Not just you know, the BI folks or the cloud folks or IT. But marketing, logistics, sales. What's that conversation like to clean up the swamp and do more enablement for visibility? >> I think one of the things that we got really hung up on was, you know, creating a data ocean, right? We're going to bring everything all in one place, it's going to be this one massive data source. >> It sounded great. >> It's going to be awesome. And this is not the reality of the world, right? So I think the first thing in the cleaning up that we have to do, is being able to figure out what's the source of truth for any given dataset that somebody needs. So you see 15 salespeople walk in and they all have different versions of the data that shouldn't happen. >> Right. >> So we need to get to the point where they know where the source of truth is for that data. The second is sort of governance around the data. 
We spent a lot of time dumping the data but not a lot of time in terms of getting governance around who can access it, what they can do with it, for how long they could have access to it. Is it just internal? Is it internal and external? So I think that's the second thing around like harassing and haranguing the swamps, and the lakes and the ponds, right? And then assuming that you do that, I think the other thing is, You know, if you have a hammer everything looks like a nail. Well, in reality you know when you construct things you have nails, you have screws, you have bolts, right? And picking the right tool for the job is something that the IT leadership has to work with. And the only way that they get that right is to work very closely with the different lines of business so they can understand the problem. Because the business leader knows the problem, they don't know the solution. If you put them together which we've talked about forever, frankly. But now I think we're seeing more imperatives for those two to work closely together. And sometimes it's even driven by security, just to make sure that the data isn't leaking into other places or that it's secure and that they've met regulatory compliance. So we're in a much better space than we were two, three, five years ago cuz we're thinking about the real problems now. Not just how do you collect it, and how do you store it. But how do we actually make it an actionable manageable set of solutions. >> Exactly, and make it work for the business. Well Maribel, I wish we had more time, but thank you so much for stopping by theCUBE, sharing the insights that you've seen. Not just at a conference, but also with your clients. >> Thank you. >> We want to thank you for watching theCUBE. Again, I'm Lisa Martin, live from Big Data SV, in Downtown San Jose. Get involved in the conversation #BigDataSV. Come see us at the Forager Eatery & Tasting Room, and I'll be right back with our next guest. (upbeat music)
ENTITIES
Entity | Category | Confidence |
---|---|---|
Lisa Martin | PERSON | 0.99+ |
Maribel | PERSON | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
Maribel Lopez | PERSON | 0.99+ |
San Jose | LOCATION | 0.99+ |
Google | ORGANIZATION | 0.99+ |
Microsoft | ORGANIZATION | 0.99+ |
15 salespeople | QUANTITY | 0.99+ |
SiliconAngle Media | ORGANIZATION | 0.99+ |
85 percent | QUANTITY | 0.99+ |
95 percent | QUANTITY | 0.99+ |
Gartner | ORGANIZATION | 0.99+ |
one issue | QUANTITY | 0.99+ |
two | QUANTITY | 0.99+ |
today | DATE | 0.99+ |
one | QUANTITY | 0.99+ |
Silicon Valley | LOCATION | 0.99+ |
both | QUANTITY | 0.98+ |
Strata Data Conference | EVENT | 0.98+ |
Big Data SV | ORGANIZATION | 0.98+ |
second thing | QUANTITY | 0.98+ |
one product | QUANTITY | 0.98+ |
first thing | QUANTITY | 0.98+ |
three things | QUANTITY | 0.97+ |
once a year | QUANTITY | 0.97+ |
second | QUANTITY | 0.96+ |
This year | DATE | 0.96+ |
One | QUANTITY | 0.96+ |
First | QUANTITY | 0.96+ |
theCUBE | ORGANIZATION | 0.96+ |
Downtown San Jose | LOCATION | 0.96+ |
Strata | EVENT | 0.94+ |
two interesting things | QUANTITY | 0.94+ |
five years ago | DATE | 0.94+ |
Big Data | ORGANIZATION | 0.9+ |
couple days ago | DATE | 0.87+ |
couple of days ago | DATE | 0.85+ |
once | QUANTITY | 0.78+ |
#BigDataSV | ORGANIZATION | 0.75+ |
one place | QUANTITY | 0.75+ |
second place | QUANTITY | 0.75+ |
every couple of years | QUANTITY | 0.75+ |
Forager | LOCATION | 0.7+ |
Data | ORGANIZATION | 0.69+ |
Narrator: Live | TITLE | 0.69+ |
wave | EVENT | 0.68+ |
years past | DATE | 0.66+ |
three | QUANTITY | 0.66+ |
Alzheimer | OTHER | 0.66+ |
Big | EVENT | 0.65+ |
Hadoop | TITLE | 0.64+ |
Big Data SV | EVENT | 0.59+ |
Eatery & Tasting Room | ORGANIZATION | 0.57+ |
Lopez Research | ORGANIZATION | 0.55+ |
SV 2018 | EVENT | 0.54+ |
thing | QUANTITY | 0.53+ |
Lopez | ORGANIZATION | 0.49+ |
Kunal Agarwal, Unravel Data | Big Data SV 2018
>> Announcer: Live from San Jose, it's theCube! Presenting Big Data: Silicon Valley Brought to you by SiliconANGLE Media and its ecosystem partners. (techno music) >> Welcome back to theCube. We are live on our first day of coverage at our event BigDataSV. I am Lisa Martin with my co-host George Gilbert. We are at this really cool venue in downtown San Jose. We invite you to come by today, tonight for our cocktail party. It's called Forager Tasting Room and Eatery. Tasty stuff, really, really good. We are down the street from the Strata Data Conference, and we're excited to welcome to theCube a first-time guest, Kunal Agarwal, the CEO of Unravel Data. Kunal, welcome to theCube. >> Thank you so much for having me. >> So, I'm a marketing girl. I love the name Unravel Data. (Kunal laughs) >> Thank you. >> Two year old company. Tell us a bit about what you guys do and why that name... What's the implication there with respect to big data? >> Yeah, we are a application performance management company. And big data applications are just very complex. And the name Unravel is all about unraveling the mysteries of big data and understanding why things are not performing well and not really needing a PhD to do so. We're simplifying application performance management for the big data stack. >> Lisa: Excellent. >> So, so, um, you know, one of the things that a lot of people are talking about with Hadoop, originally it was this cauldron of innovation. Because we had the "let a thousand flowers bloom" in terms of all the Apache projects. But then once we tried to get it into operation, we discovered there's a... >> Kunal: There's a lot of problems. (Kunal laughs) >> There's an overhead, there's a downside to it. >> Maybe tell us, tell us why you both need to know, you need to know how people have done this many, many times. >> Yeah. >> How you need to learn from experience and then how you can apply that even in an environment where someone hasn't been doing it for that long. 
>> Right. So, if I back up a little bit. Big data is powerful, right? It's giving companies an advantage that they never had, and data's an asset to all of these different companies. Now they're running everything from BI, machine learning, artificial intelligence, IOT, streaming applications on top of it for various reasons. Maybe it is to create a new product to understand the customers better, etc., But as you rightly pointed out, when you start to implement all of these different applications and jobs, it's very, very hard. It's because big data is very complex. With that great power comes a lot of complexity, and what we started to see is a lot of companies, while they want to create these applications and provide that differentiation to their company, they just don't have enough expertise as well in house to go and write good applications, maintain these applications, and even manage the underlying infrastructure and cluster that all these applications are running on. So we took it upon ourselves where we thought, Hey, if we simplify application performance management and if we simplify ongoing management challenges, then these companies would run more big data applications, they would be able to expand their use cases, and not really be fearful of, Hey, we don't know how to go and solve these problems. Do we actually rely on our system that is so complex and new? And that's the gap that Unravel fills, which is we monitor and manage not only one component of the big data ecosystem, but like you pointed out, it's a, it's a full zoo of all of these systems. You have Hadoop, and you have Spark, and you have Kafka for data ingestion. You may have some NoSQL systems and newer MPP platforms as well. So the vision of Unravel is really to be that one place where you can come in and understand what's happening with your applications and your system overall and be able to resolve those problems in an automatic, simple way. 
>> So, all right, let's start at the concrete level of what a developer might get out of >> Kunal: Right. >> something that's wrapped in Unravel and then tell us what the administrator experiences. >> Kunal: Absolutely. So if you are a big data developer you've got a business requirement that, Hey, go and make this application that understands our customers better, right? They may choose a tool of their liking, maybe Hive, maybe Spark, maybe Kafka for data ingestion. And what they'll do is they'll write an app first in dev, in their dev environment or the QA environment. And they'll say, Hey, maybe this application is failing, or maybe this application is not performing as fast as I want it to, or even worse that this application is starting to hog a lot of resources, which may slow down my other applications. Now to understand what's causing these kind of problems today developers really need a PhD to go and decipher them. They have to look at tons of raw logs, metrics, configuration settings and then try to stitch the story up in their head, trying to figure out what is the effect, what is the cause? Maybe it's this problem, maybe it's some other problem. And then do trial and error to try, you know, to solve that particular issue. Now what we've seen is big data developers come in a variety of flavors. You have the hardcore developers who truly understand Spark and Hadoop and everything, but then 80% of the people submitting these applications are data scientists or business analysts, who may understand SQL, who may know Python, but don't necessarily know what distributed computing and parallel processing and all of these things really are, and where can inefficiencies and problems really lie. 
So we give them this one view, which will connect all of these different data sources and then tell them in plain English, this is the problem, this is why this problem happened, and this is how you can go and resolve it, thereby getting them unstuck and making it very simple for them to go in and get the performance that they're getting. >> So, these, these, um, they're the developers up front and you're giving them a whole new, sort of, toolchain or environment to solve the operational issues. >> Kunal: Right. >> So that the, if it's DevOps, it's really dev is much more self-sufficient. >> Yes, yes, I mean, all companies want to run fast. They don't want to be slowed down. If you have a problem today, they'll file a ticket, it'll go to the operations team, you wait a couple of days to get some more information back. That just means your business has slowed down. If things are simple enough where the application developers themselves can resolve a lot of these issues, that'll get the business unstuck and get them moving on further. Now, to the other point which you were asking, which is what about the operations and the app support people? So, Unravel's a great tool for them too because that helps them see what's happening holistically in the cluster. How are other applications behaving with each other? It's usually a multitenant, multiapplication environment that these big data jobs are running on. So, are my apps slowing down George's apps? Am I stealing resources from your applications? More so, not just about an individual application issue itself. So Unravel will give you visibility into each app, as well as the overall cluster to help you understand cluster-wide problems. >> Love to get at, maybe peel apart your target audience a little bit. You talked about DevOps. But also the business analysts, data scientists, and we talk about big data. 
Data is, has such tremendous power to fuel a company and, you know, like you said use it to deliver and, create and deliver new products. Are you talking with multiple audiences within a company? Do you start at DevOps and they bring in their peers? Or do you actually start, maybe, at the Chief Data Officer level? What's that kind of entrance for Unravel? >> So the word I use to describe this is DataOps, instead of DevOps, right? So in the older world you had developers, and you had operations people. Over here you have a data team and operations people, and that data team can comprise the developers, the data scientists, the business analysts, etc., as well. But you're right. Although we first target the operations role because they have to manage and monitor the system and make sure everything is running like a well-oiled machine, they are now spreading it out to the end-users, meaning the developers themselves, saying, "Don't come to me for every problem. Look at Unravel, try to solve it here, and if you cannot, then come to me." This is all, again, improving agility within the company, making sure that people have the necessary tools and insights to carry on with their day. >> Sounds like an enabler, >> Yeah, absolutely. >> That operations would push down to the DevOps, the developers themselves. >> And even the managers and the CDOs, for example, they want to see their ROI that they're getting from their big data investments. They want to see, they have put in these millions of dollars, have got an infrastructure and these services set up, but how are we actually moving the needle forward? Are there any applications that we're actually putting in business, and is that driving any business value? So we will be able to give them a very nice dashboard helping them understand what kind of throughput are you getting from your system, how many applications were you able to develop last week and onboard to your production environment?
And what's the rate of innovation that's really happening inside your company on those big data ecosystems? >> It sort of brings up an interesting question on two prongs. One is the well-known, but inexact number about how many big data projects, >> Kunal: Yeah, yeah. >> I don't know whether they fail or didn't pay off. So there's going in and saying, "Hey, we can help you manage this because it was too complicated." But then there's also the, all the folks who decided, "Well, we really don't want to run it all on-prem. We're not going to throw away everything we did there, but we're going to also put a lot of new investment >> Kunal: Exactly, exactly. >> in the cloud." Now, Wikibon has a term for that, which is true private cloud, which is when you have the operational processes that you use in the public cloud and you can apply them on-prem. >> Right. >> George: But there's not many products that help you do that. How can Unravel work...? >> Kunal: That's a very good question, George. We're seeing the world move more and more to a cloud environment, or I should say an on-demand environment where you're not so bothered about the infrastructure and the services, but you want Spark as a dial tone. You want Kafka as a dial tone. You want a machine-learning platform as a dial tone. You want to come in there, you want to put in your data, and you want to just start running it. Unravel has been designed from the ground up to monitor and manage any of these environments. So, Unravel can solve problems for your applications running on-premise and similarly all the applications that are running on cloud. Now, on the cloud there are other levels of problems as well so, of course, you'd have applications that are slow, applications that are failing; we can solve those problems.
But if you look at a cloud environment, a lot of these now provide you an autoscaling capability, meaning, Hey, if this app doesn't run in the amount of time that we were hoping it to run, let's add extra hardware and run this application. Well, if you just keep throwing machines at the problem, it's not going to solve your issue. Now, it doesn't decrease the time that it will take linearly with how many servers that you're actually throwing in there, so what we can help companies understand is what is the resource requirement of a particular application? How should we be intelligently allocating resources to make sure that you're able to meet your time SLAs, your constraints of, here I need to finish this with x number of minutes, but at the same time be intelligent about how much cost you're spending over there. Do you actually need 500 containers to go and run this app? Well, you may have needed 200. How do you know that? So, Unravel will also help you get efficient with your run, not just faster, but also can it be a good multitenant citizen, can it use limited resources to actually run these applications as well? >> So, Kunal, some of the things I'm hearing from a customer's standpoint that are potential positive business outcomes are internal: performance boost. >> Kunal: Yeah. >> It also sounds like, sort of... productivity improvements internally. >> And then also the opportunity to have the insight to deliver new products, but even I'm thinking of, you know, helping make a retailer, for example, be able to do more targeted marketing, so the business outcomes and the impact that Unravel can make really seem to have pretty strong internal and external benefits. >> Kunal: Yes. >> Is there a favorite customer story, (Kunal laughs) don't have to mention names, that you really think speaks to your capabilities? >> So, 100%. Improving performance is a very big factor of what Unravel can do.
Decreasing costs by improving productivity, by limiting the amount of resources that you're using, is a very, very big factor. Now, amongst all of these companies that we work with, one key factor is improving reliability, which means, Hey, it's fine that you can speed up this application, but sometimes I know the latency that I expect from an app, maybe it's a second, maybe it's a minute, depending on the type of application. But what businesses cannot tolerate is this app taking five x amount more time today. If it's going to finish in a minute, tell me it'll finish in a minute and make sure it finishes in a minute. And this is a big use case for all of the big data vendors because a lot of the customers are moving from Teradata, or from Vertica, or from other relational databases, on to Hortonworks or Cloudera or Amazon EMR. Why? Because it's one tenth the amount of cost for running these workloads. But, all the customers get frustrated and say, "I don't mind paying 10 x more money, because over there it used to work. Over here, there are just so many complications, and I don't have reliability with these applications." So that's a big, big factor of, you know, how we actually help these customers get value out of the Unravel product. >> Okay, so, um... A question I'm, sort of... why aren't there so many other Unravels? >> Kunal: Yeah. (Kunal laughs) >> From what I understood from past conversations. >> Kunal: Yeah. >> You can only really build the models that are at the heart of your capabilities based on tons and tons of telemetry >> Kunal: Yeah. >> that cloud providers or, or, sort of, internet scale service providers have accumulated in that, because they all have sort of a well-known set of configurations and well-known kind of topology. In other words, there're not a million degrees of freedom on any particular side that you can, you have a well-scoped problem, and you have tons of data. So it's easier to build the models.
So who, who else could do this? >> Yeah, so the difference between Unravel and other monitoring products is Unravel is not a monitoring product. It's an intelligent performance management suite. What that means is we don't just give you graphs and metrics and say, "Here is all the raw information, you go figure it out." Instead, we have to take it a step further where we are actually giving people answers. In order to develop something like that, you need full stack information; that's number one. Meaning information from applications all the way down to infrastructure and everything in between. Why? Because problems can lie anywhere. And if you don't have that full stack info, you're blind-siding yourself, or limiting the scope of the problems that you can actually search for. Secondly is, like you were rightly pointing out, how do I create answers from all this raw data? So you have to think like how an expert with big data would think, which is if there is a problem what are the kinds of checks, balances, places that that person would look into, and how would that person establish that this is indeed the root cause of the problem today? And then, how would that person actually resolve this particular problem? So, we have a big team of scientists, researchers. In fact, my co-founder is a professor of computer science at Duke University who has been researching database optimization techniques for the last decade. We have about 80 plus publications in this area, Starfish being one of them. We have a bunch of other publications, which talk about how do you automate problem discovery, root cause analysis, as well as resolution, to get best performance out of these different databases? And you're right. A lot of work has gone on the research side, but a lot of work has gone in understanding the needs of the customers.
So we worked with some of the biggest companies out there, which have some of the biggest big data clusters, to learn from them, what are some everyday, ongoing management challenges that you face, and then taking that problem to our datasets and figuring out, how can we automate problem discovery? How can we proactively spot a lot of these errors? I joke around and I tell people that we're big data for big data. Right? All these companies that we serve, they are gathering all of this data, and they're trying to find patterns, and they're trying to find, you know, some sort of an insight with their data. Our data is system generated data, performance data, application data, and we're doing the exact same thing, which is figuring out inefficiencies, problems, cause and effect of things, to be able to solve it in a more intelligent, smart way. >> Well, Kunal, thank you so much for stopping by theCUBE >> Kunal: Of course. >> And sharing how Unravel Data is helping to unravel the complexities of big data. (Kunal laughs) >> Thank you so much. Really appreciate it. >> Now you're a CUBE alumni. (Kunal laughs) >> Absolutely. Thanks so much for having me. >> Kunal, thanks. >> Yeah, and we want to thank you for watching theCUBE. I'm Lisa Martin with George Gilbert. We are live at our own event BigData SV in downtown San Jose, California. Stick around. George and I will be right back with our next guest. (quiet crowd noise) (techno music)
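Kunal's remark in the interview above, that adding servers does not shrink runtime linearly and that an app given 500 containers "may have needed 200", can be illustrated with a small back-of-the-envelope sketch. This is not how Unravel computes anything; it assumes a simple Amdahl's-law-style model in which a job has a fixed serial portion plus a perfectly divisible parallel portion. The function names and the numbers (10 serial minutes, 4000 parallel container-minutes, a 30-minute SLA) are hypothetical and chosen only for illustration.

```python
import math
from typing import Optional


def estimated_runtime(containers: int, serial_min: float, parallel_min: float) -> float:
    """Amdahl-style estimate (an assumed model, not Unravel's):
    the serial part never shrinks, the parallel part is split across containers."""
    if containers < 1:
        raise ValueError("containers must be >= 1")
    return serial_min + parallel_min / containers


def min_containers_for_sla(sla_min: float, serial_min: float, parallel_min: float) -> Optional[int]:
    """Smallest container count whose estimated runtime meets the SLA.
    Returns None when the serial part alone already uses up the SLA."""
    if sla_min <= serial_min:
        return None
    return max(1, math.ceil(parallel_min / (sla_min - serial_min)))


# Hypothetical job: 10 serial minutes plus 4000 container-minutes of parallel work.
needed = min_containers_for_sla(sla_min=30, serial_min=10, parallel_min=4000)
print(needed, estimated_runtime(200, 10, 4000), estimated_runtime(400, 10, 4000))
```

Under these assumed numbers, doubling from 200 to 400 containers takes the estimate from 30 minutes to 20 rather than 15, which is the diminishing return described in the interview; anything beyond the SLA-meeting count is extra cost.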
ENTITIES
Entity | Category | Confidence |
---|---|---|
George Gilbert | PERSON | 0.99+ |
Lisa Martin | PERSON | 0.99+ |
Kunal Agarwal | PERSON | 0.99+ |
George | PERSON | 0.99+ |
Kunal | PERSON | 0.99+ |
Lisa | PERSON | 0.99+ |
80% | QUANTITY | 0.99+ |
Hortonworks | ORGANIZATION | 0.99+ |
100% | QUANTITY | 0.99+ |
Vertica | ORGANIZATION | 0.99+ |
Unravel Data | ORGANIZATION | 0.99+ |
Teradata | ORGANIZATION | 0.99+ |
today | DATE | 0.99+ |
500 containers | QUANTITY | 0.99+ |
One | QUANTITY | 0.99+ |
Two year | QUANTITY | 0.99+ |
two prongs | QUANTITY | 0.99+ |
last week | DATE | 0.99+ |
SiliconANGLE Media | ORGANIZATION | 0.99+ |
tonight | DATE | 0.99+ |
200 | QUANTITY | 0.99+ |
first day | QUANTITY | 0.99+ |
San Jose | LOCATION | 0.99+ |
Spark | TITLE | 0.99+ |
Cloudera | ORGANIZATION | 0.99+ |
each app | QUANTITY | 0.99+ |
Python | TITLE | 0.98+ |
a minute | QUANTITY | 0.98+ |
English | OTHER | 0.98+ |
one | QUANTITY | 0.98+ |
Duke University | ORGANIZATION | 0.98+ |
five | QUANTITY | 0.98+ |
Kafka | TITLE | 0.98+ |
Hadoop | TITLE | 0.98+ |
BigData SV | EVENT | 0.97+ |
first-time | QUANTITY | 0.97+ |
Strata Data Conference | EVENT | 0.97+ |
one key factor | QUANTITY | 0.96+ |
millions of dollars | QUANTITY | 0.95+ |
about 80 plus publications | QUANTITY | 0.95+ |
SQL | TITLE | 0.95+ |
DevOps | TITLE | 0.94+ |
first | QUANTITY | 0.94+ |
BigDataSV | EVENT | 0.94+ |
tons and tons | QUANTITY | 0.94+ |
both | QUANTITY | 0.94+ |
Unravel | ORGANIZATION | 0.93+ |
Secondly | QUANTITY | 0.91+ |
million degrees | QUANTITY | 0.91+ |
San Jose, California | LOCATION | 0.91+ |
Hive | TITLE | 0.91+ |
last decade | DATE | 0.91+ |
Unravel | TITLE | 0.9+ |
Guy Churchward, DataTorrent | Big Data SV 2018
>> Announcer: Live from San Jose, it's theCUBE, presenting Big Data, Silicon Valley, brought to you by SiliconANGLE Media and its ecosystem partners. >> Welcome back to theCUBE. Our continuing coverage of our event, Big Data SV, continues, this is our first day. We are down the street from the Strata Data Conference. Come by, we're at this really cool venue, the Forager Tasting Room. We've got a cocktail party tonight. You're going to hear some insights there as well as tomorrow morning. I am Lisa Martin, joined by my co-host, George Gilbert, and we welcome back to theCUBE, for I think the 900 millionth time, the president and CEO of DataTorrent, Guy Churchward. Hey Guy, welcome back! >> Thank you, Lisa, I appreciate it. >> So you're one of our regular VIPs. Give us the update on DataTorrent. What's new, what's going on? >> We actually talked to you a couple of weeks ago. We did a big announcement which was around 3.10, so it's a new release that we have. In all small companies, and we're a small startup, in the big data and analytics space, there is a plethora of features that I can reel through. But it actually makes something a little bit more fundamental. So in the last year... In fact, I think we chatted with you maybe six months ago. We've been looking very carefully at how customers purchase and what they want and how they execute against technology, and it's very very different to what I expected when I came into the company about a year ago off the EMC role that I had. And so, although the features are there, there's a huge amount of underpinning around the experience that a customer would have around big data applications. I'm reminded of, I think it's Gartner that quoted that something like 80% of big data applications fail. And this is one of the things that we really wanted to look at.
We have very large customers in production, and we did the analysis of what are we doing well with them, and why can't we do that en masse, and what are people really looking for? So that was really what the release was about. >> Let's elaborate on this a little bit. I want to drill into something where you said many projects, as we've all heard, have not succeeded. There's a huge amount of complexity. The terminology we use is, without tarring and feathering any one particular product, the open source community is kind of like, you're sort of harnessing a couple dozen animals and a zookeeper that works in triplicate... How does DataTorrent tackle that problem? >> Yeah, I mean, in fact I was desperately interested in writing a blog recently about using the word community after open source, because in some respects, there isn't a huge community around the open source movement. What we find is it's the du jour way in which we want to deliver technology, so I have a huge amount of developers that work on a thing called Apache Apex, which is a component in a solution, or in an architecture and in an outcome. And we love what we do, and we do the best we do, and it's better than anybody else's thing. But that's not an application, that's not an outcome. And what happens is, we kind of don't think about what else a customer has to put together, so then they have to go out to the zoo and pick loads of bits and pieces and then try to figure out how to stitch them all together in the best they can. And that takes an inordinately long time. And, in general, people who love this love tinkering with technologies, and their projects never get to production. And large enterprises are used to sitting down and saying, "I need a bulletproof application. It has to be industrialized. I need a full SLA on the back of it. This thing has to have lights out technology. And I need it quick."
Because that was the other thing, as an aspect, is this market is moving so fast, and you look at things like digital economy or any other buzz term, but it really means that if you realize you need to do something, you're probably already too late. And therefore, you need it speedy, expedited. So the idea of being able to wait for 12 months, or two years for an application, also makes no sense. So the arch of this is basically deliver an outcome, don't try and change the way in which open source is currently developed, because they're in components, but embrace them. And so what we did is we sort of looked at it and said, "Well what do people really want to do?" And it's big data analytics, and I want to ingest a lot of information, I want to enrich it, I want to analyze it, and I want to take actions, and then I want to go park it. And so, we looked at it and said, "Okay, so the majority of stuff we need is what we call a KASH stack, which is Kafka, Apache Apex, Spark and Hadoop, and then put complex compute on top." So you would have heard of terms like machine learning, and dimensional compute, so we have their modules. So we actually created an opinionated stack... Because otherwise you have a thousand to choose from and people get confused with choice. I equate it to going into a menu at a restaurant, there's two types of restaurants, you walk into one and you can turn pages and pages and pages and pages of stuff, and you think that's great, I got loads of choice, but the choice kind of confuses you. And also, there's only one chef at the back, and he can't cook everything well. So you know if he chooses the components and puts them together, you're probably not going to get the best meal. And then you go to restaurants that you know are really good, they generally give you one piece of paper and they say, "Here's your three entrees." And you know every single one of them.
It's not a lot of choice, but at the end of the day, it's going to be a really good meal. >> So when you go into a customer... You're leading us to ask you the question which is, you're selling the prix fixe tasting menu, and you're putting all the ingredients together. What are some of those solutions and then, sort of, what happens to the platform underneath? >> Yeah, so what you don't want to do is to take these flexible, microdata services, which are open source projects, and hard glue them together to create an application that then has no flexibility. Because, again, one of the myths that I used to assume is applications would last us seven to 10 years. But what we're finding in this space is this movement towards consumerization of enterprise applications. In other words, I need an app and I need it tomorrow because I'm competitively disadvantaged, but it might be wrong, so I then need to adjust it really quick. It's this idea of continual development, continual adjustment. But that flies in the face of all of this gluing and enterprise-ilities. And I want to base it on open source, and open source, by default, doesn't glue well together. And so what we did is we said okay, not only do you have to create an opinionated stack, and you do that because you want them all to scale into all industries, and they don't need a huge amount of choice, just pick best of breed. But you need to then put a sleeve around them so they all act as though they are a single application. And so we actually announced a thing called Epoxy. It's a bit of a riff on gluing, but it's called DataTorrent Epoxy. So we have, it's like a microdata service bus, and you can then interchange the components. For instance, right now, Apache Apex is this stream-based processing engine in that component. But if there's a better unit, we're quite happy to pull it out, chuck it away, and then put another one in.
This isn't a ubiquitous snap-on toolset, because, again, the premise is use open source, get the innovation from there. It has to be bulletproof and enterprise-ility and move really fast. So those are the components I was working on. >> Guy, as CEO, I'm sure you speak with a lot of customers often. What are some of the buying patterns that you're seeing across industries, and what is some of the major business value that DataTorrent can help deliver to your customers? >> The buying patterns when we get involved, and I'm kind of breaking this down into a slightly different way, because we normally get involved when a project's in flight, one of the 80% that's failing, and in general, it's driven by a strategic business partner that has an agenda. And what you see is proprietary application vendors will say, "We can solve everything for you." So they put the tool in and realize it doesn't have the flexibility, it does have enterprise-ility, but it can't adjust fast. And then you get the other type who say, "Well we'll go to a distro or we'll go to a general purpose practitioner, and they'll build an application for us." And they'll take open source components, but they'll glue it together with proprietary mush, and then that doesn't then grow past. And then you get the other ones, which is, "Well if I actually am not guided by anybody, I'll buy a bunch of developers, stick them in my company, and I've got control on that." But they fiddle around a lot. So we arrive in and, in general, they're in this middle process of saying, "I'm at a competitive disadvantage, I want to move forward and I want to move forward fast, and we're working on one of those three channels." The types of outcomes, we just, and back to the expediency of this, we had a telco come to us recently, and it was just before the iPhone X launched, and they wanted to do AB testing on the launch on their platform. We got them up and running within three months.
Subsequent from that launch, they then repurposed the platform and some of the components with some augmentation, and they've come out with three further applications. They've all gone into production. So the idea is then these fast cycles of microdata services being stitched together with the Epoxy resin type approach-- >> So faster time to value, lower TCO-- >> Exactly. >> Being able to get to meet their customers' needs faster-- >> Exactly, so it's outcome-based and time to value, and it's time to proof. Because this is, again, the thing that Gartner picked up on, is Hadoop's difficult, this market's complex and people kick the tires a lot. And I sort of joke with customers, "Hey if you want to obsess about components rather than the outcome, then your successor will probably come see us once you're out and your group's failed." And I don't mean that in an obnoxious way. It's not just DataTorrent that solves this same thing, but this is the movement, right? Deal with open source, get enterprise-ilities, get us up and running within a quarter or two, and then let us have some use and agile repurposing. >> Following on that, just to understand going in with a solution to an economic buyer, but then having the platform be reusable, is it opinionated and focused on continuous processing applications, or does it also address both the continuous processing and batch processing? >> Yeah, it's a good answer. In general, and again Gatekeeper, you've got batch and you've got realtime and stream, and so we deal with data in motion, which is stream-based processing. A stream-based processing engine can deal with batch as well, but a batch cannot deal with stream. >> George: So you do both-- >> Yeah >> And the idea being that you can have one programming model for both. >> Exactly. >> It's just a window, batch is just a window.
>> And the other thing is, a myth bust, is for the last maybe eight plus years, companies assume that the first thing you do in big data analytics is collect all the data, create a data lake, and so they go in there, they ingest the information, they put it into a data lake, and then they poke the data lake posthumously. But the data in the data lake is, by default, already old. So the latency of sticking it into a data lake and then sorting it, and then basically poking it, means that if anybody deals with the data that's in motion, you lose. Because I'm analyzing as it's happening and then you would be analyzing it after at rest, right? So now the architecture of choice is ingest the information, use high performance storage and compute, and then, in essence, ingest, normalize, enrich, analyze, and act on data in motion, in memory. And then when I've used it, then throw it off into a data lake because then I can basically do posthumous analytics and use that for enrichment later. >> You said something also interesting where the DataTorrent customers, the initial successful ones sort of tended to be larger organizations. Those are typically the ones with skillsets to, if anyone's going to be able to put pieces together, it's those guys. Have you not... Well, we always expected big data applications, or sort of adaptive applications, to go mainstream when they were either packaged apps to take all the analysis and embed it, or when you had end to end integrated products to make it simple. Where do you think, what's going to drive this mainstream? >> Yeah, it depends on how mainstream you want mainstream. It's kind of like saying how fast is a fast car. If you want a contractor that comes into IT to create a dashboard, go buy Tableau, and that's mainstream analytics, but it's not. It's mainstream dashboarding of data. The applications that we deal with, by default, the more complex data, they're going to be larger organizations. 
Don't misunderstand when I say, "We deal with these organizations." We don't have a professional services arm. We work very closely with people like HCL, and we do have a jumpstart team that helps people get there. But our job is teach someone, it's like a kid with a bike and the training wheels, our job is to teach them how to ride the bike, and kick the wheels off, and step away. Because what we don't want to do is to put a professional services drip feed into them and just keep sucking the money out. Our job is to get them there. Now, we've got one company who actually are going to go live next month, and it's a kid tracker, you know like a GPS one that you put on bags and with your kids, and it'll be realtime tracking for the school and also for the individuals. And they had absolutely zero Hadoop experience when we got involved with them. And so we've brought them up, we've helped them with the application, we've kicked the wheels off and now they're going to be sailing. I would say, in a year's time, they're going to be comfortable to just ignore us completely, and in the first year, there's still going to be some handholding and covering up a bruise as they fall off the bike every so often. But that's our job, it's IP, technology, all about outcomes and all about time to value. >> And from a differentiation standpoint, that ability to enable that self service and kick off the training wheels, is that one of the biggest differentiators that you find DataTorrent has, versus the Tableaus and the other competitors on the market? >> I don't want to say there's no one doing what we're doing, because that will sound like we're doing something odd. But there's no one doing what we're doing. And it's almost like Tesla. Are they an electric car or are they a platform? They've spurred an industry on, and Uber did the same thing, and Lyft's done something and AirBNB has. And what we've noticed is customers' buying patterns are very specific now.
Use open source, get up their enterprise-ilities, and have that level of agility. Nobody else is really doing that. The only people that will do that is your contract with someone like Hortonworks or a Cloudera, and actually pay them a lot of money to build the application for you. And our job is really saying, "No, instead of you paying them on professional services, we'll give you the sleeve, we'll make it a little bit more opinionated, and we'll get you there really quickly, and then we'll let you go and set you free." And so that's one. We have a thing called the Application Factory. That's the snap on toolset where they can literally go to a GUI and say, "I'm in the financial market, I want a fraud prevention application." And we literally then just self assemble the stack, they can pick it up, and then put their input and output in. And then, as we move forward, we'll have partners who are building the bespoke applications in verticals, and they will put them up on our website, so the customers can come in and download them. Everything is subscription software. >> Fantastic, I wish we had more time, but thanks so much for finding some time today to come by theCUBE, tell us what's new, and we look forward to seeing you on the show again very soon. >> I appreciate it, thank you very much. >> We want to thank you for watching theCUBE. Again, Lisa Martin with my co-host George Gilbert, we're live at our event, Big Data SV, in downtown San Jose, down the street from the Strata Data Conference. Stick around, George and I will be back after a short break with our next guest. (light electronic jingle)
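The exchange in the interview above about having "one programming model" for stream and batch, where "batch is just a window", can be made concrete with a short sketch. This is plain Python, not the Apache Apex API or any DataTorrent component: the `windows` and `total_per_window` helpers are hypothetical names introduced only for illustration. The same aggregation code runs over fixed-size windows of a stream, and a batch job is simply the case where a single window covers all the events.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Optional


def windows(stream: Iterable[int], size: Optional[int] = None) -> Iterator[List[int]]:
    """Yield consecutive non-overlapping windows of `size` events.
    size=None treats the whole input as one window, i.e. a batch."""
    it = iter(stream)
    if size is None:
        batch = list(it)
        if batch:
            yield batch
        return
    if size < 1:
        raise ValueError("size must be >= 1")
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def total_per_window(stream: Iterable[int], size: Optional[int] = None) -> List[int]:
    """The same aggregation logic serves both the streaming and the batch case."""
    return [sum(w) for w in windows(stream, size)]


events = [1, 2, 3, 4, 5]
print(total_per_window(events, size=2))  # windowed, stream-style processing
print(total_per_window(events))          # one window over everything: a batch
```

The design point is that only the window size changes between the two calls; the processing function itself is untouched, which is the sense in which a stream engine can also serve batch workloads.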
ENTITIES
Entity | Category | Confidence |
---|---|---|
George Gilbert | PERSON | 0.99+ |
Lisa Martin | PERSON | 0.99+ |
two years | QUANTITY | 0.99+ |
George | PERSON | 0.99+ |
12 months | QUANTITY | 0.99+ |
Uber | ORGANIZATION | 0.99+ |
AirBNB | ORGANIZATION | 0.99+ |
Lisa | PERSON | 0.99+ |
Tesla | ORGANIZATION | 0.99+ |
80% | QUANTITY | 0.99+ |
two types | QUANTITY | 0.99+ |
Gartner | ORGANIZATION | 0.99+ |
Hortonworks | ORGANIZATION | 0.99+ |
San Jose | LOCATION | 0.99+ |
iPhone X | COMMERCIAL_ITEM | 0.99+ |
DataTorrent | ORGANIZATION | 0.99+ |
seven | QUANTITY | 0.99+ |
Guy Churchward | PERSON | 0.99+ |
tomorrow morning | DATE | 0.99+ |
Lyft | ORGANIZATION | 0.99+ |
last year | DATE | 0.99+ |
SiliconANGLE Media | ORGANIZATION | 0.99+ |
six months ago | DATE | 0.99+ |
next month | DATE | 0.99+ |
three months | QUANTITY | 0.99+ |
both | QUANTITY | 0.99+ |
one | QUANTITY | 0.98+ |
EMC | ORGANIZATION | 0.98+ |
first day | QUANTITY | 0.98+ |
tonight | DATE | 0.98+ |
Silicon Valley | LOCATION | 0.98+ |
tomorrow | DATE | 0.98+ |
one chef | QUANTITY | 0.98+ |
10 years | QUANTITY | 0.98+ |
one piece | QUANTITY | 0.98+ |
theCUBE | ORGANIZATION | 0.98+ |
Cloudera | ORGANIZATION | 0.97+ |
three entrees | QUANTITY | 0.97+ |
Strata Data Conference | EVENT | 0.97+ |
first thing | QUANTITY | 0.97+ |
first year | QUANTITY | 0.96+ |
single application | QUANTITY | 0.96+ |
today | DATE | 0.95+ |
couple of weeks ago | DATE | 0.95+ |
telco | ORGANIZATION | 0.95+ |
900 millionth time | QUANTITY | 0.95+ |
one company | QUANTITY | 0.94+ |
HCL | ORGANIZATION | 0.94+ |
a quarter | QUANTITY | 0.94+ |
DataTorret | ORGANIZATION | 0.93+ |
three channels | QUANTITY | 0.93+ |
two | QUANTITY | 0.92+ |
Big Data SV | EVENT | 0.92+ |
Big Data SV 2018 | EVENT | 0.91+ |
three further applications | QUANTITY | 0.86+ |
Apex | TITLE | 0.84+ |
a year | QUANTITY | 0.82+ |
Tableau | ORGANIZATION | 0.81+ |
Hadoop | PERSON | 0.81+ |
about a year ago | DATE | 0.8+ |
couple dozen animals | QUANTITY | 0.8+ |
product | QUANTITY | 0.78+ |
eight plus years | QUANTITY | 0.77+ |
Apache | ORGANIZATION | 0.76+ |
agile | TITLE | 0.76+ |
Guy | PERSON | 0.73+ |
Epoxy | ORGANIZATION | 0.71+ |
Tableau | TITLE | 0.71+ |
DataTorrent | PERSON | 0.7+ |
around 3.10 | DATE | 0.69+ |
Spark | TITLE | 0.68+ |
restaurants | QUANTITY | 0.66+ |
Gatekeeper | TITLE | 0.66+ |
model | QUANTITY | 0.63+ |
Dr. Tendu Yogurtcu, Syncsort | Big Data SV 2018
>> Announcer: Live from San Jose, it's theCUBE. Presenting Big Data, Silicon Valley, brought to you by Silicon Angle Media and its ecosystem partners. >> Welcome back to theCUBE. We are live in San Jose at our event, Big Data SV. I'm Lisa Martin, my co-host is George Gilbert and we are down the street from the Strata Data Conference. We are at a really cool venue: Forager Eatery Tasting Room. Come down and join us, hang out with us, we've got a cocktail par-tay tonight. We also have an interesting briefing from our analysts on big data trends tomorrow morning. I want to welcome back to theCUBE now one of our CUBE VIPs and alumna, Tendu Yogurtcu, the CTO at Syncsort, welcome back. >> Thank you. Hello Lisa, hi George, pleasure to be here. >> Yeah, it's our pleasure to have you back. So, what's going on at Syncsort, what are some of the big trends as CTO that you're seeing? >> In terms of the big trends that we are seeing, and Syncsort has grown a lot in the last 12 months, we actually doubled our revenue, it has been a really successful and organic growth path, and we have more than 7,000 customers now, so it's a great pool of customers that we are able to talk to and see the trends and how they are trying to adapt to the digital disruption and make data part of their core strategy. So data is no longer an enabler, and in all of the enterprise we are seeing data becoming the core strategy. This reflects in the four mega trends, they are all connected to enable business as well as operational analytics. Cloud is one, definitely. We are seeing more and more cloud adoption, even our financial services, healthcare, and banking customers now have a couple of clusters running in the cloud, in public cloud, multiple workloads, hybrid seems to be the new standard, and it also comes with challenges. IT governance as well as data governance is a major challenge, and also scoping and planning for the workloads in the cloud continues to be a challenge, as well.
Our general strategy for all of the product portfolio is to have our products following the design once, deploy anywhere strategy. So whether it's a standalone environment on Linux or running on Hadoop or Spark, or running on Premise or in the Cloud, regardless of the Cloud provider, we are enabling the same application with no changes to run in all of these environments, including hybrid. Then we are seeing the streaming trend, with the connected devices, with the digital disruption and so much data being generated, being able to stream and process data on the edge, with the Internet of Things, and in order to address the use cases that Syncsort is focused on, we are really providing more on the Change Data Capture and near real-time and real-time data replication to the next generation analytics environments and big data environments. We launched last year our Change Data Capture, CDC, product offering with data integration, and we continue to strengthen that; with the Vision merger we had data replication, real-time data replication capabilities, and we are now seeing even Kafka database becoming a consumer of this data. Not just keeping the data lake fresh, but really publishing the changes from multiple, diverse sets of sources and publishing into a Kafka database and making it available for applications and analytics in the data pipeline. So the third trend we are seeing is around data science, and if you noticed, this morning's keynote was all about machine learning, artificial intelligence, deep learning, how do we make use of data science. And it was very interesting for me because we see everyone talking about the challenge of how do you prepare the data and how do you deliver the trusted data for machine learning and artificial intelligence use and deep learning. Because if you are using bad data, and creating your models based on bad data, then the insights you get are also impacted.
We definitely offer our products, both on the data integration and data quality side, to prepare the data, cleanse, match, and deliver the trusted data set for data scientists and make their life easier. Another area of focus for 2018 is can we also add supervised learning to this, because with the Trillium Quality domain experts that we have now in Syncsort, we have a lot of domain experts in the field, we can infuse the machine learning algorithms and connect data profiling capabilities we have with the data quality capabilities, recommending business rules for data scientists and helping them automate the mundane tasks with recommendations. And the last but not least trend is data governance, and data governance is almost an umbrella focus for everything we are doing at Syncsort because everything about the Cloud trend, the streaming, and the data science, and developing that next generation analytics environment for our customers depends on the data governance. It is, in fact, a business imperative, and the regulatory compliance use cases drive more importance today than governance. For example, General Data Protection Regulation in Europe, GDPR. >> Lisa: Just a few months away. >> Just a few months, May 2018, it is in the mind of every C-level executive. It's not just for European companies, but every enterprise has European data sourced in their environments. So compliance is a big driver of governance, and we look at governance in multiple aspects. Security and ensuring data is available in a secure way is one aspect, and delivering the high quality data, cleansing, matching, the example Hilary Mason this morning gave in the keynote about how the context matters in terms of searches of her name was very interesting because you really want to deliver that high quality data in the enterprise, trusted data set, preparing that.
Our Trillium Quality for big data, we launched in Q4, that product is generally available now, and actually we are in production with a very large deployment. So that's one area of focus. And the third area is how do you create visibility, the farm-to-table view of your data? >> Lisa: Yeah, that's the name of your talk! I love that. >> Yes, yes, thank you. So tomorrow I have a talk at 2:40, March 8th also, I'm so happy it's on the Women's Day that I'm talking-- >> Lisa: That's right, that's right! Get a farm-to-table view of your data is the name of your talk, track data lineage from source to analytics. Tell us a little bit more about that. >> It's all about creating more visibility, because for audit reasons, for understanding how many copies of my data are created, where my data has been, and who accessed it, creating that visibility is very important. And the last couple of years, we saw everyone was focused on how do I create a data lake and make my data accessible, break the data silos, and liberate my data from multiple platforms, legacy platforms that the enterprise might have. Once that happened, everybody started worrying about how do I create a consumable data set and how do I manage this data, because data has been on the legacy platforms like Mainframe, IBM i series, has been on relational data stores, it is in the Cloud, gravity of data originating in the Cloud is increasing, it's originating from mobile. Hadoop vendors like Hortonworks and Cloudera, they are creating visibility to what happens within the Hadoop framework. So we are deepening our integration with the Cloudera Navigator, that was our announcement last week. We already have integration both with Hortonworks and Cloudera Navigator, this is one step further where we actually publish what happened to every single granular level of data at the field level with all of the transformations that data has been through outside of the cluster.
So that visibility is now published to Navigator itself, we also publish it through the RESTful API, so governance is a very strong and critical initiative for all of the businesses. And we are playing into the security aspect as well as the data lineage and tracking aspect and the quality aspect. >> So this sounds like an extremely capable infrastructure service, so that it's trusted data. But can you sell that to an economic buyer alone, or do you go in in conjunction with another solution like anti-money laundering for banks or, you know, what are the key things that they place enough value on that they would spend, you know, budget on it? >> Yes, absolutely. Usually the use cases might originate like anti-money laundering, which is very common, fraud detection, and it ties to getting a single view of an entity. Because in anti-money laundering, you want to understand the single view of your customer ultimately. So there is usually another solution that might be in the picture. We are providing the visibility of the data, as well as that single view of the entity, whether it's the customer view in this case or the product view in some of the use cases, by delivering the matching capabilities and the cleansing capabilities, the deduplication capabilities, in addition to accessing and integrating the data. >> When you go into a customer and, you know, recognizing that we still have tons of silos and we're realizing it's a lot harder to put everything in one repository, how do customers tell you they want to prioritize what they're bringing into the repository or even what do they want to work on that's continuously flowing in? >> So it depends on the business use case. And usually at the time that we are working with the customer, they selected that top priority use case. The risk here, and the anti-money laundering, or for insurance companies, we are seeing a trend, for example, building the data marketplace, as that centralized data marketplace concept.
So depending on the business case, many of our insurance customers in the US, for example, they are creating the data marketplace and they are working with near real-time and microbatches. In Europe, Europe seems to be a bit ahead of the game in some cases, like Hadoop production was slow but certainly they went right into the streaming use cases. We are seeing more directly streaming and keeping it fresh and more utilization of the Kafka and messaging frameworks and database. >> And in that case, where they're sort of skipping the batch-oriented approach, how do they keep track of history? >> It's still, in most of the cases, microbatches, and the metadata is still associated with the data. So there is an analysis of the historical, what happened to that data. The tools, like ours and the vendors coming into the picture, keep track of that, basically. >> So, in other words, by knowing what happened operationally to the data, that paints a picture of a history. >> Exactly, exactly. >> Interesting. >> And for the governance we usually also partner, for example, we partner with Collibra data platform, we partnered with ASG for creating those business rules and technical metadata and providing them to the business users, not just to the IT data infrastructure, and on the Hadoop side we partner with Cloudera and Hortonworks very closely to complete that picture for the customer, because nobody is just interested in what happened to the data in Hadoop or in Mainframe or in my relational data warehouse, they are really trying to see what's happening on Premise, in the Cloud, multiple clusters, traditional environments, legacy systems, and trying to get that big picture view.
>> So on that, enabling a business to have that, we'll say in marketing, 360 degree view of data, knowing that there's so much potential for data to be analyzed to drive business decisions that might open up new business models, new revenue streams, increase profit, what are you seeing as a CTO of Syncsort when you go in to meet with a customer, data silos, when you're talking to a Chief Data Officer, what's the cultural, I guess, not shift but really journey that they have to go on to start opening up other organizations of the business, to have access to data so they really have that broader, 360 degree view? What's that cultural challenge that they have to, journey that they have to go on? >> Yes, Chief Data Officers are actually very good partners for us, because usually Chief Data Officers are trying to break the silos of data and make sure that the data is liberated for the business use cases. Still most of the time the infrastructure and the cluster, whether it's the deployment in the Cloud versus on Premise, it's owned by the IT infrastructure. And the lines of business are really the consumers and the clients of that. CDO, in that sense, almost mediates and connects those lines of business with the IT infrastructure with the same goals for the business, right? They have to worry about the compliance, they have to worry about creating multiple copies of data, they have to worry about the security of the data and availability of the data, so CDOs actually help. So we are actually very good partners with the CDOs in that sense, and we also usually have the IT infrastructure owner in the room when we are talking with our customers because they have a big stake. They are like the gatekeepers of the data to make sure that it is accessed by the right... By the right folks in the business. >> Sounds like maybe they're in the role of like, good cop bad cop or maybe mediator. Well Tendu, I wish we had more time.
Thanks so much for coming back to theCUBE and, like you said, you're speaking tomorrow at Strata Conference on International Women's Day: Get a farm-to-table view of your data. Love the title. >> Thank you. >> Good luck tomorrow, and we look forward to seeing you back on theCUBE. >> Thank you, I look forward to coming back and letting you know about more exciting both organic innovations and acquisitions. >> Alright, we look forward to that. We want to thank you for watching theCUBE, I'm Lisa Martin with my co-host George Gilbert. We are live at our event Big Data SV in San Jose. Come down and visit us, stick around, and we will be right back with our next guest after a short break. >> Tendu: Thank you. (upbeat music)
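The Change Data Capture idea described in the interview, publishing only what changed so downstream consumers stay fresh, can be illustrated with a minimal Python sketch. It is not Syncsort's implementation: production CDC tools typically read database transaction logs, whereas this sketch assumes the simpler approach of diffing two keyed snapshots. The function name `capture_changes` and the event fields are hypothetical, chosen only for illustration.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def capture_changes(before: Dict[int, Row], after: Dict[int, Row]) -> List[dict]:
    """Diff two snapshots keyed by primary key and return change events.

    Assumption: snapshot diffing stands in for log-based capture.
    Events are emitted in primary-key order.
    """
    events: List[dict] = []
    for key in sorted(before.keys() | after.keys()):
        old, new = before.get(key), after.get(key)
        if old is None:
            # Key exists only in the newer snapshot.
            events.append({"op": "insert", "key": key, "row": new})
        elif new is None:
            # Key disappeared from the newer snapshot.
            events.append({"op": "delete", "key": key, "row": old})
        elif old != new:
            # Record which fields changed, a small step toward field-level lineage.
            changed = sorted(
                f for f in old.keys() | new.keys() if old.get(f) != new.get(f)
            )
            events.append(
                {"op": "update", "key": key, "row": new, "changed_fields": changed}
            )
    return events
```

Each returned event could then be serialized and published to a messaging topic so that analytics consumers apply only the changes rather than reloading the full data set.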
SUMMARY :
brought to you by Silicon Angle Media and we are down the street from the Strata Data Conference. Hello Lisa, hi George, pleasure to be here. Yeah, it's our pleasure to have you back. and in all of the enterprise we are seeing data and delivering the high quality data, Lisa: Yeah, that's the name of your talk! it's on the Women's Day that I'm talking-- is the name of your talk, track data lineage and make my data accessible, break the data silos, that they place enough value on that they would and the cleansing capabilities, the duplication So it depends on the business use case. It's still, in most of the cases, operationally to the data, that paints a picture And for the governance we usually also partner, and the cluster, whether it's the deployment Love the title. to seeing you back on theCUBE. and letting you know about more exciting and we will be right back with our next guest Tendu: Thank you.
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Lisa Martin | PERSON | 0.99+ |
George | PERSON | 0.99+ |
May 2018 | DATE | 0.99+ |
George Gilbert | PERSON | 0.99+ |
Syncsort | ORGANIZATION | 0.99+ |
Lisa | PERSON | 0.99+ |
Europe | LOCATION | 0.99+ |
Hortonworks | ORGANIZATION | 0.99+ |
US | LOCATION | 0.99+ |
Hilary Mason | PERSON | 0.99+ |
San Jose | LOCATION | 0.99+ |
ASG | ORGANIZATION | 0.99+ |
2018 | DATE | 0.99+ |
Tendu | PERSON | 0.99+ |
Silicon Angle Media | ORGANIZATION | 0.99+ |
Cloudera | ORGANIZATION | 0.99+ |
360 degree | QUANTITY | 0.99+ |
tomorrow | DATE | 0.99+ |
Collibra | ORGANIZATION | 0.99+ |
more than 7,000 customers | QUANTITY | 0.99+ |
last week | DATE | 0.99+ |
last year | DATE | 0.99+ |
tomorrow morning | DATE | 0.99+ |
one aspect | QUANTITY | 0.99+ |
third area | QUANTITY | 0.99+ |
Linux | TITLE | 0.99+ |
Cloud Navigator | TITLE | 0.99+ |
2:40 | DATE | 0.98+ |
Women's Day | EVENT | 0.98+ |
Tendu Yogurtcu | PERSON | 0.98+ |
GDPR | TITLE | 0.98+ |
Spark | TITLE | 0.97+ |
tonight | DATE | 0.97+ |
Big Data SV | EVENT | 0.97+ |
Kafka | TITLE | 0.97+ |
International Women's Day | EVENT | 0.97+ |
both | QUANTITY | 0.97+ |
CDC | ORGANIZATION | 0.96+ |
Navigator | TITLE | 0.96+ |
Strata Data Conference | EVENT | 0.96+ |
single view | QUANTITY | 0.96+ |
Hadoop | TITLE | 0.95+ |
third trend | QUANTITY | 0.95+ |
one step | QUANTITY | 0.95+ |
single view | QUANTITY | 0.95+ |
Dr. | PERSON | 0.94+ |
theCUBE | ORGANIZATION | 0.94+ |
CUBE | ORGANIZATION | 0.94+ |
this morning | DATE | 0.94+ |
Cloud | TITLE | 0.92+ |
last 12 months | DATE | 0.91+ |
Change Data Capture | ORGANIZATION | 0.9+ |
today | DATE | 0.9+ |
European | OTHER | 0.88+ |
last couple of years | DATE | 0.88+ |
General Data Protection Regulation in Europe | TITLE | 0.86+ |
Strata Conference | EVENT | 0.84+ |
one | QUANTITY | 0.83+ |
one repository | QUANTITY | 0.83+ |
tons of silos | QUANTITY | 0.82+ |
one area | QUANTITY | 0.82+ |
Q4 | DATE | 0.82+ |
Big Data SV 2018 | EVENT | 0.81+ |
four mega trends | QUANTITY | 0.76+ |
March 8th | DATE | 0.76+ |
Matthew Baird, AtScale | Big Data SV 2018
>> Announcer: Live from San Jose. It's theCUBE, presenting Big Data Silicon Valley. Brought to you by SiliconANGLE Media, and its ecosystem partners. (techno music) >> Welcome back to theCUBE, our continuing coverage on day one of our event, Big Data SV. I'm Lisa Martin with George Gilbert. We are down the street from the Strata Data Conference. We've got a great, a lot of cool stuff going on. You can see the cool set behind me. We are at Forager Tasting Room & Eatery. Come down and join us, be in our audience today. We have a cocktail event tonight, who doesn't want to join that? And we have a nice presentation tomorrow morning of our Wikibon's 2018 Big Data Forecast and Review. Joining us next is Matthew Baird, the co-founder of AtScale. Matthew, welcome to theCUBE. >> Thanks for having me. Fantastic venue, by the way. >> Isn't it cool? >> This is very cool. >> Yeah, it is. So, talking about Big Data, you know, Gartner says, "85% of Big Data projects have failed." I often say failure is not a bad F word, because it can spawn the genesis of a lot of great business opportunities. Data lakes were big a few years ago, turned into swamps. AtScale has this vision of Data Lake 2.0, what is that? >> So, you're right. There have been a lot of failures, there's no doubt about it. And you're also right that is how we evolve, and we're a Silicon Valley based company. We don't give up when faced with these things. It's just another way to not do something. So, what we've seen and what we've learned through our customers is they need to have a solution that is integrated with all the technologies that they've adopted in the enterprise. And it's really about, if you're going to make a data lake, you're going to have data on there that is the crown jewels of your business. How are you going to get that in the hands of your constituents, so that they can analyze it, and they can use it to make decisions?
And how can we, furthermore, do that in a way that supplies governance and auditability on top of it, so that we aren't just sending data out into the ether and not knowing where it goes? We have a lot of customers in the insurance, health insurance space, and with financial customers, where the data absolutely must be managed. I think one of the biggest changes is around that integration with the current technologies. There's a lot of movement into the Cloud. The new data lake is kind of focused more on these large data stores, where it was HDFS with Hadoop. Now it's S3, Google's object storage, and Azure ADLS. Those are the sorts of things that are backing the new data lake I believe. >> So if we take these, where the Data Lake Store didn't have to be something that's an open source HDFS implementation, it could even be just through an HDFS API. >> Matthew: Yeah, absolutely. >> What are some of the, how should we think about the data sources and feeds, for this repository, and then what is it on top that we need to put to make the data more consumable? >> Yeah, that's a good point. S3, Google Object Storage, and Azure, they all have a characteristic of, they are large stores. You can store as much as you want. They're generally on the Clouds, and the open source on-prem software for landing the data exists, for streaming the data and landing it, but the important thing there is it's cost-effective. S3 is a cost-effective storage system. HDFS is a mostly cost-effective storage system. You have to manage it, so it has a slightly higher cost, but the advice has been, get it to the place you're going to store it. Store it in a unified format. You get a halo effect when you have a unified format, and I think the industry is coalescing around...
I'd probably say Parquet's in the lead right now, but once Parquet can be read by, let's take Amazon for instance, can be read by Athena, can be read by Redshift Spectrum, it can be read by their EMR, now you have this halo effect where your data's always there, always available to be consumed by a tool or a technology that can then deliver it to your end users. >> So when we talk about Parquet, we're talking about a columnar serialization format, >> Matthew: Yes. >> but there's more on top of that that needs to be layered, so that you can, as we were talking about earlier, combine the experience of a data warehouse, and the curated >> Absolutely. >> data access where there's guard rails, >> Matthew: Yes. >> and it's simple, versus sort of the wild west, but where I capture everything in a data lake. How do you bring those two together? >> Well, specifically for AtScale, we allow you to integrate multiple data access tools in AtScale, and then we use the appropriate tool to access the data for the use case. So let me give you an example, in the Amazon case, Redshift is wonderful for accessing interactive data, which BI users want, right? They want fast queries, sub-second queries. They don't want to pay to have all the raw data necessarily stored in Redshift 'cause that's pretty expensive. So they have this Redshift Spectrum, it's sitting in S3, that's cost effective. So when we go and we read raw data to build these summary tables, to deliver the data fast, we can read from Spectrum, we can put it all together, drop it into Redshift, a much smaller volume of data, so it has faster characteristics for being accessed. And it delivers it to the user that way. We do that in Hadoop when we access via Hive for building aggregate tables, but Spark or Impala is a much faster interactive engine, so we use those. As I step back and look at this, I think the Data Lake 2.0, from a technical perspective is about abstraction, and abstraction's sort of what separates us from the animals, right?
It's a concept where we can pack a lot of sophistication and complexity behind an interface that allows people to just do what they want to do. You don't know how, or maybe you do know how a car engine works, I don't really, kind of, a little bit, but I do know how to press the gas pedal and steer. >> Right. >> I don't need to know these things, and I think the Data Lake 2.0 is about, well I don't need to know how Sentry, or Ranger, or Atlas, or any of these technologies work. I need to know that they're there, and when I access data, they're going to be applied to that data, and they're going to deliver me the stuff that I have access to and that I can see. >> So a couple things, it sounded like I was hearing abstraction, and you said really that's kind of the key, that sounds like a differentiator for AtScale, is giving customers that abstraction they need. But I'm also curious from a data value perspective, you talked about in Redshift from an expense perspective. Do you also help customers gain abstraction by helping them evaluate value of data and where they ought to keep it, and then you give them access to it? Or is that something that they need to do, kind of bring to the table? >> We don't really care, necessarily, about the source of the data, as long as it can be expressed in a way that can be accessed by whatever engine it is. Lift and shift is an example. There's a big move to move from Teradata or from Netezza into a Cloud-based offering. People want to lift it and shift it. It's the easiest way to do this. Same table definitions, but that's not optimized necessarily for the underlying data store. Take BigQuery for example, BigQuery's an amazing piece of technology. I think there's nothing like it out there in the market today, but if you really want BigQuery to be cost-effective, and perform and scale up to concurrency of... one of our customers is going to roll out about 8,000 users on this.
You have to do things in BigQuery that are BigQuery-friendly. The data structures, the way that you store the data, repeated values, those sorts of things need to be taken into consideration when you build your schema out for consumption. With AtScale they don't need to think about that, they don't need to worry about it, we do it for them. They drop the schema in the same way that it exists on their current technology, and then behind the scenes, what we're doing is we're looking at signals, we're looking at queries, we're looking at all the different ways that people access the data naturally, and then we restructure those summary tables using algorithms and statistics, and I think people would broadly call it ML type approaches, to build out something that answers those questions, and adapts over time to new questions, and new use cases. So it's really about, imagine you had the best data engineering team in the world, in a box, they're never tired, they never stop, and they're always interacting with what the customers really want, which is "Now I want to look at the data this way". >> It sounds actually like what you're talking about is you have a whole set of sources, and targets, and you understand how they operate, but when I say you, I mean your software. And so that you can take data from wherever it's coming in, and then you apply, if it's machine learning or whatever other capabilities to learn from the access methods, how to optimize that data for that engine. >> Matthew: Exactly. >> And then the end users have an optimal experience and it's almost like the data migration service that Amazon has, it's like, you give us your Postgres or Oracle database, and we'll migrate it to the cloud. It sounds like you add a lot of intelligence to that process for decision support workloads. >> Yes. >> And figure out, so now you're going to...
It's not Postgres to Postgres, but it might be Teradata to Redshift, or S3, that's going to be accessed by Athena or Redshift, and then let's put that in the right format. >> I think you sort of hit something that we've noticed is very powerful, which is if you can set up, and we've done this with a number of customers, if you can set up at the abstraction layer that is AtScale, on your on-prem data, literally in, say hours, you can move it into the Cloud, obviously you have to write the detail to move it into the Cloud, but once it's in the Cloud you take the same AtScale instance, you re-point it at that new data source, and it works. We've done that with multiple customers, and it's fast and effective, and it lets you actually try out things that you may not have the agility to do before because there's differences in how the SQL dialects work, there's differences in, potentially, how the schema might be built. >> So a couple things I'm interested in, I'm hearing two A-words, that abstraction that we've talked about a number of times, you also mention adaptability. So when you're talking with customers, what are some of the key business outcomes they need to drive, where adaptability and abstraction are concerned, in terms of like cost reduction, revenue generation. What are some of those C-suite business objectives that AtScale can help companies achieve? >> So looking at, say, a customer, a large retailer on the East Coast, everybody knows the stores, they're everywhere, they sell hardware. They have a 20-terabyte cube that they use for day-to-day revenue analytics. So they do period over period analysis. When they're looking at stores, they're looking at things like, we just tried out a new marketing approach... I was talking to somebody there last week about how they have these special stores where they completely redo one area and just see how that works. They have to be able to look at those analytics, and they run those for a short amount of time.
So if your window for getting data, refreshing data, building cubes, which in the old world could take a week, you know my co-founder at Yahoo, he had a week and a half build time. That data is now two weeks old, maybe three weeks old. There might be bugs in it-- >> And the relevance might be, pshh... >> And the relevance goes down, or you can't react as fast. I've been at companies where... Speed is so important these days, and the new companies that are grasping data aggressively, putting it somewhere where they can make decisions on it on a day-to-day basis, they're winning. And they're spending... I was at a company that was spending three million dollars on pay-per-click data, a month. If you can't get data every day, you're on the wrong campaigns, and everything goes off the rails, and you only learn about it a week later, that's 25% of your spend, right there, gone. >> So the biggest thing, sorry George, it really sounds to me like what AtScale can facilitate for probably customers in any industry is the ability to truly make data-driven business decisions that can really directly affect revenue and profit. >> Yes, and in an agile format. So, you can build-- >> That's the third A; agile, adaptability, abstraction. >> There ya go, the three A's. (Lisa laughs) We had the three V's, now we have the three A's. >> Yes. >> The fact that you're building a curated model, so in retail the calendars are complex. I'm sure everybody that uses Tableau is good at analyzing data, but they might not know what your rules are around your financial calendar, or around the hierarchies of your product. There's a lot of things that happen where you want an enterprise group of data modelers to build it, bless it, and roll it out, but then you're a user, and you say, wait, you forgot x, y, and z, I don't want to wait a week, I don't want to wait two weeks, three weeks, a month, maybe more.
I want that data to be available in the model an hour later 'cause that's what I get with Tableau today. And that's where we've taken the two approaches of enterprise analytics and self-service, and tried to create a scenario where you get the best of both worlds. >> So, we know that an implication of what you're telling us is that insights are perishable, and latency is becoming more and more critical. How do you plan to work with streaming data where you've got a historical archive, but you've got fresh data coming in? But fresh could mean a variety of things. Tell us what some of those scenarios look like. >> Absolutely, I think there's two approaches to this problem, and I'm seeing both used in practice, and I'm not exactly sure, although I have some theories on which one's going to win. In one case, you are streaming everything into, sort of a... like I talked about, this data lake, S3, and you're putting it in a format like Parquet, and then people are accessing it. The other way is to access the data where it is. Maybe it's already in, this is a common BI scenario, you have a big data store, and then you have a dimensional data store, like Oracle has your customers, Hadoop has machine data about those customers accessing on their mobile devices or something. If there was some way to access those data without having to move the Oracle stuff into the big data store, that's a Federation story that I think we've talked about in the Bay Area for a long time, or around the world for a long time. I think we're getting closer to understanding how we can do that in practice, and have it be tenable. You don't move the big data around, you move the small data around. For data coming in from outside sources it's probably a little bit more difficult, but it is kind of a degenerate version of the same story.
I would say that streaming is gaining a lot of momentum, and with what we do, we're always mapping, because of the governance piece that we've built into the product, we're always mapping where did the data come from, where did it land, and how did we use it to build summary tables. So if we build five summary tables, 'cause we're answering different types of questions, we still need to know that it goes back to this piece of data, which has these security constraints, and these audit requirements, and we always track it back to that, and we always apply those to our derived data. So when you're accessing these automatically ETLed summary tables, it just works the way it is. So I think that there are two ways that this is going to expand and I'm excited about Federation because I think the time has come. I'm also excited about streaming. I think they can serve two different use cases, and I don't actually know what the answer will be, because I've seen both in customers, it's some of the biggest customers we have. >> Well Matthew thank you so much for stopping by, and four A's, AtScale can facilitate abstraction, adaptability, and agility. >> Yes. Hashtag four A's. >> There we go. I don't even want credit for that. (laughs) >> Oh wow, I'm going to get five more followers, I know it! (George laughs) >> There ya go! >> We want to thank you for watching theCUBE, I am Lisa Martin, we are live in San Jose, at our event Big Data SV, I'm with George Gilbert. Stick around, we'll be back with our next guest after a short break. (techno music)
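[Editor's sketch, not part of the interview.] The guest describes tracking every summary table back to its source data and applying the source's security constraints and audit requirements to the derived data. A minimal way to picture that idea is below. This is only an illustration under stated assumptions: the `Dataset` class, the `derive` and `lineage` helpers, and the constraint labels are all hypothetical and do not represent AtScale's actual implementation or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    """Hypothetical record of a table plus the constraints attached to it."""
    name: str
    constraints: frozenset = frozenset()  # e.g. security or audit labels
    sources: tuple = ()                   # datasets this one was built from


def derive(name, *sources):
    """Build a derived dataset that inherits the union of its sources' constraints."""
    inherited = frozenset().union(*(s.constraints for s in sources))
    return Dataset(name=name, constraints=inherited, sources=tuple(sources))


def lineage(ds):
    """Return the names of the raw (source-less) datasets that ds traces back to."""
    if not ds.sources:
        return {ds.name}
    found = set()
    for src in ds.sources:
        found |= lineage(src)
    return found


# Two raw tables with different (made-up) constraints.
customers = Dataset("customers", frozenset({"pii"}))
clicks = Dataset("clicks", frozenset({"audit"}))

# A summary table built from both carries both constraints automatically.
daily_revenue = derive("daily_revenue", customers, clicks)
```

In this sketch, anything derived from `daily_revenue` would inherit the same labels again, which mirrors the "we always apply those to our derived data" point in the conversation.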
SUMMARY :
Brought to you by SiliconANGLE Media, We are down the street from the Strata Data Conference. Thanks for having me. because it can spawn the genesis that is the crown jewels of your business. So if we take these, that can then deliver it to your end users. and the curated and it's simple, versus sort of the wild west, And it delivers it to the user that way. and they're going to deliver me the stuff and then you give them access to it? The data structures, the way that you store the data, And so that you can take data and it's almost like the data migration service but it might be Teradata to Redshift, and it let's you actually try out things they need to drive, and just see how that works. And the relevance goes down, or you can't react as fast. is the ability to truly make data-driven business decisions Yes, and in an agile format. We had the three V's, now we have the three A's. where you get the best of both worlds. How do you plan to work with streaming data and then you have a dimensional data store, and four A's, AtScale can facilitate abstraction, Yes. I don't even want credit for that. We want to thank you for watching theCUBE,
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Matthew | PERSON | 0.99+ |
George Gilbert | PERSON | 0.99+ |
Lisa Martin | PERSON | 0.99+ |
Matthew Baird | PERSON | 0.99+ |
George | PERSON | 0.99+ |
San Jose | LOCATION | 0.99+ |
Yahoo | ORGANIZATION | 0.99+ |
three weeks | QUANTITY | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
25% | QUANTITY | 0.99+ |
Gardner | PERSON | 0.99+ |
two approaches | QUANTITY | 0.99+ |
Oracle | ORGANIZATION | 0.99+ |
two weeks | QUANTITY | 0.99+ |
Redshift | TITLE | 0.99+ |
S3 | TITLE | 0.99+ |
three million dollars | QUANTITY | 0.99+ |
two ways | QUANTITY | 0.99+ |
Silicon Valley | LOCATION | 0.99+ |
one case | QUANTITY | 0.99+ |
85% | QUANTITY | 0.99+ |
last week | DATE | 0.99+ |
a month | QUANTITY | 0.99+ |
Century | ORGANIZATION | 0.99+ |
SiliconANGLE Media | ORGANIZATION | 0.99+ |
a week | QUANTITY | 0.99+ |
BigQuery | TITLE | 0.99+ |
both | QUANTITY | 0.99+ |
20-terabyte | QUANTITY | 0.99+ |
ORGANIZATION | 0.99+ | |
a week and a half | QUANTITY | 0.99+ |
a week later | DATE | 0.99+ |
Data Lake 2.0 | COMMERCIAL_ITEM | 0.99+ |
two | QUANTITY | 0.99+ |
tomorrow morning | DATE | 0.99+ |
AtScale | ORGANIZATION | 0.99+ |
Atlas | ORGANIZATION | 0.99+ |
Bay Area | LOCATION | 0.98+ |
Lisa | PERSON | 0.98+ |
ParK | TITLE | 0.98+ |
Tableau | TITLE | 0.98+ |
five more followers | QUANTITY | 0.98+ |
an hour later | DATE | 0.98+ |
Ranger | ORGANIZATION | 0.98+ |
Netezza | ORGANIZATION | 0.98+ |
tonight | DATE | 0.97+ |
today | DATE | 0.97+ |
both worlds | QUANTITY | 0.97+ |
about 8,000 users | QUANTITY | 0.97+ |
theCUBE | ORGANIZATION | 0.97+ |
Strata Data Conference | EVENT | 0.97+ |
one | QUANTITY | 0.97+ |
Big Data SV 2018 | EVENT | 0.97+ |
Teradata | ORGANIZATION | 0.96+ |
AtScale | TITLE | 0.96+ |
Big Data SV | EVENT | 0.93+ |
East Coast | LOCATION | 0.93+ |
Hadoop | TITLE | 0.92+ |
two different use cases | QUANTITY | 0.92+ |
day one | QUANTITY | 0.91+ |
one area | QUANTITY | 0.91+ |
Scott Gnau, Hortonworks | Big Data SV 2018
>> Narrator: Live from San Jose, it's the Cube. Presenting Big Data Silicon Valley. Brought to you by SiliconANGLE Media and its ecosystem partners. >> Welcome back to the Cube's continuing coverage of Big Data SV. >> This is our tenth Big Data event, our fifth year in San Jose. We are down the street from the Strata Data Conference. We invite you to come down and join us, come on down! We are at Forager Tasting Room & Eatery, super cool place. We've got a cocktail event tonight, and an analyst briefing tomorrow morning. We are excited to welcome back to the Cube, Scott Gnau, the CTO of Hortonworks. Hey, Scott, welcome back. >> Thanks for having me, and I really love what you've done with the place. I think there's as much energy here as I've seen in the entire show. So, thanks for having me over. >> Yeah! >> We have done a pretty good thing to this place that we're renting for the day. So, thanks for stopping by and talking with George and I. So, February, Hortonworks announced some news about Hortonworks DataFlow. What was in that announcement? What does that do to help customers simplify data in motion? What industries is it going to be most impactful for? I'm thinking, you know, GDPR is a couple months away, kind of what's new there? >> Well, yeah, and there are a couple of topics in there, right? So, obviously, we're very committed to, which I think is one of our unique value propositions, is we're committed to really creating an easy-to-use data management platform, as it were, for the entire lifecycle of data, from when data is created at the edge and as data are streaming from one place to another place, and, at rest, analytics get run, analytics get pushed back out to the edge.
So, that entire lifecycle is really the footprint that we're looking at, and when you dig a level into that, obviously, the data in motion piece is hugely important, and so I think one of the things that we've looked at is we don't want to be just a streaming engine or just a tool for creating pipes and data flows and so on. We really want to create that entire experience around what needs to happen for data that's moving, whether it be acquisition at the edge in a protected way with provenance and encryption, whether it be applying streaming analytics as the data are flowing and everywhere kind of in between, and so that's what HDF represents, and what we released in our latest release, which, to your point, was just a few weeks ago, is a way for our customers to go build their data in motion applications using a very simple drag and drop GUI interface. So, they don't have to understand all of the different animals in the zoo, and the different technologies that are in play. It's like, "I want to do this." Okay, here's a GUI tool, you can have all of the different operators that are represented by the different underlying technologies that we provide as Hortonworks DataFlow, and you can string them together, and then, you can make those applications and test those applications. One of the biggest enhancements that we did, is we made it very easy then for once those things are built in a laptop environment or in a dev environment, to be published out to production or to be published out to other developers who might want to enhance them and so on. So, the idea is to make it consumable inside of an enterprise, and when you think about data in motion and IOT and all those use cases, it's not going to be one department, one organization, or one person that's doing it. It's going to be a team of people that are distributed just like the data and the sensors, and, so, being able to have that sharing capability is what we've enhanced in the experience.
>> So, you were just saying, before we went live, that you're here having speed dates with customers. What are some of the things... >> It's a little bit more sincere than that, but yeah. >> (laughs) Isn't speed dating sincere? It's 2018, I'm not sure. (Scott laughs) What are some of the things that you're hearing from customers, and how is that helping to drive what's coming out from Hortonworks? >> So, the two things that I'm hearing, right, number one, certainly, is that they really appreciate our approach to the entire lifecycle of data, because customers are really experiencing huge data volume increases and data just from everywhere, and it's no longer just from the ERP system inside the firewall. It's from third party, it's from sensors, it's from mobile devices, and, so, they really do appreciate kind of the territory that we cover with the tools and technologies we bring to market, and, so, that's been very rewarding. Clearly, customers who are now well into this path, they're starting to think about, in this new world, data governance, and data governance, I just took all of the energy out of the room, governance, it sounds like, you know, hard. What I mean by data governance, really, is customers need to understand, with all of this diverse, connected data everywhere, in the cloud, on-prem, in sensors, third party, partners, is, frankly, they need a trail of breadcrumbs that say what is it, where'd it come from, who had access to it, and then, what did they do with it?
If you start to piece that together, that's what they really need to understand, the data estate that belongs to them, so they can turn that into refined product, and, so, when you then segue into one of your earlier questions, that GDPR is, certainly, a triggering point where it's like, okay, the penalties are huge, oh my God, it's a whole new set of regulations that I have to comply with, and when you think about that trail of breadcrumbs that I just described, that actually becomes a roadmap for compliance under regulations like GDPR, where if a European customer calls up and says, "Forget my data.", the only way that you can guarantee that you forgot that person's data, is to actually understand where it all is, and that requires proper governance, tools, and techniques, and, so, when I say governance, it's, really, not like, you know, the governor and the government, and all that. That's an aspect, but the real, important part is how do I keep all of that connectivity so that I can understand the landscape of data that I've got access to, and I'm hearing a lot of energy around that, and when you think about an IOT kind of world, distributed processing, multiple hybrid cloud footprints, data is just everywhere, and, so, the perimeter is no longer fixed, it's kind of variable, and being able to keep track of that is a very important thing for our customers. >> So, continuing on that theme, Scott. Data lakes seem to be the first major new repository we added after we had data warehouses and data marts, and it looked like the governance solutions were sort of around that perimeter of the data lake. Tell us, you were alluding to, sort of, how many more repositories, whether at rest or in motion, there are for data. Do we have to solve the governance problem end-to-end before we can build meaningful applications?
>> So, I would argue personally, that governance is one of the most strategic things for us as an industry, collectively, to go solve in a universal way, and what I mean by that, is throughout my career, which is probably longer than I'd like to admit, in an EDW centric world, where things are somewhat easier in terms of the perimeter and where the data came from, data sources were much more controlled, typically ERP systems, owned wholly by a company. Even in that era, true data governance, meta data management, and that provenance was never really solved adequately. There were 300 different solutions, none of which really won. They were all different, non-compatible, and the problem was easier. In this new world, with connected data, the problem is infinitely more difficult to go solve, and, so, that same kind of approach of 300 different proprietary solutions I don't think is going to work. >> So, tell us, how does that approach have to change and who can make that change? >> So, one of the things, obviously, that we're driving is we're leveraging our position in the open community to try to use the community to create that common infrastructure, common set of APIs for meta data management, and, of course, we call that Apache Atlas, and we work with a lot of partners, some of whom are customers, some of whom are other vendors, even some of whom could be considered competitors, to try to drive an Apache open source kind of project to become that standard layer that's common into which vendors can bring their applications. 
So, now, if I have a common API for tracking meta data in that trail of breadcrumbs that's commonly understood, I can bring in an application that helps customers go develop the taxonomy of the rules that they want to implement, and, then, that helps visualize all of the other functionality, which is also extremely important, and that's where I think specialization comes into play, but having that common infrastructure, I think, is a really important thing, because that's going to enable data, data lakes, IOT to be trusted, and if it's not trusted, it's not going to be successful. >> Okay, there's a chicken and an egg there it sounds like, potentially. >> Am I the chicken or the egg? >> Well, you're the CTO. (Lisa laughs) >> Okay. >> The thing I was thinking of was, the broader the scope of trust that you're trying to achieve at first, the more difficult the problem. Do you see customers wanting to pick off one high value application, not necessarily that's about managing what's in Atlas, in the meta data, so much as they want to do an IOT app and they'll implement some amount of governance to solve that app? In other words, which comes first? Do they have to do the end-to-end meta data management and governance, or do they pick a problem off first? >> In this case, I think it's chicken or egg. I mean, you could start from either point. I see customers who are implementing applications in the IOT space, and they're saying, "Hey, this requires a new way to think of governance, so, I'm going to go and build that out, but I'm going to think about it being pluggable into the next app."
I also see a lot of customers, especially in highly regulated industries, and especially in highly regulated jurisdictions, who are stepping back and saying, "Forget the applications, this is a data opportunity, and, so, I want to go solve my data fabric, and I want to have some consistency across that data fabric into which I can publish data for specific applications and guarantee that, holistically, I am compliant and that I'm sitting inside of our corporate mission and all of those things." >> George: Okay. >> So, one of the things you mention, and we talk about this a lot, is the proliferation of data. It's so many, so many different sources, and companies have an opportunity, you had mentioned the phrase data opportunity, there is massive opportunity there, but you said, you know, from even a GDPR perspective alone, I can't remove the data if I don't know where it is, to the breadcrumbs. As a marketer, we use terms like get a 360 degree view of your customer. Is that actually really something that customers can achieve leveraging data? Can they actually really get, say a retailer, a 360, a complete view of their customer? >> Alright, 358. >> That's pretty good! >> And we're getting there. (Lisa laughs) Yeah, I mean, obviously, the idea is to get a much broader view, and 360 is a marketing term. I'm not a marketing person, >> Yes. >> But it, certainly, creates a much broader view of highly personalized information that help you interact with your customer better, and, yes, we're seeing customers do that today and have great success with it and actually change and build new business models based on that capability, for sure. The folks who've done that have realized that in this new world, the way that that works is you have to have a lot of people have access to a lot of data, and that's scary, because that's not the way it used to be, right? >> Right.
>> It used to be you go to the DBA and you ask for access, and then, your boss has to sign off and say it's what you asked for. In this world, you need to have access to all of it. So, when you think about this new governance capability where as part of the governance integrated with security, personalized information can be encrypted, it can be blurred out, but you still have access to the data to look at the relationships to be found in the data to build out those sophisticated models. So, that's where not only is it a new opportunity for governance just because the sources, the variety and the different landscape, but it's, ultimately, very much required, because if you're the CSO, you're not going to give the marketing team access to all of its customer data unless you understand that, right, but it has to be, "I'm just giving it to you, and I know that it's automatically protected," versus, "I'm going to let you ask for it," to be successful. >> Right. >> I guess, following up on that, it sounds like what we were talking about, chicken or egg. Are you seeing an accelerating shift from where data is sort of collected, centrally, from applications, or, what we hear on Amazon, is the amount coming off the edge is accelerating. >> It is, and I think that that is a big drive to, frankly, faster cloud adoption, you know, the analytic space, particularly, has been a laggard in cloud adoption for many reasons, and we've talked about it previously, but one of the biggest reasons, obviously, is that data has gravity, data movement is expensive, and, so, now, when you think about where data is being created, where it lives, being further out on the edge, and may live its entire lifecycle in the cloud, you're seeing a reversal of gravity more towards cloud, and that, again, creates more opportunities in terms of driving a more varied perimeter and just keeping track of where all the assets are.
Finally, I think it also leads to this notion of managing entire lifecycle of data. One of the implications of that is if data is not going to be centralized, it's going to live in different places, applications have to be portable to move to where the data exists. So, when I think about that landscape of creating ubiquitous data management within Hortonworks' portfolio, that's one of the big values that we can create for our customers. Not only can we be an on-ramp to their hybrid architecture, but as we become that on-ramp, we can also guarantee the portability of the applications that they've built out to those cloud footprints and, ultimately, even out to the edge. >> So, a quick question, then, to clarify on that, or drill down, would that mean you could see scenarios where Hortonworks is managing the distribution of models that do the inferencing on the edge, and you're collecting, bringing back the relevant data, however that's defined, to do the retraining of any models or recreation of new models. >> Absolutely, absolutely. That's one of the key things about the NiFi project in general and Hortonworks DataFlow, specifically, is the ability to selectively move data, and the selectivity can be based on analytic models as well. So, the easiest case to think about is self-driving cars. We all understand how that works, right? A self-driving car has cameras, and it's looking at things going on. It's making decisions, locally, based on models that have been delivered, and they have to be done locally, because of latency, right, but, selectively, hey, here's something that I saw as an image I didn't recognize. I need to send that up, so that it can be added to my lexicon of what images are and what action should be taken. So, of course, that's all very futuristic, but we understand how that works, but that has application in things that are very relevant today. Think about jet engines that have diagnostics running. 
Do I need to send that terabyte of data an hour over an expensive thing? No, but I have a model that runs locally that says, "Wow, this thing looks interesting. Let me send a gigabyte now for immediate action." So, that decision making capability is extremely important. >> Well, Scott, thanks so much for taking some time to come chat with us once again on the Cube. We appreciate your insights. >> Appreciate it, time flies. This is great. >> Doesn't it? When you're having fun! >> Yeah. >> Alright, we want to thank you for watching the Cube. I'm Lisa Martin with George Gilbert. We are live at Forager Tasting Room in downtown San Jose at our own event, Big Data SV. We'd love for you to come on down and join us tonight, today, tonight, and tomorrow. Stick around, we'll be right back with our next guest after a short break. (techno music)
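[Editor's sketch, not part of the interview.] The jet-engine example describes a model running locally at the edge that decides which small slice of a large data stream is worth sending upstream. The snippet below is a rough illustration of that selective-forwarding idea only. It does not use NiFi or Hortonworks DataFlow APIs; the `select_for_upload` and `vibration_score` functions, the `vibration` field, the nominal value of 1.0, and the threshold are all made-up assumptions standing in for a real deployed model.

```python
def select_for_upload(readings, score, threshold):
    """Return only the readings that the local scoring function flags as interesting."""
    return [r for r in readings if score(r) >= threshold]


def vibration_score(reading):
    """Stand-in for a locally deployed model: distance from an assumed nominal value of 1.0."""
    return abs(reading["vibration"] - 1.0)


# Hypothetical sensor readings collected at the edge.
readings = [
    {"id": 1, "vibration": 1.02},
    {"id": 2, "vibration": 1.90},
    {"id": 3, "vibration": 0.97},
]

# Only readings that look anomalous are queued for the expensive uplink.
to_send = select_for_upload(readings, vibration_score, threshold=0.5)
```

The design point from the conversation is that the decision is made next to the data, so the bulk of the stream never has to cross the network; in a real system the scoring function would be a model periodically retrained centrally and redistributed to the edge.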
SUMMARY :
Brought to you by SiliconANGLE Media Welcome back to the Cube's We are down the street from the Strata Data Conference. as I've seen in the entire show. What does that do to help customers simplify data in motion? So, the idea is to make it consumable What are some of the things... It's a little bit more from customers, and how is that helping to drive what's that I have to comply with, and when you think and it looked like the governance solutions the problem is infinitely more difficult to go solve, So, one of the things, obviously, Okay, there's a chicken and an egg there it sounds like, Well, you're the CTO. of governance to solve that app. "so, I'm going to go and build that out, but I'm going to So, one of the things you mention, is to get a much broader view, that help you interact with your customer better, in the data to build out those sophisticated models. off the edge is accelerating. if data is not going to be centralized, of models that do the inferencing on the edge, is the ability to selectively move data, to come chat with us once again on the Cube. This is great. Alright, we want to thank you for watching the Cube.
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
George | PERSON | 0.99+ |
Scott | PERSON | 0.99+ |
Hortonworks | ORGANIZATION | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
George Gilbert | PERSON | 0.99+ |
Scott Gnau | PERSON | 0.99+ |
Lisa Martin | PERSON | 0.99+ |
San Jose | LOCATION | 0.99+ |
February | DATE | 0.99+ |
360 degree | QUANTITY | 0.99+ |
2018 | DATE | 0.99+ |
tomorrow | DATE | 0.99+ |
358 | OTHER | 0.99+ |
GDPR | TITLE | 0.99+ |
today | DATE | 0.99+ |
tomorrow morning | DATE | 0.99+ |
fifth year | QUANTITY | 0.99+ |
tonight | DATE | 0.99+ |
Lisa | PERSON | 0.99+ |
SiliconANGLE Media | ORGANIZATION | 0.99+ |
first | QUANTITY | 0.99+ |
Hortonworks' | ORGANIZATION | 0.99+ |
one department | QUANTITY | 0.99+ |
one organization | QUANTITY | 0.99+ |
two things | QUANTITY | 0.99+ |
360 | QUANTITY | 0.98+ |
one person | QUANTITY | 0.98+ |
one | QUANTITY | 0.98+ |
Cube | ORGANIZATION | 0.97+ |
Strata Data Conference | EVENT | 0.96+ |
300 different solutions | QUANTITY | 0.96+ |
an hour | QUANTITY | 0.95+ |
One | QUANTITY | 0.95+ |
tenth | QUANTITY | 0.95+ |
300 different proprietary solutions | QUANTITY | 0.95+ |
Big Data SV 2018 | EVENT | 0.93+ |
few weeks ago | DATE | 0.92+ |
one data | QUANTITY | 0.87+ |
Atlas | TITLE | 0.86+ |
Hortonworks DataFlow | ORGANIZATION | 0.85+ |
Big Data | EVENT | 0.85+ |
Cube | COMMERCIAL_ITEM | 0.84+ |
Silicon Valley | LOCATION | 0.83+ |
European | OTHER | 0.82+ |
DBA | ORGANIZATION | 0.82+ |
Apache | TITLE | 0.79+ |
Tasting | ORGANIZATION | 0.76+ |
Apache | ORGANIZATION | 0.73+ |
CTO | PERSON | 0.72+ |
Sensors | ORGANIZATION | 0.71+ |
downtown San Jose | LOCATION | 0.7+ |
Forager Tasting Room | LOCATION | 0.67+ |
SV | EVENT | 0.66+ |
terabyte of data | QUANTITY | 0.66+ |
NiFi | ORGANIZATION | 0.64+ |
Forager | LOCATION | 0.62+ |
Narrator: | TITLE | 0.6+ |
Big Data | ORGANIZATION | 0.55+ |
Room | LOCATION | 0.52+ |
Eatery | ORGANIZATION | 0.45+ |
Paul Appleby, Kinetica | Big Data SV 2018
>> Announcer: From San Jose, it's theCUBE. (upbeat music) Presenting Big Data, Silicon Valley, brought to you by SiliconANGLE Media and its ecosystem partners. >> Welcome back to theCUBE. We are live on our first day of coverage of our event, Big Data SV. This is our tenth Big Data event. We've done five here in Silicon Valley. We also do them in New York City in the fall. We have a great day of coverage. We're next to where the Strata Data Conference is going on at Forager Tasting Room & Eatery. Come on down, be part of our audience. We also have a great party tonight where you can network with some of our experts and analysts. And tomorrow morning, we've got a breakfast briefing. I'm Lisa Martin with my co-host, Peter Burris, and we're excited to welcome to theCUBE for the first time the CEO of Kinetica, Paul Appleby. Hey Paul, welcome. >> Hey, thanks, it's great to be here. >> We're excited to have you here, and I saw something. I'm a marketer, and terms, I grasp onto them. Kinetica is the insight engine for the extreme data economy. What is the extreme data economy, and what are you guys doing to drive insight from it? >> Wow, how do I put that in a snapshot? Let me share with you my thoughts on this because the fundamental principles around data have changed. You know, in the past, our businesses are really validated around data. We reported out how our business performed. We reported to our regulators. Over time, we drove insights from our data. But today, in this kind of extreme data world, in this world of digital business, our businesses need to be powered by data. >> So what are the, let me test this on you, so one of the ways that we think about it is that data has become an asset. >> Paul: Oh yeah. >> It's become an asset. But now, the business has to care for, has to define it, care for it, feed it, continue to invest in it, find new ways of using it. Is that kind of what you're suggesting companies think about? >> Absolutely what we're saying.
I mean, if you think about what Angela Merkel said at the World Economic Forum earlier this year, that she saw data as the raw material of the 21st century. And talking about Germany fundamentally shifting from being an engineering, manufacturing centric economy to a data centric economy. So this is not just about data powering our businesses, this is about data powering our economies. >> So let me build on that if I may because I think it gets to what, in many respects, Kinetica's core value proposition is. And that is that data is a different type of an asset. Most assets are characterized by, you apply it here, or you apply it there. You can't apply it in both places at the same time. And it's one of the misnomers of the notion of data as fuel. Because fuel is still an asset that has certain specificities, you can't apply it to multiple places. >> Absolutely. >> But data, you can, which means that you can copy it, you can share it. You can combine it in interesting ways. But that means that the ... to use data as an asset, especially given the velocity and the volume that we're talking about, you need new types of technologies that are capable of sustaining the quality of that data while making it possible to share it to all the different applications. Have I got that right? And what does Kinetica do in that regard? >> You absolutely nailed it because what you talked about is a shift from predictability associated with data, to unpredictability. We actually don't know the use cases that we're going to leverage for our data moving forward, but we understand how valuable an asset it is. And I'll give you two examples of that. There's a company here, based in the Bay Area, a really cool company called Liquid Robotics. And they build these autonomous aquatic robots. And they carry a vast array of sensors and now we're collecting data. And of course, that's hugely powerful to oil and gas exploration, to research, to shipping companies, etc. etc. etc.
Even homeland security applications. But what they did, they were selling the robots, and what they realized over time is that the value of their business wasn't the robots. It was the data. And that one piece of data has a totally different meaning to a shipping company than it does to a fisheries company. But they could sell that exact same piece of data to multiple companies. Now, of course, their business has grown and scaled. I think they were acquired by Boeing. But what you're talking about is exactly where Kinetica sits. It's an engine that allows you to deal with the unpredictability of data. Not only the sources of data, but the uses of data, and enables you to do that in real time. >> So Kinetica's technology was actually developed to meet some intelligence needs of the US Army. My dad was a former army ranger airborne. So tell us a little bit about that and kind of the genesis of the technology. >> Yeah, it's a fascinating use case if you think about it, where we're all concerned, globally, about cyber threat. We're all concerned about terrorist threats. But how do you identify terrorist threats in real time? And the only way to do that is to actually consume vast amounts of data, whether it's drone footage, or traffic cameras. Whether it's mobile phone data or social data. But the ability to stream all of those sources of data and conduct analytics on that in real time was, really, the genesis of this business. It was a research project with the army and the NSA that was aimed at identifying terrorist threats in real time. >> But at the same time, you not only have to be able to stream all the data in and do analytics on it, you also have to have interfaces and understandable approaches to acquiring the data, because I have a background, some background in that as well, to then be able to target the threat. So you have to be able to get the data in and analyze it, but also get it out to where it needs to be so an action can be taken.
>> Yeah, and there are two big issues there. One issue is the interoperability of the platform and the ability for you to not only consume data in real time from multiple sources, but to push that out to a variety of platforms in real time. That's one thing. The other thing is to understand that in this world that we're talking about today, there are multiple personas that want to consume that data, and many of them are not data scientists. They're not IT people, they're business people. They could be executives, or they could be field operatives in the case of intelligence. So you need to be able to push this data out in real time onto platforms that they consume, whether it's via mobile devices or any other device for that matter. >> But you also have to be able to build applications on it, right? >> Yeah, absolutely. >> So how does Kinetica facilitate that process? Because it looks more like a database, which is, it's more than that, but it satisfies some of those conventions so developers have an affinity for it. >> Absolutely, so in the first instance, we provide tools ourselves for people to consume that data and to leverage the power of that data in real time in an incredibly visual way with a geospatial platform. But we also create the ability to interface with really commonly used tools, because the whole idea, if you think about providing some sort of ubiquitous access to the platform, the easiest way to do that is to provide that through tools that people are used to using, whether that's something like Tableau, for example, or Esri, if you want to talk about geospatial data. So in the first instance, it's actually providing access, in real time, through platforms that people are used to using. 
And then, of course, by building our technology in a really, really open framework with a broadly published set of APIs, we're able to support, not only the ability for our customers to build applications on that platform, and it could well be applications associated with autonomous vehicles. It could well be applications associated with Smart City. We're doing some incredible things with some of the bigger cities on the planet and leveraging the power of big data to optimize transportation, for example, in the city of London. It's those sorts of things that we're able to do with the platform. So it's not just about a database platform or an insights engine for dealing with these complex, vast amounts of data, but also the tools that allow you to visualize and utilize that data. >> Turn that data into an action. >> Yeah, because the data is useless until you're doing something with it. And that's really, if you think about the promise of things like smart grid. Collecting all of that data from all of those smart sensors is absolutely useless until you take an action that is meaningful for a consumer or meaningful in terms of the generation or consumption of power. >> So Paul, as the CEO, when you're talking to customers, we talk about chief data officer, chief information officer, chief information security officer, there's a lot, data scientists, engineers, there's just so many stakeholders that need access to the data. As businesses transform, there's new business models that can come into development if, like you were saying, the data is evaluated and it's meaningful. What are the conversations that you're having, I guess I'm curious, maybe, which personas are at the table (Paul laughs) when you're talking about the business values that this technology can deliver? >> Yeah, that's a really, really good question because the truth is, there are multiple personas at the table. 
Now, we, in the technology industry, are quite often guilty of only talking to the technology personas. But as I've traveled around the world, whether I'm meeting with the world's biggest banks, the world's biggest telcos, the world's biggest auto manufacturers, the people we meet, more often than not, are the business leaders. And they're looking for ways to solve complex problems. How do you bring the connected car alive? How do you really bring it to life? One car traveling around the city for a full day generates a terabyte of data. So what does that really mean when we start to connect the billions of cars that are in the marketplace in the framework of connected car, and then, ultimately, in a world of autonomous vehicles? So, for us, we're trying to navigate an interesting path. We're dragging the narrative out of just a technology-based narrative of speeds and feeds, algorithms, and APIs, into a narrative about, well what does it mean for the pharmaceutical industry, for example? Because when you talk to pharmaceutical executives, the holy grail for the pharma industry is, how do we bring new and compelling medicines to market faster? Because the biggest challenge for them is the cycle times to bring new drugs to market. So we're helping companies like GSK shorten the cycle times to bring drugs to market. So they're the kinds of conversations that we're having. It's really about how we're taking data to power a transformational initiative in retail banking, in retail, in telco, in pharma, rather than a conversation about the role of technology. Now, we always need to deal with the technologists. We need to deal with the data scientists and the IT executives, and that's an important part of the conversation. But you would have seen, in recent times, the conversation that we're trying to have is far more of a business conversation. >> So if I can build on that. 
So do you think, in your experience, and recognizing that you have a data management tool with some other tools that helps people use the data that gets into Kinetica, are we going to see the population of data scientists increase fast enough so our executives don't have to become familiar with this new way of thinking, or are executives going to actually adopt some of these new ways of thinking about the problem from a data risk perspective? I know which way I think. >> Paul: Wow. >> Which way do you think? >> It's a loaded question, but I think if we're going to be in a world where business is powered by data, where our strategy is driven by data, our investment decisions are driven by data, and the new areas of business that we explore to create new paths to value are driven by data, we have to make data more accessible. And if what you need to get access to the data is a whole team of data scientists, it kind of creates a barrier. I'm not knocking data scientists, but it does create a barrier. >> It limits the aperture. >> Absolutely, because every company I talk to says, "Our biggest challenge is, we can't get access to the data scientists that we need." So a big part of our strategy from the get go was to actually build a platform with all of these personas in mind, so it is built on this standard principle, the common principles of a relational database, that's built around ANSI-standard SQL. >> Peter: It's recognizable. >> And it's recognizable, and consistent with the kinds of tools that executives have been using throughout their careers. >> Last question, we've got about 30 seconds left. >> Paul: Oh, okay. >> No pressure. >> You have said Kinetica's plan is to measure the success of the business by your customers' success. >> Absolutely. >> Where are you on that? >> We've begun that journey. I won't say we're there yet. We announced three weeks ago that we created a customer success organization. 
We've put about 30% of the company's resources into that customer success organization, and that entire team is measured not on revenue, not on projects delivered on time, but on value delivered to the customer. So we baseline where the customer is at. We agree what we're looking to achieve with each customer, and we're measuring that team entirely against the delivery of those benefits to the customer. So it's a journey. We're on that journey, but we're committed to it. >> Exciting. Well, Paul, thank you so much for stopping by theCUBE for the first time. You're now a CUBE alumni. >> Oh, thank you, I've had a lot of fun. >> And we want to thank you for watching theCUBE. I'm Lisa Martin, live in San Jose, with Peter Burris. We are at the Forager Tasting Room and Eatery. Super cool place. Come on down, hang out with us today. We've got a cocktail party tonight. Well, you're sure to learn lots of insights from our experts, and tomorrow morning. But stick around, we'll be right back with our next guest after a short break. (CUBE theme music)
ENTITIES
Entity | Category | Confidence |
---|---|---|
Paul | PERSON | 0.99+ |
Peter Burris | PERSON | 0.99+ |
Lisa Martin | PERSON | 0.99+ |
Peter | PERSON | 0.99+ |
Angela Merkel | PERSON | 0.99+ |
San Jose | LOCATION | 0.99+ |
Silicon Valley | LOCATION | 0.99+ |
Kinetica | ORGANIZATION | 0.99+ |
Paul Appleby | PERSON | 0.99+ |
Silicon Angle Media | ORGANIZATION | 0.99+ |
London | LOCATION | 0.99+ |
New York City | LOCATION | 0.99+ |
Telco | ORGANIZATION | 0.99+ |
tomorrow morning | DATE | 0.99+ |
One issue | QUANTITY | 0.99+ |
US Army | ORGANIZATION | 0.99+ |
NSA | ORGANIZATION | 0.99+ |
21st century | DATE | 0.99+ |
Liquid Robotics | ORGANIZATION | 0.99+ |
tonight | DATE | 0.99+ |
first instance | QUANTITY | 0.99+ |
today | DATE | 0.99+ |
Bay Area | LOCATION | 0.99+ |
CUBE | ORGANIZATION | 0.99+ |
five | QUANTITY | 0.99+ |
two examples | QUANTITY | 0.99+ |
first day | QUANTITY | 0.99+ |
both places | QUANTITY | 0.99+ |
billions of cars | QUANTITY | 0.99+ |
GSK | ORGANIZATION | 0.98+ |
One car | QUANTITY | 0.98+ |
three weeks ago | DATE | 0.98+ |
each customer | QUANTITY | 0.98+ |
two big issues | QUANTITY | 0.98+ |
first time | QUANTITY | 0.97+ |
earlier this year | DATE | 0.97+ |
tenth | QUANTITY | 0.96+ |
Bowing | ORGANIZATION | 0.96+ |
Startup Data | EVENT | 0.96+ |
one | QUANTITY | 0.96+ |
Esri | TITLE | 0.95+ |
Big Data | EVENT | 0.94+ |
about 30 seconds | QUANTITY | 0.93+ |
about 30% | QUANTITY | 0.93+ |
Tablo | TITLE | 0.93+ |
World Economic Forum | EVENT | 0.92+ |
one thing | QUANTITY | 0.92+ |
theCUBE | ORGANIZATION | 0.88+ |
2018 | DATE | 0.87+ |
Big Data SV | EVENT | 0.84+ |
a terabyte | QUANTITY | 0.81+ |
one piece of data | QUANTITY | 0.77+ |
Forger Tasting Room and | ORGANIZATION | 0.73+ |
Big Data SV | ORGANIZATION | 0.72+ |
Eatery | LOCATION | 0.7+ |
Tasting | ORGANIZATION | 0.67+ |
Germany | LOCATION | 0.67+ |
data | QUANTITY | 0.65+ |
Forger | LOCATION | 0.65+ |
Room | LOCATION | 0.56+ |
CEO | PERSON | 0.55+ |
Kinetica | COMMERCIAL_ITEM | 0.45+ |
Eatery | ORGANIZATION | 0.43+ |
Scaldon | ORGANIZATION | 0.38+ |