
Search Results for Forager Tasting and Eatery:

David Abercrombie, Sharethrough & Michael Nixon, Snowflake | Big Data SV 2018



>> Narrator: Live from San Jose, it's theCUBE. Presenting Big Data, Silicon Valley. Brought to you by SiliconANGLE Media and its ecosystem partners.

>> Hi, I'm George Gilbert, and we are broadcasting from the Strata Data Conference, we're right around the corner at the Forager Tasting Room & Eatery. We have this wonderful location here, and we are very lucky to have with us Michael Nixon, from Snowflake, which is a leading cloud data warehouse, and David Abercrombie from Sharethrough, which is a leading ad tech company. And between the two of them, they're going to tell us some of the most advanced use cases we have now for cloud-native data warehousing. Michael, why don't you start by giving us some context for how, on a cloud platform, one might rethink a data warehouse?

>> Yeah, thank you. That's a great question, because let me first answer it from the end-user, business value perspective. When you run a workload in a cloud, there's a certain level of expectation you want out of the cloud. You want scalability, you want unlimited scalability, you want to be able to support all your users, you want to be able to support the data types, whatever they may be, that come into your organization. So, there's a level of service that one should expect once you're in a cloud. Now, a lot of the technologies that were built up to this point have been optimized for on-premises types of data warehousing, where perhaps that level of service and concurrency and unlimited scalability was not really expected but, guess what? Once it comes to the cloud, it's expected. So those on-premises technologies aren't suitable in the cloud, and enterprises, I mean, companies and organizations of all types, from finance, banking, manufacturing, ad tech as we'll have today, want that level of service in the cloud. Those technologies will not work, and so it requires a rethinking of how those architectures are built. It requires being built for the cloud.

>> And just to, alright, to break this down and be really concrete, some of the rethinking: we separate compute from storage, which is a familiar pattern that we've learned in the cloud, but we also then have to have this sort of independent elasticity between--

>> Yes.

>> Storage and the compute, and then Snowflake's taken it even a step further, where you can spin out multiple compute clusters.

>> Right.

>> Tell us how that works and why that's so difficult and unique.

>> Yeah, you know, that's taking us under the covers a little bit, but what makes our infrastructure unique is that we have a three-layer architecture. We separate, just as you said, storage from the compute layer, from the services layer. And that's really important because, as I mentioned before, you want unlimited capacity, unlimited resources. If you scale compute in today's world of on-premises MPP, what that really means is that you have to bring the storage along with the compute, because compute is tied to the storage. So when you scale the storage along with the compute, usually that involves a lot of burden on the data warehouse manager, because now they have to redistribute the data, and that means redistributing keys, managing keys, if you will. And that's a burden. The reverse is also a problem: if all you wanted to do was increase storage, then because compute was tied to storage, you'd have to buy additional compute nodes, and that adds to the cost when, in fact, all you really wanted to pay for was additional storage. So, by separating those, you keep them independent, and you can scale storage apart from compute. And then, once the virtual warehouses that you're talking about have completed the job (you spun them up, they've done their job, and you take them down), guess what? You can release those resources, and in releasing those resources, you cut your cost as well because, for us, it's pure usage-based pricing. You only pay for what you use, and that's really fantastic.
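To make the elastic compute model Nixon describes concrete: Snowflake exposes compute as virtual warehouses that can be created, resized, and suspended independently of storage. Below is a minimal sketch using the snowflake-connector-python package; the account, credentials, and warehouse name are hypothetical placeholders, not anything from the interview.

```python
# A minimal sketch of Snowflake's independently scalable compute,
# using the snowflake-connector-python package.
# Account, credentials, and object names are hypothetical placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="example_account",    # placeholder
    user="example_user",          # placeholder
    password="example_password",  # placeholder
)
cur = conn.cursor()

# Spin up a compute cluster sized for the workload; storage is untouched.
cur.execute("""
    CREATE WAREHOUSE IF NOT EXISTS reporting_wh
    WITH WAREHOUSE_SIZE = 'MEDIUM'
         AUTO_SUSPEND = 300    -- release compute after 5 idle minutes
         AUTO_RESUME = TRUE
""")

# Scale compute up for a peak period (e.g., a retailer's holiday rush) ...
cur.execute("ALTER WAREHOUSE reporting_wh SET WAREHOUSE_SIZE = 'XLARGE'")

# ... and release it when the job is done, so you stop paying for it.
cur.execute("ALTER WAREHOUSE reporting_wh SUSPEND")
```

The AUTO_SUSPEND setting is what implements the "release those resources" point: an idle warehouse stops accruing compute charges on its own, which is the usage-based pricing model described above.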
>> Very different from the on-prem model where, as you were saying, they tied compute and storage together.

>> Yeah, let's think about what that means architecturally, right? If you have an on-premises data warehouse and you want to scale your capacity, chances are you'll have to have that hardware in place already. And having that hardware in place already means you're paying that expense, and you may pay for that expense six months prior to needing it. Let's take a retailer example.

>> Yeah.

>> You're gearing up for a peak season, which might be Christmas, and so you put that hardware in place sometime in June. You always put it in in advance because, why? You have to bring up the environment, so you have to allow time for implementation or, if you will, deployment, to make sure everything is operational.

>> Okay.

>> And then what happens is, when that peak period comes, you can handle that capacity. But what happens once that peak period is over? You paid for that hardware, but you don't really need it. So our vision is, or the vision we believe you should have when you move workloads to the cloud, is that you pay for resources when you need them.

>> Okay. So now, David, help us understand: first, what was the business problem you were trying to solve, and why was Snowflake, you know, sort of uniquely suited for that?

>> Well, let me talk a little bit about Sharethrough. We're ad tech; at the core of our business we run an ad exchange, where we're doing programmatic trading with the bids, with the real-time bidding spec. The data is very high in volume, with 12 billion impressions a month; that's a lot of bids that we have to process, a lot of bid requests. The way it operates, the bids and the bid responses in programmatic trading are encoded in JSON, so our ad exchange is basically exchanging messages in JSON with our business partners. And the JSONs are very complicated; there's a lot of richness and detail, such that the advertisers can decide whether or not they want to bid. Well, this data is very complicated and very high-volume, and in advertising, like any business, we really need good analytics to understand how our business is operating, how our publishers are doing, how our advertisers are doing. And it all depends upon this very high-volume, very complex JSON event data stream. Snowflake was able to ingest our high-volume data very gracefully. The JSON parsing techniques of Snowflake allow me to expose the complicated data structure in a way that's very transparent and usable to our analysts. Our use of Snowflake has replaced clunkier tools where the analysts basically had to be programmers, writing programs in Scala or something to do an analysis. And now, because we've transparently and easily exposed the complicated structures within Snowflake in a relational database, they can use good old-fashioned SQL to run their queries. Literally, an afternoon analysis is now a five-minute query.
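As a rough illustration of the JSON handling Abercrombie credits here: Snowflake can store each bid-request JSON in a VARIANT column and let analysts drill into it, and flatten nested arrays, with plain SQL. A hedged sketch follows; the table and field names are hypothetical, not Sharethrough's actual schema.

```python
# A hedged sketch of querying nested JSON in Snowflake with ordinary SQL.
# Table and field names are hypothetical, not Sharethrough's real schema.
import snowflake.connector

conn = snowflake.connector.connect(account="example_account",  # placeholder
                                    user="example_user",
                                    password="example_password")
cur = conn.cursor()

# `raw` is a VARIANT column holding one bid-request JSON per row.
# Path notation drills down the JSON tree; FLATTEN expands the nested
# bid array so each bid becomes its own relational row.
cur.execute("""
    SELECT
        e.raw:auction_id::STRING   AS auction_id,
        e.raw:site.domain::STRING  AS domain,
        b.value:bidder::STRING     AS bidder,
        b.value:price::FLOAT       AS bid_price
    FROM bid_events e,
         LATERAL FLATTEN(input => e.raw:bids) b
    WHERE b.value:price IS NOT NULL
""")
for row in cur:
    print(row)
```

The point of the pattern is the one made above: the JSON keeps its full richness in the database, while the query exposes only the elements an analyst needs.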
>> So, let me, as I'm listening to you describe this. We've had various vendors telling us about these workflows in the sort of data prep and data science tool chain. It almost sounds to me like Snowflake is taking semi-structured or complex data and sort of unraveling it; normalizing is kind of an overloaded term, but it's making it business-ready, so you don't need as much of that manual data prep.

>> Yeah, exactly, you don't need as much manual data prep, or you don't need as much expertise. For instance, Snowflake's JSON capabilities, in terms of drilling down the JSON tree with dot path notation, or expanding nested objects, are very expressive, very powerful, but still, your typical analyst or your BI tool certainly wouldn't know how to do that. So, in Snowflake, we sort of have our cake and eat it too. We can have our JSONs with their full richness in our database, but yet we can simplify and expose the data elements that are needed for analysis, so that an analyst, their first day on the job, can get right to work and start writing queries.

>> So let me ask you a little more about the programmatic ad use case. If you have billions of impressions per month, I'm guessing that means you have quite a few times more in terms of bids, and then, you know, once you have, I guess, a successful one, you want to track what happens.

>> Correct.

>> So tell us a little more about that, what that workload looks like, in terms of what analytics you're trying to perform, what you're tracking.

>> Yeah, well, you're right. There are different steps in our funnel. The impression request expands out by a factor of a dozen as we send it to all the different potential bidders. We track all that data, the responses come back, we track that, we track our decisions and why we selected the bidder. And then, once the ad is shown, of course, there are various beacons and tracking things that fire. We have to track all of that data, and the only way we can make sense of our business is by bringing all that data together, in a way that is reliable, transparent, and visible, and that also has data integrity. That's another thing I like about the Snowflake database: it's a good old-fashioned SQL database where I can declare my primary keys, I can run QC checks, I can ensure the high data integrity that is demanded by BI and other sorts of analytics.

>> As you continue to push the boundaries of the ad tech service, what's some functionality that you're looking to add, with Snowflake as your partner? Either something that's in there now that you still need to take advantage of, or things that you're looking to in the future?

>> Well, moving forward, of course, it's very important for us to be able to quickly gauge the effectiveness of new products. The ad tech market is fast-changing; there are always new ways of bidding, new products being developed, new ways for the ad ecosystem to work. And so, as we roll those out, we need to be able to quickly analyze, you know, "Is this thing working or not?" You know, kind of an agile environment: pivot or prove it. Does this feature work or not? Having all the data in one place makes possible that very quick assessment of the viability of a new feature or new product.

>> And, dropping down a little under the covers for how that works, does that mean you still have the base JSON data that you've absorbed, but you're going to expose it with different schemas or access patterns?

>> Yeah, indeed.
For instance, we make use of the SQL schemas, roles, and permissions internally, where we can have the different teams have their own domain of data that they can expose internally. And looking forward, there's the Sharehouse feature of Snowflake that we're looking to implement with our partners where, rather than sending them data, like a daily dump of data, we can give them access to their data in our database through this top layer that Michael mentioned, the services layer. It essentially allows me to create a view and grant select on it to another customer. So I no longer have to send daily data dumps to partners or have some sort of API for getting data; they can simply query the data themselves. We'll be implementing that feature with our major partners.

>> I would be remiss in not asking at a data conference like this, now that there's the tie-in with CuBOL and Spark integration and machine learning, is there anything along that front that you're planning to exploit in the near future?

>> Well, yeah, at Sharethrough we're very experimental, playful; we're always examining new data technologies and new ways of doing things. But now, with Snowflake as sort of our data warehouse of curated data, I've got two petabytes of referential-integrity data that is reliable. We can move forward into our other analyses and other uses of data knowing that we have captured every event exactly once, and we know exactly where it fits in a business context, in a relational manner. It's clean, good data integrity, reliable, accessible, visible, and it's just plain old SQL. (chuckles)

>> That's actually a nice way to sum it up. We've got the integrity that we've come to expect and love from relational databases. We've got the flexibility of machine-oriented data, or JSON. But we don't have to give up the query engine, and now you have more advanced features, analytic features, that you can take advantage of coming down the pipe.

>> Yeah, again, we're a modern platform for the modern age, which is basically cloud-based computing. With a platform like Snowflake on the backend, you can now move those workloads that you're accustomed to to the cloud, have them in an environment that you're familiar with, and it saves you a lot of time and effort. You can focus on more strategic projects.

>> Okay, well, with that, we're going to take a short break. This has been George Gilbert. We're with Michael Nixon of Snowflake and David Abercrombie of Sharethrough, listening to how the most modern ad tech companies are taking advantage of the most modern cloud data warehouses. And we'll be back after a short break here at the Strata Data Conference, thanks. (quirky music)
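For reference, the "create a view, grant select" workflow Abercrombie describes maps roughly onto Snowflake's data sharing feature. This is a hedged sketch, not Sharethrough's actual setup: the database, view, share, and partner account names are all hypothetical.

```python
# A rough sketch of Snowflake data sharing in place of daily data dumps.
# All object and account names are hypothetical placeholders.
import snowflake.connector

conn = snowflake.connector.connect(account="example_account",
                                   user="example_user",
                                   password="example_password")
cur = conn.cursor()

statements = [
    # Expose only the partner's slice of the data through a secure view.
    """CREATE SECURE VIEW exchange_db.public.partner_view AS
       SELECT auction_id, bid_price, event_time
       FROM exchange_db.public.bid_events
       WHERE partner_id = 'PARTNER_A'""",
    # Create a share and grant read access to the view through it.
    "CREATE SHARE IF NOT EXISTS partner_share",
    "GRANT USAGE ON DATABASE exchange_db TO SHARE partner_share",
    "GRANT USAGE ON SCHEMA exchange_db.public TO SHARE partner_share",
    "GRANT SELECT ON VIEW exchange_db.public.partner_view TO SHARE partner_share",
    # Make the share visible to the partner's Snowflake account.
    "ALTER SHARE partner_share ADD ACCOUNTS = partner_account_locator",
]
for stmt in statements:
    cur.execute(stmt)
```

The partner then queries partner_view directly from their own account, which is what removes both the daily dumps and the custom data-delivery API.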

Published Date : Mar 9 2018


SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
David | PERSON | 0.99+
George Gilbert | PERSON | 0.99+
David Abercrombie | PERSON | 0.99+
Michael Nixon | PERSON | 0.99+
Michael | PERSON | 0.99+
June | DATE | 0.99+
two | QUANTITY | 0.99+
SiliconANGLE Media | ORGANIZATION | 0.99+
San Jose | LOCATION | 0.99+
Scala | TITLE | 0.99+
first | QUANTITY | 0.99+
Silicon Valley | LOCATION | 0.99+
five-minute | QUANTITY | 0.99+
Snowflake | TITLE | 0.99+
Christmas | EVENT | 0.98+
Strata Data Conference | EVENT | 0.98+
three-layer | QUANTITY | 0.98+
first day | QUANTITY | 0.98+
a dozen | QUANTITY | 0.98+
two petabytes | QUANTITY | 0.97+
Sharethrough | ORGANIZATION | 0.97+
JSON | TITLE | 0.97+
SQL | TITLE | 0.96+
one place | QUANTITY | 0.95+
six months | QUANTITY | 0.94+
Forager Tasting Room & Eatery | ORGANIZATION | 0.91+
today | DATE | 0.89+
Snowflake | ORGANIZATION | 0.87+
Spark | TITLE | 0.87+
12 billion impressions a month | QUANTITY | 0.87+
Machine Learning | TITLE | 0.84+
Big Data | ORGANIZATION | 0.84+
billions of impressions | QUANTITY | 0.8+
CuBOL | TITLE | 0.79+
Big Data SV 2018 | EVENT | 0.77+
once | QUANTITY | 0.72+
theCUBE | ORGANIZATION | 0.63+
JSONs | TITLE | 0.61+
times | QUANTITY | 0.55+

Ziya Ma, Intel | Big Data SV 2018



>> Live from San Jose, it's theCUBE! Presenting Big Data Silicon Valley, brought to you by SiliconANGLE Media and its ecosystem partners.

>> Welcome back to theCUBE, our continuing coverage of our event, Big Data SV. I'm Lisa Martin with my co-host, George Gilbert. We're down the street from the Strata Data Conference, hearing a lot of interesting insights on big data: peeling back the layers, looking at some of the challenges and barriers to overcome, but also the plethora of opportunities that enterprises have that they can take advantage of. Our next guest is no stranger to theCUBE; she was just on with me a couple days ago at the Women in Data Science Conference. Please welcome back to theCUBE Ziya Ma, Vice President of the Software and Services Group and Director of Big Data Technologies at Intel. Hi Ziya!

>> Hi Lisa.

>> Long time, no see.

>> I know, it was just really two to three days ago.

>> It was. Well, now I can say happy International Women's Day.

>> The same to you, Lisa.

>> Thank you, it's great to have you here. So as I mentioned, we are down the street from the Strata Data Conference. You've been up there over the last couple days. What are some of the trends and things that you're hearing that are really kind of top of mind, not just for the customers that are attending, but for the companies that are creating, or trying to create, solutions around this big data challenge and opportunity?

>> Yeah, so first it's very exciting to be back at the conference again. The one biggest trend, or one topic that's hit really hard by many presenters, is the power of bringing big data systems and data science solutions together. You know, we're definitely seeing in the last few years the advancement of big data and the advancement of data science, you know, machine learning and deep learning, truly pushing forward business differentiation and improving our quality of life. So that's definitely one of the biggest trends. Another thing I noticed is there was a lot of discussion on big data and data science getting deployed into the cloud: what are the learnings, what are the use cases? So I think that's another noticeable trend. And also, there were some presentations on doing data science, or having business intelligence, on the edge devices. That's another noticeable trend. And of course, there was discussion on security and privacy for data science and big data, so that continued to be one of the topics.

>> So we were talking earlier, 'cause there are so many concepts and products to get your arms around. If someone is looking at AI and machine learning on the back end, you know, we'll worry about edge intelligence some other time, but we know that Intel has the CPU with Xeon, and then this lower-power one with Atom. There's the GPU, there are ASICs, FPGAs, and then there are these software layers, you know, at a higher abstraction level. Help us put some of those pieces together for people who are saying, okay, I know I've got a lot of data, I've got to train these sophisticated models, you know, explain this to me.

>> Right, so Intel is a real solution provider for data science and big data. At the hardware level, and George, as you mentioned, we offer a wide range of products, from general purpose like Xeon to targeted silicon such as FPGAs and ASIC chips like Nervana. And we also provide adjacencies like networking hardware, non-volatile memory, and mobile. You know, those are the other adjacent products that we offer. Now, on top of the hardware layer, we deliver a fully optimized software solutions stack, from libraries and frameworks to tools and solutions.
So we can help engineers and developers create AI solutions with greater ease and productivity. For instance, we deliver the Intel-optimized Math Kernel Library, which leverages the latest instruction sets to give significant performance boosts when you are running your software on Intel hardware. We also deliver frameworks like BigDL, for Spark and big data customers who are looking for deep learning capabilities. We also optimize some popular open-source deep learning frameworks like Caffe, TensorFlow, MXNet, and a few others. So our goal is to provide all the necessary solutions so that, in the end, our customers can create the applications and solutions that they really need to address their biggest pain points.

>> Help us think about the maturity level now. Like, we know that the very most sophisticated internet service providers have been sort of all over this machine learning for quite a few years now. Banks, insurance companies, people who've had statisticians and actuaries with that sort of skillset, are beginning to deploy some of these early production apps. Where are we in terms of getting this out to the mainstream? What are some of the things that have to happen?

>> To get it to mainstream, there are so many things we could do. First, I think we will continue to see a wide range of silicon products, but then there are a few things Intel is pushing. For example, we're developing the Nervana Graph compiler, which will encapsulate the hardware integration details and present a consistent API for developers to work with. And this is one thing that we hope can eventually help the developer community. Also, we are collaborating with end users from the enterprise segment. For example, we're working with the financial services industry, we're working with the manufacturing sector, and also with customers from the medical field and online retailers, trying to help them deliver or create data science and analytics solutions on Intel-based hardware or Intel-optimized software. So that's another thing that we do, and we're actually seeing very good progress in this area. Now, we're also collaborating with many cloud service providers. For instance, we work with some of the top seven cloud service providers, both in the U.S. and in China, to democratize not only our hardware, but also our libraries and tools, BigDL, MKL, and other frameworks and libraries, so that our customers, including individuals and businesses, can easily access those building blocks from the cloud. So definitely, we're working on different fronts.
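One everyday place a developer meets the optimized libraries Ma mentions is a NumPy build linked against Intel MKL. The check below uses only standard NumPy introspection (it is not an Intel-specific API), so it is safe to run anywhere; the output varies by installation.

```python
# Quick check of whether the local NumPy build is linked against an
# optimized BLAS/LAPACK backend such as Intel MKL. Standard NumPy only;
# what gets printed depends on how NumPy was built.
import time
import numpy as np

np.show_config()  # prints the BLAS/LAPACK libraries NumPy was built with

# A rough timing probe: large matrix multiplication is where an
# MKL-backed build typically shows its performance advantage.
a = np.random.rand(2000, 2000)
b = np.random.rand(2000, 2000)
t0 = time.time()
_ = a @ b
print(f"2000x2000 matmul took {time.time() - t0:.3f}s")
```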
>> So, last question in the last couple of minutes. Let's kind of vibe on this collaboration theme. Tell us a little bit about the collaboration that you're having with, you mentioned, customers in some highly regulated industries, for example. But help us understand: what's that symbiosis? What is Intel learning from your customers that's driving Intel's innovation of your technologies in big data?

>> That's an excellent question. So Lisa, maybe I can start by sharing a couple of customer use cases, the kinds of problems we help our customers address. I think it's always wise not to start a conversation with the customer on the technology that you deliver. You want to understand the customer's needs first, so that you can provide a solution that really addresses their biggest pain point, rather than simply selling technology. So for example, we have worked with an online retailer to better understand their customers' shopping behavior and to assess their customers' preferences and interests. And based upon that analysis, the online retailer made different product recommendations and maximized its customers' purchase potential, and it drove up the retailer's sales. You know, that's one type of use case that we have worked on. We have also partnered with customers from the medical field. Actually, today at the Strata Conference we had a joint presentation with UCSF, where we helped the medical center automate the diagnosis and grading of meniscus lesions. Today, that's all done manually by the radiologist, but now that entire process is automated. The result is much more accurate, much more consistent, and much more timely, because you don't have to wait for the availability of a radiologist to read all the 3D MRI images; that can all be done by machines. You know, so those are the areas where we work with our customers: understand their business need, and give them the solution they are looking for.

>> Wow, the impact there. I wish we had more time to dive into some of those examples, but we thank you so much, Ziya, for stopping by twice in one week to theCUBE and sharing your insights. And we look forward to having you back on the show in the near future.

>> Thanks. So thanks Lisa, thanks George, for having me.

>> And for my co-host George Gilbert, I'm Lisa Martin. We are live at Big Data SV in San Jose. Come down, join us for the rest of the afternoon; we're at this cool place called the Forager Tasting Room & Eatery. We will be right back with our next guest after a short break. (electronic outro music)

Published Date : Mar 8 2018


SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
George Gilbert | PERSON | 0.99+
Lisa Martin | PERSON | 0.99+
UCSF | ORGANIZATION | 0.99+
George | PERSON | 0.99+
Lisa | PERSON | 0.99+
San Jose | LOCATION | 0.99+
China | LOCATION | 0.99+
Ziya Ma | PERSON | 0.99+
U.S. | LOCATION | 0.99+
International Women's Day | EVENT | 0.99+
SiliconANGLE Media | ORGANIZATION | 0.99+
Ziya | PERSON | 0.99+
one week | QUANTITY | 0.99+
today | DATE | 0.99+
twice | QUANTITY | 0.99+
First | QUANTITY | 0.99+
Strata Data Conference | EVENT | 0.99+
one topic | QUANTITY | 0.98+
Spark | TITLE | 0.98+
both | QUANTITY | 0.98+
Intel | ORGANIZATION | 0.98+
one thing | QUANTITY | 0.98+
three days ago | DATE | 0.98+
Women in Data Science Conference | EVENT | 0.97+
Strata Conference | EVENT | 0.96+
first | QUANTITY | 0.96+
BigDL | TITLE | 0.96+
TensorFlow | TITLE | 0.96+
one type | QUANTITY | 0.95+
two | DATE | 0.94+
MXNet | TITLE | 0.94+
Caffe | TITLE | 0.92+
theCUBE | ORGANIZATION | 0.91+
one | QUANTITY | 0.9+
Software and Services Group | ORGANIZATION | 0.9+
Forager Tasting and Eatery | ORGANIZATION | 0.88+
Vice President | PERSON | 0.86+
Big Data Technologies | ORGANIZATION | 0.84+
seven cloud service providers | QUANTITY | 0.81+
last couple days | DATE | 0.81+
Atom | COMMERCIAL_ITEM | 0.76+
Silicon Valley | LOCATION | 0.76+
Big Data SV 2018 | EVENT | 0.74+
a couple days ago | DATE | 0.72+
Big Data SV | ORGANIZATION | 0.7+
Xeon | COMMERCIAL_ITEM | 0.7+
Nervana | ORGANIZATION | 0.68+
Big Data | EVENT | 0.62+
last | DATE | 0.56+
data | EVENT | 0.54+
case | QUANTITY | 0.52+
3D | QUANTITY | 0.48+
couple | QUANTITY | 0.47+
years | DATE | 0.47+
Nervana | TITLE | 0.45+
Big | ORGANIZATION | 0.32+

Blaine Mathieu, VANTIQ | Big Data SV 2018



>> Announcer: Live from San Jose, it's theCUBE, presenting Big Data, Silicon Valley. Brought to you by Silicon Angle Media and its ecosystem partners.

>> Welcome back to theCUBE, our continuing coverage of our event, Big Data SV. I am Lisa Martin, joined by Peter Burris. We're in downtown San Jose at a really cool place called the Forager Tasting Room & Eatery. Come down, hang out with us today as we have continued conversations around all things big data and everything in between. This is our second day here, and we're excited to welcome to theCUBE the CMO of VANTIQ, Blaine Mathieu. Blaine, great to meet you, great to have you on the program.

>> Great to be here, thanks for inviting me.

>> So, VANTIQ, you guys are up the street in Walnut Creek. What do you guys do, what are you about, what makes VANTIQ different?

>> Well, in a nutshell, VANTIQ is a so-called high-productivity application development platform that lets developers build, deploy, and manage so-called event-driven, real-time applications, the kind of applications that are critical for driving many of the digital transformation initiatives that enterprises are trying to get on top of these days.

>> Digital transformation, it's a term that can mean so many different things, but today it's essential for companies to be able to compete, especially enterprise companies, with newer companies that are more agile, more modern. But if we peel apart digital transformation, there are so many elements that are essential. How do you guys help enterprises evolve their application architectures, which might currently not be able to support an actual transformation to a digital business?

>> Well, I think that's a great question, thank you. The key to digital transformation is really a lot around the concept of real time, okay. The reason Uber is disrupting, or has disrupted, the taxi industry is that the old way of doing it was: somebody called a taxi, and then they waited 30 minutes for a taxi to show up, and then they told the taxi where to go, and hopefully they got there. Whereas Uber turned that into a real-time business, right? You pinged something on your phone. They knew your location, they knew the location of the driver, they matched those up and brought 'em together in real time. They already knew where to bring you, and ensured you had the right route to that location. All of this data is flowing, and all of these actions are being taken, in real time. The same thing applies to a disruptor like Netflix, okay? In the old days, Blockbuster used to send you, you know, a leaflet in the mail telling you what the new movies are. Maybe it was personalized for you. Probably not. No, Netflix knows who you are instantly, gives you that information, again, in real time, based on what you've done in the past, and is able to deliver the movie, also in real time, pretty well. Every disruptor you look at around digital transformation is taking a business or a process that was done slowly and impersonally and making it happen in real time. Unfortunately, enterprise applications, and the architectures, as you said a second ago, that are used in most applications today, weren't designed to enable these real-time use cases. A great example is Salesforce. Salesforce is a pretty standard, what you'd call a request application. You make a request, a person, generally, makes a request of the system, the system goes into a database, queries that database, finds the information, and then returns it back to the user.
And that whole process could take, you know, significant amounts of time, especially if the right data isn't in the database at the time and you have to go request it or find it or create it. A new type of application needs to be created that's not fundamentally database-centric, but is able to take these real-time data streams coming in from devices, from people, from enterprise systems, process them in real time, and then take an action.

>> So, let's pretend I'm a CEO.

>> Yeah.

>> One of the key things you said, and I want you to explain it better, is "event." What is an event, and how does that translate into a digital business decision?

>> This notion of complex event processing, CEP, has been around in technology for a long time, and yet it surprises me that still a lot of folks we talk to, CEOs, have never heard of the concept. And it's very simple, really. An event is just something that happens in the context of business. That's as complex and as simple as it is. An event could be: a machine increases in temperature by one degree; a car moves from one location to another location. It could be an enterprise system, like an ERP system, you know, approving a PO. It could be a person pressing a button on a mobile device, or it could be an IoT device putting off a signal about the state of a machine. Increasingly, we're getting a lot of events coming from IoT devices. So, really, any particularly interesting business situation, or a change in a situation, is an event. And increasingly, driven, as you know, by IoT, by augmented reality, by AI and machine learning, by autonomous vehicles, all these new real-time technologies are spinning off more and more events, streams of these events coming off in rapid fashion, and we have to be able to do something about them.

>> Let me take a crack at it, and you tell me if I've got this right. That, historically, applications have been defined in terms of processes, and so, in many respects, there was a very concrete, discrete, well-established set of steps that were performed, and then the transaction took place. An event, it seems to me, is, yeah, we generally described it, but it changes in response to the data.

>> Right, right.

>> So, an event is kind of like an outside-in system response, driven by data--

>> Right, right.

>> --whereas your traditional transaction processing is inside-out, driven by a sequence of programmed steps, and that decision might have been made six years ago. So, the event is what's happening right now, informed by data, versus a transaction; a traditional transaction is much more "what did we decide to do six years ago," and it just gets sustained. Have I got that right?

>> That's right. Absolutely right, or even six hours ago or six minutes ago, which might seem, wow, six minutes, that's pretty good. But take a use case of a field service agent trying to fix a machine or an air conditioner on top of a building. In today's world, that air conditioner has hundreds of sensors that are putting off data about the state of that air conditioner in real time. A service tech has the ability, while the machine is still putting off that data, to make repairs and changes and fixes, again, in the moment, see how that is changing the data coming off the machine, and then continue to make the appropriate repairs in collaboration with a smart system or an application that's helping them.
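VANTIQ itself is a visual, low-code platform, so the following is not VANTIQ code. It is a plain-Python sketch of the event-driven pattern Mathieu is contrasting with request/response: react to each event as it arrives, instead of querying a database on a schedule. The sensor stream and the temperature threshold are hypothetical.

```python
# Not VANTIQ code: a plain-Python sketch of the event-driven pattern,
# reacting to each event as it arrives instead of polling a database.
# The event source and threshold values are hypothetical.
import asyncio
import random

async def sensor_events():
    """Stand-in for a real event stream (IoT device, message bus, etc.)."""
    while True:
        yield {"machine_id": "ac-unit-7", "temp_c": random.uniform(20, 95)}
        await asyncio.sleep(1)

async def main():
    seen = 0
    async for event in sensor_events():
        # Act in the moment the event arrives, not on a batch schedule.
        if event["temp_c"] > 80:
            print(f"ALERT {event['machine_id']}: {event['temp_c']:.1f} C, "
                  "dispatching field service")
        seen += 1
        if seen >= 10:  # bounded here only so the sketch terminates
            break

asyncio.run(main())
```

The design point is the inversion Burris describes above: the outside world (the data) drives the program's control flow, rather than a fixed sequence of programmed steps driving requests for data.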
>> That's now identifying patterns about what the problem is, versus some of the old ways, where we had a recipe of, you know, steps that you went through in the call center.

>> Right, right. And the customer is getting more and more frustrated.

>> They got their clipboard out and had the 52 steps they followed to see, oh, that didn't work, now the next step. No, data can help us do that much more efficiently and effectively if we're able to process it in real time.

>> So, in many respects, what we're really talking about is an application world, or a world looking forward, where the applications, which historically have been very siloed and process-driven, move to a world where the application functions are much more networked together, and the output of one application is having a significant impact, through data, on the performance of an application somewhere else. That seems like it's got the potential to be an extremely complex fabric. (laughing) So, do I wait until I figure all that out (laughing) and then I start building it? Or do I, I mean, how do I do it? Do I start small and create and grow into it? What's the best way for people to start working on this?

>> Well, you're absolutely right. Building these complex, geeking out a little bit, you know, asynchronous, non-blocking, so-called reactive applications, a concept we've had in computer science for some time, is very hard, frankly. Okay, it's much easier to build computing systems that process things step one, step two, step three, in order, but if you have to build a system that is able to take real-time inputs or changes at any point in the process, at any time, and go in a different direction, it's very complex. And computer scientists have been writing applications like this for decades. It's possible to do, but it isn't possible to do at the speed that companies now want to transform themselves, right? By the time you spec out an application and spend two years writing it, your business competitors have already disrupted you. The requirements have already changed. You need to be much more rapid and agile. And so, the secret sauce to this whole thing is to be able to write these transformative applications, or create them, rather; "write" is actually the wrong word to use; to be able to create them.

>> Generate them.

>> Yeah, generate them in a way which is very fast, does not require a guru-level developer in reactive Java or some super-low-level code that you'd otherwise have to use, so that you can literally have business people help design the applications, conceptually build them almost in real time, get them out into the market, and then be able to modify them as you need to, you know, on the fly.

>> If I can build on that for just one second. So, it used to be we had this thing called computer-aided software engineering, CASE.

>> (laughs) Right, right.

>> We were going to operate at this very, very high-level language. It's kind of-- But then, we would use the model and build the code, and the two of them were separated, and so the minute that we deployed, somebody would go off and maintain the code, and the whole thing would break.

>> Do you have that problem?

>> No, well, that's exactly right. So the previous way of doing it was about really modeling an application, maybe visually, drag and drop, but then, fundamentally, you created a bunch of code, and then your job, as you said, was to maintain and deploy and manage it after.
>> Try to sustain some connection back up to that beautiful visual model.

>> And you probably didn't, because that was too much work, so forget about the model after that. Instead, what we're able to do these days is build the applications visually, you know, really, for the most part, with either super-low code or, in many cases, no code, because we have the ability to abstract away a lot of the complexity, a lot of the complex code that you'd otherwise have to write. We can represent that, okay, with these logical abstractions, create the applications themselves, and then continue to maintain, add to, and modify the application using the exact same structure. You're not stuck with 20,000 lines of code that you have to edit. You're continuing to run and maintain the application just the way you built it, okay. We've now got to the place in computer science where we can actually do these things. We couldn't do them, you know, 20 years ago with CASE, but we can absolutely do them now.

>> So, I'm hearing, from a customer internal perspective, a lot of operational efficiencies that VANTIQ can drive. Let's look now from a customer's perspective. What are the business impacts you're able to make? You mentioned the word "reactive" a minute ago when you were talking about applications, but do you have an example where VANTIQ has enabled a customer, a business, to be proactive, and to be able to identify, through, you know, complex event processing, what their customers are doing, to be able to deliver relevant messages and really drive revenue, drive profit?

>> Right, right. So many, you know, so many great examples. And I mentioned field service a few minutes ago; I've got a lot of clients doing this real-time field service using these event-processing applications. One that I want to bring up right now is one of the largest global shoe manufacturers, actually, that's a client of VANTIQ. I, unfortunately, can't say the name right now, 'cause they want to keep what they're doing under wraps, but we all definitely know the company. And they're using this to manage the security, primarily, around their real-time global supply chain. They've got a big challenge with companies in different countries redirecting shipments of their shoes, selling them on the gray market at different prices than are allowed in different regions of the world. And so, through sensorizing the packages, the barcode scanning, and the enterprise systems, bringing all that data together in real time, they can literally tell in the moment if something is-- if a package is redirected to the wrong region, or if, literally, a shoe or a box of shoes is being sold where it shouldn't be sold, at the wrong price. They used to get a monthly report on the activities, and then they would go and investigate what happened last month. Now, their fraud detection manager is literally sitting there getting this in real time, saying, oh, Singapore sold a pallet of shoes five minutes ago that they should not have been able to sell. Call up the guy in Singapore and have him go down and see what's going on and fix that issue. That's pretty powerful, when you think about it.

>> Definitely. So, like, a reduction in fraud, or an increase in fraud detection. Sounds like, too, there's a potential for a significant amount of cost savings to the business, not just meeting the external customer needs, but from a cost-perspective reduction.
Not just in TCO, but in operational expenses.

>> For sure. Although, I would say most of the digital transformation initiatives, when we talk to CEOs and CIOs, they're not focused as much on cost savings as they're focused on, A, avoiding being disrupted by the next interesting startup; B, creating new lines of business, new revenue streams, finding a way to do something dramatically better than they're currently doing it. It's not only about optimizing or squeezing some cost out of their current application. The thing we were just talking about, I guess you could say it's an improvement on their current process but, really, it's actually something they just weren't even doing before: a totally different way of doing fraud detection and managing their global supply chain that they fundamentally weren't doing. And now, of course, they're looking at many other use cases across the company, not just in supply chain but, you know, smart manufacturing, so many use cases. Your point about savings, though: there's, you know, the question of what value the application itself brings, and then there's the question of what it costs to build and maintain and deploy the application itself, right? And, again, with these new visual development tools (they're not modeling tools, you're literally developing the application visually), you know, I've been in so many scenarios where we talk to large enterprises, we talk about what we're doing, like we talked about right now, and they say, okay, we'd love to do a POC, a proof of concept. We want to allocate six months for this POC, like you normally would for building most enterprise applications. And we inevitably say, well, how about Friday? How about we have the POC done by Friday? And, you know, we get the Germans to laugh, you know, laugh uncomfortably, and we go away and deliver the POC by Friday, because of how different it is to build applications this way versus writing low-level Java or C-sharp code and stitching together a bunch of technologies and tools, 'cause we abstract all that away. And, you know, the eyes open wide and the mouth drops open, and it's incredible what modern technology can do to radically change how software is being developed.

>> Wow, big impact in a short period of time. That's always a nice thing to be able to deliver.

>> It is. It's great to be able to surprise people like that.

>> Exactly, exactly. Well, Blaine, thank you so much for stopping by, sharing what VANTIQ is doing to help companies be disruptive, and for sharing those great customer examples. We appreciate your time.

>> You're welcome. Appreciate the time.

>> And for my co-host, Peter Burris, I'm Lisa Martin. You're watching theCUBE's continuing coverage of our event, Big Data SV Live, from San Jose, down the street from the Strata Data Conference. Stick around, we'll be right back with our next guest after a short break. (techy music)

Published Date : Mar 8 2018


SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Blaine | PERSON | 0.99+
Lisa Martin | PERSON | 0.99+
Peter Burris | PERSON | 0.99+
Singapore | LOCATION | 0.99+
Uber | ORGANIZATION | 0.99+
two years | QUANTITY | 0.99+
Netflix | ORGANIZATION | 0.99+
San Jose | LOCATION | 0.99+
VANTIQ | ORGANIZATION | 0.99+
Blaine Mathieu | PERSON | 0.99+
20,000 lines | QUANTITY | 0.99+
30 minutes | QUANTITY | 0.99+
two | QUANTITY | 0.99+
Silicon Angle Media | ORGANIZATION | 0.99+
52 steps | QUANTITY | 0.99+
Walnut Creek | LOCATION | 0.99+
six months | QUANTITY | 0.99+
Java | TITLE | 0.99+
one degree | QUANTITY | 0.99+
Friday | DATE | 0.99+
second day | QUANTITY | 0.99+
last month | DATE | 0.99+
one second | QUANTITY | 0.99+
Silicon Valley | LOCATION | 0.99+
six years ago | DATE | 0.98+
both | QUANTITY | 0.98+
Strata Data Conference | EVENT | 0.98+
Big Data SV Live | EVENT | 0.98+
One | QUANTITY | 0.98+
The Cube | ORGANIZATION | 0.98+
today | DATE | 0.98+
one | QUANTITY | 0.98+
20 years ago | DATE | 0.98+
Big Data SV 2018 | EVENT | 0.97+
six hours ago | DATE | 0.97+
six minutes ago | DATE | 0.97+
five minute ago | DATE | 0.97+
a minute ago | DATE | 0.96+
hundreds of sensors | QUANTITY | 0.95+
The Cube | TITLE | 0.94+
Blockbuster | ORGANIZATION | 0.91+
few minutes ago | DATE | 0.89+
step one | QUANTITY | 0.89+
step three | QUANTITY | 0.85+
Forager Tasting and Eatery | ORGANIZATION | 0.85+
decades | QUANTITY | 0.84+
six minutes | QUANTITY | 0.84+
C | TITLE | 0.83+
Big Data | ORGANIZATION | 0.81+
one location | QUANTITY | 0.78+
one application | QUANTITY | 0.77+
second ago | DATE | 0.71+
CEP | ORGANIZATION | 0.53+
big | ORGANIZATION | 0.52+
Germans | PERSON | 0.51+
techy | ORGANIZATION | 0.41+
Data | EVENT | 0.31+

Maribel Lopez, Lopez Research | Big Data SV 2018



>> Narrator: Live, from San Jose. It's theCUBE. Presenting Big Data, Silicon Valley. Brought to you by SiliconAngle Media, and its ecosystem partners.

>> Welcome back to theCUBE. We are live in San Jose, at our event, Big Data SV. I'm Lisa Martin, and we are down the street from the Strata Data Conference. We've had a great day so far, talking with a lot of folks from different companies that are all involved in the big data unraveling process. I'm excited to welcome back to theCUBE one of our distinguished alumni, Maribel Lopez, the founder and principal analyst at Lopez Research. Welcome back to theCUBE.

>> Thank you. I'm excited to be here.

>> Yeah, so you've been at the conference, which started a couple days ago. What are some of the trends and things that you're hearing that are really kind of top of mind, not just for the customers that are attending, but for the companies that are creating, or trying to create, solutions around this big data challenge and opportunity?

>> Yeah, absolutely. I mean, I think we talked a lot about data in years past: how do you gather the data, how do you store the data, how you might want to process the data. This year seems to be all about how do I make something interesting happen with the data: how do I make an intelligent insight, how do I cure prostate cancer, how do I make sure I can classify images. It's a really different show, and we've also changed some of the terminology; it's a lot more now about machine learning and artificial intelligence and, frankly, a lot of discussion around ethics. So it's been very interesting.

>> Data ethics, you mean?

>> Data ethics: how do we do privacy? How do we maintain the right level of data so that we don't have bias in our data? How do we get diversity and inclusion going? Lots of really interesting, powerful human topics, not just about the data.

>> I love that, the human topics, especially where, you know, AI and ML come into play. You talked about data diversity, or bias. We were just at the Women in Data Science conference a couple of days ago, talking to a lot of female leaders in data science and computer science, both in academia as well as in industry. And one of the interesting topics around the gender disparity is the fact that it is limiting the analyses on data; there may be only a few perspectives looking at it. So there's an inherent bias there. So that's one issue, and I'd like to get your thoughts on that. Another is that, with that lack of thought diversity, I guess I would say, going into analyzing the data, companies might be potentially limiting themselves on the types of products that they can create, and how to monetize the data and actually drive new revenue streams. On the thought diversity, we'll start there. What are some of the things that you're hearing, and what are some of your recommendations for your clients on how to get some of that bias out of data analysis?

>> Yes, it's interesting. One is trying to find multiple sources of data. So there's data that you have and that you own, but there is a wide range of openly available data now. There are some challenges around making sure that that data is clean before you integrate it with your data, but basically, diversifying your data sources with third-party data is one big thing that we're talking about. In previous analytical generations, I think we talked a lot about having a hypothesis, and you were trying to prove a hypothesis.
And now I think we're trying to be a little more open and looser, and not really lead the data anywhere, per se, but try to find the right patterns and correlations in the data. And then just awareness in general: we don't believe we're biased, but we may have biased data that gets put into the system, so we have to really be thoughtful about what we put into the system. So I think those three things combined have really changed the way people are looking at it. And there's a lot of awareness now around that, because we assume at some point the machines might be making certain decisions for us, and we want to make sure that they have the best information to do that, and that they don't limit our opportunities as a society.

>> Where are companies, in terms of the clients that you see, culturally, in terms of embracing the openness? 'Cause you're right: from a scientific-method perspective, people go in with, I'm going to hypothesize this because I think I'm going to find this, and maybe wanting the data to say this. Where are enterprises in becoming culturally more open to not leading the data somewhere and bringing in bias?

>> Well, there are two interesting things here, right? I think there are some people that have gone down the data route for a while now, sort of the industry-leading companies. They're in this mindset now of trying to make sure they don't lead the data and don't create biases in the data. They have ways to explain how the data and the analysis of the learning came about, not just for regulation, but so that they can make sure they've ethically done the right thing. But then I think there's the other 95 percent of companies that aren't even there yet. They don't know that this is a problem yet, so they're still dealing with the "I've got to pull in the data, I've got to do something with it." They don't even know what they want to do with it, let alone whether it's biased or not. So we're not quite at the leading-the-witness point there with a lot of organizations.

>> But that's something that you expect to see maybe down the road.

>> I'm hoping we'll get ahead of it. I'm really hoping that we'll get ahead of it.

>> It's a good positive outlook on it, yeah?

>> I think because the real analysis of the data problem, in a big machine learning, deep learning way, is so new, and people are actually out seeking guidance, there is an opportunity to get ahead of it. The second thing that's happening is, people don't have data scientists, right? So they don't necessarily have the people that can code this. So what they're doing now is depending on the vendor landscape to provide them with an entry-level set of tools. So if you're Microsoft, if you're Google, if you're Amazon, you're trying very hard to make sure that you're giving them tools that have the right ethics built in, and that can help kickstart people's machine learning efforts. So I think that's going to be a real win for us. And we talked a lot today at the Strata conference about how, oh, you don't have enough images, you can't do that; or you don't have enough data, you can't do that; or you don't have enough data scientists. And some of what came back is that some of the best and the brightest have coded some things that you can start to use to kickstart your efforts, that will get you to a better place than you ever could have gotten to yourself. So that was pretty exciting, you know.
Transfer learning is an example: taking, you know, ImageNet from Google and some algorithms, and using those to take your images and try to figure out if somebody has Alzheimer's or not, encoding things as Alzheimer's-or-not characteristics. So, very cool stuff, very exciting, and nice to see that we've got some minds working on this for us.

>> Yeah, definitely. What about when you're meeting with clients that don't have a data scientist or chief analytics officer? It sounds like a lot of the technologies have built-in sort of enablement for a different data citizen within a company. If you're talking to clients that don't have a data scientist or data science team, who are your constituents there? For companies that maybe have that skill gap, who do they go to in their organization to start evaluating the data that they have, and start to understand what their potential is?

>> Yeah, there are a couple of places people go. They go to their business decision analytics people, so the people that were working with their BI dashboards, for example. The second place they go is to the cloud computing guys, 'cause we're hearing a lot about cloud computing, and maybe I can buy some of this stuff from the cloud; I'm just going to roll up and get all my machine learning in the cloud, right? So we're not there yet. So the biggest thing that I talk to people about right now is: what are the realities around machine learning and AI? We've made tremendous progress but, you know, you read the newspaper, and something is going to get rid of your job, and AI's going to take over the world, and we're kind of far from that reality. First of all, it's very dystopian and negative. But even if it weren't that, you know, what you can do today is not that. So there are a lot of stages in between. So the first thing is just trying to get people comfortable with: no, you can't just buy one product and throw in some data, and you've got everything you need.

>> Right.

>> We're not there yet, but we're getting closer. You can add some components, you can get some new information, you can do some new correlations. So just getting a reality and grounding of where we are, and that we have a lot of opportunity, and that it's moving very fast; that's the other thing.

>> Right.

>> IT leaders are used to, I'll evaluate it once a year, or once every couple of years. These things are moving in monthly increments, like really huge changes in product categories. So you kind of have to keep on top of it to make sure you know what's available to you.

>> Right. And if they don't, they miss out on not only the ability to monetize data streams, but essentially risk going out of business, because somebody will come in, maybe more nimble and agile, and be able to do it faster.

>> Yeah. And we already saw that with the digital-native companies, the born-in-the-cloud companies, we used to call them. Well, now everybody can be using the cloud. So the question then is, what's the next wave of that? The next wave is around understanding how to use your data, understanding how to get third-party data, and being able to rapidly make decisions and change models based on that.

>> One of the things that's interesting about big data is, you know, it was a big buzzword, and it seems to be becoming less of a buzzword now. Gartner even was saying, I think the number was 85 percent of big data projects, and I think that's more in tested environments, fail.
And I often say, "Failure in a lot of cases is not a bad effort," because it spawns the genesis of new products, new ideas, et cetera. But when you're talking with clients who go, alright, we've embraced Hadoop, we've got this big data lake, and now it's turning really swampy, we don't know--

>> We've got lakes, we've got oceans, we've got ponds. Yeah.

>> Right. What's the conversation there, where you're helping a customer clean that swamp up, get broader visibility across their datasets, and enable different lines of business? Not just, you know, the BI folks or the cloud folks or IT, but marketing, logistics, sales. What's that conversation like, to clean up the swamp and do more enablement for visibility?

>> I think one of the things that we got really hung up on was, you know, creating a data ocean, right? We're going to bring everything all into one place; it's going to be this one massive data source.

>> It sounded great.

>> It's going to be awesome. And this is not the reality of the world, right? So I think the first thing in the cleaning up that we have to do is being able to figure out what's the source of truth for any given dataset that somebody needs. You see 15 salespeople walk in and they all have different versions of the data; that shouldn't happen. We need to get to the point where they know where the source of truth is for that data. The second is sort of governance around the data. We spent a lot of time dumping the data, but not a lot of time getting governance around who can access it, what they can do with it, and for how long they can have access to it. Is it just internal, or internal and external? So I think that's the second thing around, like, harnessing and wrangling the swamps, and the lakes and the ponds, right? And then, assuming that you do that, I think the other thing is, you know, if you have a hammer, everything looks like a nail. Well, in reality, when you construct things, you have nails, you have screws, you have bolts, right? And picking the right tool for the job is something that the IT leadership has to work out, and the only way they get that right is to work very closely with the different lines of business so they can understand the problem. Because the business leader knows the problem; they don't know the solution. If you put them together, which we've talked about forever, frankly, but now I think we're seeing more imperatives for those two to work closely together. And sometimes it's even driven by security, just to make sure that the data isn't leaking into other places, or that it's secure and they've met regulatory compliance. So we're in a much better space than we were two, three, five years ago, 'cause we're thinking about the real problems now. Not just how do you collect it and how do you store it, but how do we actually make it an actionable, manageable set of solutions.

>> Exactly, and make it work for the business. Well, Maribel, I wish we had more time, but thank you so much for stopping by theCUBE and sharing the insights that you've seen, not just at the conference, but also with your clients.

>> Thank you.

>> We want to thank you for watching theCUBE. Again, I'm Lisa Martin, live from Big Data SV in Downtown San Jose. Get involved in the conversation, #BigDataSV. Come see us at the Forager Tasting Room & Eatery, and I'll be right back with our next guest. (upbeat music)
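Picking up the transfer-learning example Lopez gave earlier (reusing ImageNet-trained features to classify a small set of medical images), a hedged Keras sketch of the technique looks like the following. The dataset, labels, and layer choices are hypothetical, not anything from a real diagnostic system.

```python
# A hedged Keras sketch of transfer learning: reuse ImageNet-trained
# features and retrain only a small head on a two-class medical task.
# Dataset and labels are hypothetical.
import tensorflow as tf

base = tf.keras.applications.ResNet50(
    weights="imagenet",      # start from features learned on ImageNet
    include_top=False,       # drop the original 1000-class head
    input_shape=(224, 224, 3),
)
base.trainable = False       # freeze the pre-trained features

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # e.g., Alzheimer's / not
])
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])

# train_ds would be a small, labeled tf.data.Dataset of your own images:
# model.fit(train_ds, epochs=5)
```

This is exactly the "kickstart" dynamic described above: the expensive feature learning was done by someone else on millions of images, so a team without enough data, or enough data scientists, only has to fit the final layer.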

Published Date : Mar 8 2018

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Lisa Martin | PERSON | 0.99+
Maribel | PERSON | 0.99+
Amazon | ORGANIZATION | 0.99+
Maribel Lopez | PERSON | 0.99+
San Jose | LOCATION | 0.99+
Google | ORGANIZATION | 0.99+
Microsoft | ORGANIZATION | 0.99+
15 salespeople | QUANTITY | 0.99+
SiliconAngle Media | ORGANIZATION | 0.99+
85 percent | QUANTITY | 0.99+
95 percent | QUANTITY | 0.99+
Gartner | ORGANIZATION | 0.99+
one issue | QUANTITY | 0.99+
two | QUANTITY | 0.99+
today | DATE | 0.99+
one | QUANTITY | 0.99+
Silicon Valley | LOCATION | 0.99+
both | QUANTITY | 0.98+
Strata Data Conference | EVENT | 0.98+
Big Data SV | ORGANIZATION | 0.98+
second thing | QUANTITY | 0.98+
one product | QUANTITY | 0.98+
first thing | QUANTITY | 0.98+
three things | QUANTITY | 0.97+
once a year | QUANTITY | 0.97+
second | QUANTITY | 0.96+
This year | DATE | 0.96+
One | QUANTITY | 0.96+
First | QUANTITY | 0.96+
theCUBE | ORGANIZATION | 0.96+
Downtown San Jose | LOCATION | 0.96+
Strata | EVENT | 0.94+
two interesting things | QUANTITY | 0.94+
five years ago | DATE | 0.94+
Big Data | ORGANIZATION | 0.9+
couple days ago | DATE | 0.87+
couple of days ago | DATE | 0.85+
once | QUANTITY | 0.78+
#BigDataSV | ORGANIZATION | 0.75+
one place | QUANTITY | 0.75+
second place | QUANTITY | 0.75+
every couple of years | QUANTITY | 0.75+
Forager | LOCATION | 0.7+
Data | ORGANIZATION | 0.69+
Narrator: Live | TITLE | 0.69+
wave | EVENT | 0.68+
years past | DATE | 0.66+
three | QUANTITY | 0.66+
Alzheimer | OTHER | 0.66+
Big | EVENT | 0.65+
Hadoop | TITLE | 0.64+
Big Data SV | EVENT | 0.59+
Eatery & Tasting Room | ORGANIZATION | 0.57+
Lopez Research | ORGANIZATION | 0.55+
SV 2018 | EVENT | 0.54+
thing | QUANTITY | 0.53+
Lopez | ORGANIZATION | 0.49+

Dr. Tendu Yogurtcu, Syncsort | Big Data SV 2018


 

>> Announcer: Live from San Jose, it's theCUBE. Presenting data, Silicon Valley, brought to you by Silicon Angle Media and its ecosystem partners. >> Welcome back to theCUBE. We are live in San Jose at our event, Big Data SV. I'm Lisa Martin, my co-host is George Gilbert, and we are down the street from the Strata Data Conference. We are at a really cool venue: Forager Eatery Tasting Room. Come down and join us, hang out with us, we've got a cocktail par-tay tonight. We also have an interesting briefing from our analysts on big data trends tomorrow morning. I want to welcome back to theCUBE now one of our CUBE VIPs and alumna, Tendu Yogurtcu, the CTO at Syncsort. Welcome back. >> Thank you. Hello Lisa, hi George, pleasure to be here. >> Yeah, it's our pleasure to have you back. So, what's going on at Syncsort, what are some of the big trends as CTO that you're seeing? >> In terms of the big trends that we are seeing, and Syncsort has grown a lot in the last 12 months, we actually doubled our revenue, it has been a really successful and organic growth path, and we have more than 7,000 customers now, so it's a great pool of customers that we are able to talk to and see the trends, and how they are trying to adapt to the digital disruption and make data part of their core strategy. So data is no longer an enabler; in all of the enterprise we are seeing data becoming the core strategy. This reflects in the four mega trends, and they are all connected to enable business as well as operational analytics. Cloud is one, definitely. We are seeing more and more cloud adoption; even our financial services, healthcare, and banking customers now have a couple of clusters running in the cloud, in public cloud, multiple workloads. Hybrid seems to be the new standard, and it comes with challenges as well. IT governance as well as data governance is a major challenge, and also scoping and planning for the workloads in the cloud continues to be a challenge as well. Our general strategy for all of the product portfolio is to have our products follow a design once, deploy anywhere strategy. So whether it's a standalone environment on Linux, or running on Hadoop or Spark, or running on premise or in the cloud, regardless of the cloud provider, we are enabling the same application with no changes to run in all of these environments, including hybrid. Then we are seeing the streaming trend. With the connected devices, with the digital disruption, and so much data being generated, being able to stream and process data on the edge, with the Internet of Things, and in order to address the use cases that Syncsort is focused on, we are really providing more on the Change Data Capture and near real-time and real-time data replication to the next generation analytics environments and big data environments. We launched last year our Change Data Capture, CDC, product offering with data integration, and we continue to strengthen that; with the Vision merger we added real-time data replication capabilities, and we are now seeing even Kafka becoming a consumer of this data. Not just keeping the data lake fresh, but really publishing the changes from a multiple, diverse set of sources, publishing into Kafka and making it available for applications and analytics in the data pipeline. The third trend we are seeing is around data science, and if you noticed, this morning's keynote was all about machine learning, artificial intelligence, deep learning, how do we make use of data science.
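(A rough illustration of the change-data-capture pattern Tendu describes above: publishing changes into Kafka and consuming them downstream to keep a data lake fresh. This sketch assumes the kafka-python client; the topic name, record fields, and consumer group are hypothetical, not Syncsort's actual product interfaces.)

```python
# Hedged sketch: consume change-data-capture events from a Kafka topic.
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "cdc.orders",                          # hypothetical CDC topic
    bootstrap_servers="localhost:9092",
    group_id="analytics-pipeline",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    auto_offset_reset="earliest",
)

for msg in consumer:
    change = msg.value
    # A typical CDC record carries the operation and row images, e.g.:
    # {"op": "UPDATE", "table": "orders", "before": {...}, "after": {...}}
    if change.get("op") in ("INSERT", "UPDATE"):
        row = change["after"]
        # ...apply to the downstream store to keep the data lake fresh...
        print(change["op"], row)
```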
And it was very interesting for me, because we see everyone talking about the challenge of how do you prepare the data and how do you deliver the trusted data for machine learning, artificial intelligence, and deep learning use. Because if you are using bad data, and creating your models based on bad data, then the insights you get are also impacted. We definitely offer our products, both on the data integration and data quality side, to prepare the data, cleanse, match, and deliver the trusted data set for data scientists and make their life easier. Another area of focus for 2018 is, can we also add supervised learning to this? Because with the Trillium Quality domain experts that we have now in Syncsort, we have a lot of domain experts in the field, we can infuse machine learning algorithms and connect the data profiling capabilities we have with the data quality capabilities, recommending business rules for data scientists and helping them automate the mundane tasks with recommendations. And the last but not least trend is data governance, and data governance is almost an umbrella focus for everything we are doing at Syncsort, because everything about the cloud trend, the streaming, and the data science, and developing that next generation analytics environment for our customers, depends on the data governance. It is, in fact, a business imperative, and the regulatory compliance use cases drive even more importance to governance today. For example, the General Data Protection Regulation in Europe, GDPR. >> Lisa: Just a few months away. >> Just a few months, May 2018. It is in the mind of every C-level executive. It's not just for European companies; every enterprise has European data sourced in their environments. So compliance is a big driver of governance, and we look at governance in multiple aspects. Security, and ensuring data is available in a secure way, is one aspect. Delivering the high quality data, cleansing, matching: the example Hilary Mason gave in this morning's keynote, about how much the context matters in searches of her name, was very interesting, because you really want to deliver that high quality, trusted data set in the enterprise, preparing that. Our Trillium Quality for big data, which we launched in Q4, is generally available now, and actually we are in production with a very large deployment. So that's one area of focus. And the third area is how do you create visibility, the farm-to-table view of your data? >> Lisa: Yeah, that's the name of your talk! I love that. >> Yes, yes, thank you. So tomorrow I have a talk at 2:40, March 8th, and I'm so happy it's on Women's Day that I'm talking-- >> Lisa: That's right, that's right! Get a farm-to-table view of your data is the name of your talk: track data lineage from source to analytics. Tell us a little bit more about that.
Once that happened, everybody started worrying about how do I create a consumable data set, and how do I manage this data? Because data has been on legacy platforms like mainframe and IBM i series, it has been on relational data stores, it is in the cloud, the gravity of data originating in the cloud is increasing, and it's originating from mobile. Hadoop vendors like Hortonworks and Cloudera are creating visibility into what happens within the Hadoop framework. So we are deepening our integration with Cloudera Navigator; that was our announcement last week. We already have integration both with Hortonworks and Cloudera Navigator; this is one step further, where we actually publish what happened to data at every single granular level, at the field level, with all of the transformations that data has been through outside of the cluster. So that visibility is now published to Navigator itself, and we also publish it through the RESTful API, so governance is a very strong and critical initiative for all of the businesses. And we are playing into the security aspect as well as the data lineage and tracking aspect and the quality aspect. >> So this sounds like an extremely capable infrastructure service, so that it's trusted data. But can you sell that to an economic buyer alone, or do you go in in conjunction with another solution, like anti-money laundering for banks, or, you know, what are the key things that they place enough value on that they would spend, you know, budget on it? >> Yes, absolutely. Usually the use cases might originate like anti-money laundering, which is very common, fraud detection, and it ties to getting a single view of an entity. Because in anti-money laundering, you want to understand the single view of your customer, ultimately. So there is usually another solution that might be in the picture. We are providing the visibility of the data, as well as that single view of the entity, whether it's the customer view in this case or the product view in some of the use cases, by delivering the matching capabilities and the cleansing capabilities, the deduplication capabilities, in addition to accessing and integrating the data. >> When you go into a customer and, you know, recognizing that we still have tons of silos and we're realizing it's a lot harder to put everything in one repository, how do customers tell you they want to prioritize what they're bringing into the repository, or even what do they want to work on that's continuously flowing in? >> So it depends on the business use case. And usually, at the time that we are working with the customer, they have selected that top priority use case: the risk use cases, like anti-money laundering, or, for insurance companies, we are seeing a trend of, for example, building the data marketplace, that tantalizing data marketplace concept. So depending on the business case, many of our insurance customers in the US, for example, are creating the data marketplace and working with near real-time data and microbatches. Europe seems to be a bit ahead of the game in some cases; Hadoop adoption was slow, but they went right into the streaming use cases. We are seeing more direct streaming and keeping it fresh, and more utilization of Kafka and messaging frameworks and databases. >> And in that case, where they're sort of skipping the batch-oriented approach, how do they keep track of history? >> It's still, in most of the cases, microbatches, and the metadata is still associated with the data.
So there is an analysis of, historically, what happened to that data. Tools like ours, and the vendors coming into the picture, keep track of that, basically. >> So, in other words, by knowing what happened operationally to the data, that paints a picture of a history. >> Exactly, exactly. >> Interesting. >> And for the governance we usually also partner; for example, we partner with the Collibra data platform, we partnered with ASG for creating the business rules and technical metadata and providing them to the business users, not just to the IT data infrastructure, and on the Hadoop side we partner with Cloudera and Hortonworks very closely to complete that picture for the customer. Because nobody is interested in just what happened to the data in Hadoop, or in the mainframe, or in my relational data warehouse; they are really trying to see what's happening on premise, in the cloud, across multiple clusters, traditional environments, legacy systems, and trying to get that big picture view. >> So on that, enabling a business to have that, we'll say in marketing, 360 degree view of data, knowing that there's so much potential for data to be analyzed to drive business decisions that might open up new business models, new revenue streams, increase profit: what are you seeing as CTO of Syncsort when you go in to meet with a customer with data silos, when you're talking to a Chief Data Officer? What's the cultural, I guess not shift but really journey, that they have to go on to start opening up other organizations of the business to have access to data, so they really have that broader, 360 degree view? What's that cultural challenge, that journey that they have to go on? >> Yes, Chief Data Officers are actually very good partners for us, because usually Chief Data Officers are trying to break the silos of data and make sure that the data is liberated for the business use cases. Still, most of the time the infrastructure and the cluster, whether it's deployed in the cloud versus on premise, is owned by the IT infrastructure. And the lines of business are really the consumers and the clients of that. The CDO, in that sense, almost mediates, and connects those lines of business with the IT infrastructure, with the same goals for the business, right? They have to worry about the compliance, they have to worry about creating multiple copies of data, they have to worry about the security of the data and availability of the data, so CDOs actually help. So we are actually very good partners with the CDOs in that sense, and we also usually have the IT infrastructure owner in the room when we are talking with our customers, because they have a big stake. They are like the gatekeepers of the data, to make sure that it is accessed by the right... By the right folks in the business. >> Sounds like maybe they're in the role of, like, good cop bad cop, or maybe mediator. Well Tendu, I wish we had more time. Thanks so much for coming back to theCUBE and, like you said, you're speaking tomorrow at the Strata Conference on International Women's Day: Get a farm-to-table view of your data. Love the title. >> Thank you. >> Good luck tomorrow, and we look forward to seeing you back on theCUBE. >> Thank you, I look forward to coming back and letting you know about more exciting innovations, both organic and through acquisitions. >> Alright, we look forward to that. We want to thank you for watching theCUBE, I'm Lisa Martin with my co-host George Gilbert. We are live at our event Big Data SV in San Jose.
Come down and visit us, stick around, and we will be right back with our next guest after a short break. >> Tendu: Thank you. (upbeat music)

Published Date : Mar 7 2018

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Lisa Martin | PERSON | 0.99+
George | PERSON | 0.99+
May 2018 | DATE | 0.99+
George Gilbert | PERSON | 0.99+
Syncsort | ORGANIZATION | 0.99+
Lisa | PERSON | 0.99+
Europe | LOCATION | 0.99+
Hortonworks | ORGANIZATION | 0.99+
US | LOCATION | 0.99+
Hilary Mason | PERSON | 0.99+
San Jose | LOCATION | 0.99+
ASG | ORGANIZATION | 0.99+
2018 | DATE | 0.99+
Tendu | PERSON | 0.99+
Silicon Angle Media | ORGANIZATION | 0.99+
Cloudera | ORGANIZATION | 0.99+
360 degree | QUANTITY | 0.99+
tomorrow | DATE | 0.99+
Collibra | ORGANIZATION | 0.99+
more than 7,000 customers | QUANTITY | 0.99+
last week | DATE | 0.99+
last year | DATE | 0.99+
tomorrow morning | DATE | 0.99+
one aspect | QUANTITY | 0.99+
third area | QUANTITY | 0.99+
Linux | TITLE | 0.99+
Cloud Navigator | TITLE | 0.99+
2:40 | DATE | 0.98+
Women's Day | EVENT | 0.98+
Tendu Yogurtcu | PERSON | 0.98+
GDPR | TITLE | 0.98+
Spark | TITLE | 0.97+
tonight | DATE | 0.97+
Big Data SV | EVENT | 0.97+
Kafka | TITLE | 0.97+
International Women's Day | EVENT | 0.97+
both | QUANTITY | 0.97+
CDC | ORGANIZATION | 0.96+
Navigator | TITLE | 0.96+
Strata Data Conference | EVENT | 0.96+
single view | QUANTITY | 0.96+
Hadoop | TITLE | 0.95+
third trend | QUANTITY | 0.95+
one step | QUANTITY | 0.95+
single view | QUANTITY | 0.95+
Dr. | PERSON | 0.94+
theCUBE | ORGANIZATION | 0.94+
CUBE | ORGANIZATION | 0.94+
this morning | DATE | 0.94+
Cloud | TITLE | 0.92+
last 12 months | DATE | 0.91+
Change Data Capture | ORGANIZATION | 0.9+
today | DATE | 0.9+
European | OTHER | 0.88+
last couple of years | DATE | 0.88+
General Data Protection Regulation in Europe | TITLE | 0.86+
Strata Conference | EVENT | 0.84+
one | QUANTITY | 0.83+
one repository | QUANTITY | 0.83+
tons of silos | QUANTITY | 0.82+
one area | QUANTITY | 0.82+
Q4 | DATE | 0.82+
Big Data SV 2018 | EVENT | 0.81+
four mega trends | QUANTITY | 0.76+
March 8th | DATE | 0.76+

Matthew Baird, AtScale | Big Data SV 2018


 

>> Announcer: Live from San Jose. It's theCUBE, presenting Big Data Silicon Valley. Brought to you by SiliconANGLE Media, and its ecosystem partners. (techno music) >> Welcome back to theCUBE, our continuing coverage on day one of our event, Big Data SV. I'm Lisa Martin with George Gilbert. We are down the street from the Strata Data Conference. We've got a great, a lot of cool stuff going on; you can see the cool set behind me. We are at Forager Tasting Room & Eatery. Come down and join us, be in our audience today. We have a cocktail event tonight, who doesn't want to join that? And we have a nice presentation tomorrow morning of Wikibon's 2018 Big Data Forecast and Review. Joining us next is Matthew Baird, the co-founder of AtScale. Matthew, welcome to theCUBE. >> Thanks for having me. Fantastic venue, by the way. >> Isn't it cool? >> This is very cool. >> Yeah, it is. So, talking about Big Data, you know, Gartner says, "85% of Big Data projects have failed." I often say failure is not a bad F word, because it can spawn the genesis of a lot of great business opportunities. Data lakes were big a few years ago, then turned into swamps. AtScale has this vision of Data Lake 2.0. What is that? >> So, you're right. There have been a lot of failures, there's no doubt about it. And you're also right that that is how we evolve, and we're a Silicon Valley based company; we don't give up when faced with these things. It's just another way to not do something. So, what we've seen and what we've learned through our customers is they need to have a solution that is integrated with all the technologies that they've adopted in the enterprise. And it's really about, if you're going to make a data lake, you're going to have data on there that is the crown jewels of your business. How are you going to get that into the hands of your constituents, so that they can analyze it, and they can use it to make decisions? And how can we, furthermore, do that in a way that supplies governance and auditability on top of it, so that we aren't just sending data out into the ether and not knowing where it goes? We have a lot of customers in the insurance and health insurance space, and financial customers, where the data absolutely must be managed. I think one of the biggest changes is around that integration with the current technologies. There's a lot of movement into the Cloud. The new data lake is kind of focused more on these large data stores, where it was HDFS with Hadoop; now it's S3, Google's object storage, and Azure ADLS. Those are the sorts of things that are backing the new data lake, I believe. >> So if we take these, where the data lake store didn't have to be something that's an open source HDFS implementation, it could even be just through an HDFS API. >> Matthew: Yeah, absolutely. >> What are some of the, how should we think about the data sources and feeds for this repository, and then what is it on top that we need to put to make the data more consumable? >> Yeah, that's a good point. S3, Google object storage, and Azure all have a characteristic of being large stores. You can store as much as you want. They're generally on the clouds, and in the open source world the on-prem software for landing the data, for streaming the data and landing it, exists, but the important thing there is it's cost-effective. S3 is a cost-effective storage system. HDFS is a mostly cost-effective storage system.
You have to manage it, so it has a slightly higher cost, but the advice has been: get it to the place you're going to store it, and store it in a unified format. You get a halo effect when you have a unified format, and I think the industry is coalescing around... I'd probably say Parquet's in the lead right now, and once Parquet can be read by, let's take Amazon for instance, can be read by Athena, can be read by Redshift Spectrum, can be read by their EMR, now you have this halo effect where your data's always there, always available to be consumed by a tool or a technology that can then deliver it to your end users. >> So when we talk about Parquet, we're talking about a columnar serialization format, >> Matthew: Yes. >> but there's more on top of that that needs to be layered, so that you can, as we were talking about earlier, combine the experience of a data warehouse, and the curated >> Absolutely >> data access where there's guard rails, >> Matthew: Yes >> and it's simple, versus sort of the wild west, but where I capture everything in a data lake. How do you bring those two together? >> Well, specifically for AtScale, we allow you to integrate multiple data access tools in AtScale, and then we use the appropriate tool to access the data for the use case. So let me give you an example. In the Amazon case, Redshift is wonderful for accessing interactive data, which BI users want, right? They want fast queries, sub-second queries. They don't want to pay to have all the raw data necessarily stored in Redshift, 'cause that's pretty expensive. So they have Redshift Spectrum; it's sitting in S3, that's cost effective. So when we go and we read raw data to build these summary tables, to deliver the data fast, we can read from Spectrum, we can put it all together, drop it into Redshift, a much smaller volume of data, so it has faster characteristics for being accessed. And it delivers it to the user that way. We do that in Hadoop when we access via Hive for building aggregate tables, but Spark or Impala is a much faster interactive engine, so we use those. As I step back and look at this, I think the Data Lake 2.0, from a technical perspective, is about abstraction, and abstraction's sort of what separates us from the animals, right? It's a concept where we can pack a lot of sophistication and complexity behind an interface that allows people to just do what they want to do. You don't know how, or maybe you do know how, a car engine works; I don't really, kind of, a little bit, but I do know how to press the gas pedal and steer. >> Right. >> I don't need to know these things, and I think the Data Lake 2.0 is about, well, I don't need to know how Sentry, or Ranger, or Atlas, or any of these technologies work. I need to know that they're there, and when I access data, they're going to be applied to that data, and they're going to deliver me the stuff that I have access to and that I can see. >> So a couple things. It sounded like I was hearing abstraction, and you said really that's kind of the key; that sounds like a differentiator for AtScale, giving customers that abstraction they need. But I'm also curious from a data value perspective; you talked about Redshift from an expense perspective. Do you also help customers gain abstraction by helping them evaluate the value of data and where they ought to keep it, and then you give them access to it? Or is that something that they need to do, kind of, bring to the table?
>> We don't really care, necessarily, about the source of the data, as long as it can be expressed in a way that can be accessed by whatever engine it is. Lift and shift is an example. There's a big move from Teradata or from Netezza into a Cloud-based offering. People want to lift it and shift it; it's the easiest way to do this. Same table definitions, but that's not necessarily optimized for the underlying data store. Take BigQuery, for example. BigQuery's an amazing piece of technology; I think there's nothing like it out there in the market today, but if you really want BigQuery to be cost-effective, and perform, and scale up to concurrency of... one of our customers is going to roll out about 8,000 users on this. You have to do things in BigQuery that are BigQuery-friendly. The data structures, the way that you store the data, repeated values, those sorts of things need to be taken into consideration when you build your schema out for consumption. With AtScale they don't need to think about that, they don't need to worry about it; we do it for them. They drop the schema in the same way that it exists on their current technology, and then behind the scenes, what we're doing is we're looking at signals, we're looking at queries, we're looking at all the different ways that people access the data naturally, and then we restructure those summary tables using algorithms and statistics, and I think people would broadly call it ML-type approaches, to build out something that answers those questions, and adapts over time to new questions and new use cases. So it's really about, imagine you had the best data engineering team in the world, in a box: they're never tired, they never stop, and they're always interacting with what the customers really want, which is, "Now I want to look at the data this way." >> It sounds actually like what you're talking about is you have a whole set of sources and targets, and you understand how they operate, and when I say you, I mean your software. And so that you can take data from wherever it's coming in, and then you apply, if it's machine learning or whatever other capabilities, to learn from the access methods how to optimize that data for that engine. >> Matthew: Exactly. >> And then the end users have an optimal experience, and it's almost like the data migration service that Amazon has; it's like, you give us your Postgres or Oracle database, and we'll migrate it to the cloud. It sounds like you add a lot of intelligence to that process for decision support workloads. >> Yes. >> And figure out, so now you're going to... It's not Postgres to Postgres, but it might be Teradata to Redshift, or S3 that's going to be accessed by Athena or Redshift, and then let's put that in the right format. >> I think you sort of hit something that we've noticed is very powerful, which is, if you can set up the abstraction layer that is AtScale on your on-prem data, and we've done this with a number of customers, literally in, say, hours, you can move it into the Cloud. Obviously you have to move the data into the Cloud, but once it's in the Cloud you take the same AtScale instance, you re-point it at that new data source, and it works. We've done that with multiple customers, and it's fast and effective, and it lets you actually try out things that you may not have had the agility to do before, because there's differences in how the SQL dialects work, there's differences in, potentially, how the schema might be built.
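(A toy sketch of the adaptive summary-table idea Matthew describes, letting observed queries drive which aggregates get built. A real product like AtScale uses far richer signals and statistics; the query-log format and table names here are invented purely for illustration.)

```python
# Hedged toy sketch: pick the most common GROUP BY column sets from a
# query log and emit summary-table DDL for them.
from collections import Counter

# Hypothetical log of the grouping columns seen in user queries.
query_log = [
    ("region",), ("region",), ("region", "product"),
    ("region", "product"), ("region",), ("product",),
]

counts = Counter(query_log)
top_groupings = [cols for cols, n in counts.most_common(2)]

for cols in top_groupings:
    col_list = ", ".join(cols)
    name = "agg_" + "_".join(cols)
    ddl = (
        f"CREATE TABLE {name} AS "
        f"SELECT {col_list}, SUM(revenue) AS revenue "
        f"FROM sales GROUP BY {col_list};"
    )
    print(ddl)  # a real system would execute and maintain these over time
```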
>> So a couple things I'm interested in. I'm hearing two A-words: that abstraction we've talked about a number of times, and you also mention adaptability. So when you're talking with customers, what are some of the key business outcomes they need to drive, where adaptability and abstraction are concerned, in terms of, like, cost reduction, revenue generation? What are some of those C-suite business objectives that AtScale can help companies achieve? >> So looking at, say, a customer, a large retailer on the East Coast. Everybody knows the stores, they're everywhere, they sell hardware. They have a 20-terabyte cube that they use for day-to-day revenue analytics, so they do period-over-period analysis. When they're looking at stores, they're looking at things like, we just tried out a new marketing approach... I was talking to somebody there last week about how they have these special stores where they completely redo one area and just see how that works. They have to be able to look at those analytics, and they run those for a short amount of time. So if your window for getting data, refreshing data, building cubes, which in the old world could take a week, you know, my co-founder at Yahoo had a week and a half build time. That data is now two weeks old, maybe three weeks old. There might be bugs in it-- >> And the relevance might be, pshh... >> And the relevance goes down, or you can't react as fast. I've been at companies where... Speed is so important these days, and the new companies that are grasping data aggressively, putting it somewhere where they can make decisions on it on a day-to-day basis, they're winning. And they're spending... I was at a company that was spending three million dollars a month on pay-per-click data. If you can't get data every day, you're on the wrong campaigns, and everything goes off the rails, and you only learn about it a week later; that's 25% of your spend, right there, gone. >> So the biggest thing, sorry George, it really sounds to me like what AtScale can facilitate, for customers in probably any industry, is the ability to truly make data-driven business decisions that can directly affect revenue and profit. >> Yes, and in an agile format. So, you can build-- >> That's the third A: agile, adaptability, abstraction. >> There ya go, the three A's. (Lisa laughs) We had the three V's, now we have the three A's. >> Yes. >> The fact that you're building a curated model: so, in retail the calendars are complex. I'm sure everybody that uses Tableau is good at analyzing data, but they might not know what your rules are around your financial calendar, or around the hierarchies of your product. There's a lot of cases where you want an enterprise group of data modelers to build it, bless it, and roll it out, but then you're a user, and you say, wait, you forgot x, y, and z. I don't want to wait a week, I don't want to wait two weeks, three weeks, a month, maybe more. I want that data to be available in the model an hour later, 'cause that's what I get with Tableau today. And that's where we've taken the two approaches of enterprise analytics and self-service, and tried to create a scenario where you get the best of both worlds. >> So, we know that an implication of what you're telling us is that insights are perishable, and latency is becoming more and more critical. How do you plan to work with streaming data, where you've got a historical archive, but you've got fresh data coming in? But fresh could mean a variety of things.
Tell us what some of those scenarios look like. >> Absolutely. I think there's two approaches to this problem, and I'm seeing both used in practice, and I'm not exactly sure, although I have some theories, on which one's going to win. In one case, you are streaming everything into, sort of a... like I talked about, this data lake, S3, and you're putting it in a format like Parquet, and then people are accessing it. The other way is to access the data where it is. Maybe it's already in, this is a common BI scenario, you have a big data store, and then you have a dimensional data store; like, Oracle has your customers, Hadoop has machine data about those customers accessing on their mobile devices or something. If there was some way to access that data without having to move the Oracle stuff into the big data store, that's a Federation story that I think we've talked about in the Bay Area for a long time, or around the world for a long time. I think we're getting closer to understanding how we can do that in practice and have it be tenable. You don't move the big data around; you move the small data around. For data coming in from outside sources it's probably a little bit more difficult, but it is kind of a degenerate version of the same story. I would say that streaming is gaining a lot of momentum, and with what we do, we're always mapping, because of the governance piece that we've built into the product; we're always mapping where did the data come from, where did it land, and how did we use it to build summary tables. So if we build five summary tables, 'cause we're answering different types of questions, we still need to know that it goes back to this piece of data, which has these security constraints and these audit requirements, and we always track it back to that, and we always apply those to our derived data. So when you're accessing these automatically ETLed summary tables, it just works the way it is. So I think that there are two ways that this is going to expand, and I'm excited about Federation because I think the time has come. I'm also excited about streaming. I think they can serve two different use cases, and I don't actually know what the answer will be, because I've seen both in customers; it's some of the biggest customers we have. >> Well Matthew, thank you so much for stopping by. And four A's: AtScale can facilitate abstraction, adaptability, and agility. >> Yes. Hashtag four A's. >> There we go. I don't even want credit for that. (laughs) >> Oh wow, I'm going to get five more followers, I know it! (George laughs) >> There ya go! >> We want to thank you for watching theCUBE. I am Lisa Martin, we are live in San Jose, at our event Big Data SV, I'm with George Gilbert. Stick around, we'll be back with our next guest after a short break. (techno music)
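(To ground the unified-Parquet-format discussion from earlier in this segment: a minimal sketch of landing data once as Parquet on S3, where any Parquet-aware engine such as Athena, Redshift Spectrum, EMR, or Spark could then read it. Bucket and path names are hypothetical; this assumes pandas with pyarrow and s3fs installed.)

```python
# Hedged sketch: write a dataset once as Parquet on S3, then read it back.
import pandas as pd

df = pd.DataFrame({
    "order_id": [1, 2, 3],
    "region": ["west", "east", "west"],
    "revenue": [120.0, 80.5, 42.0],
})

# Columnar, compressed, self-describing: the "unified format" on the lake.
df.to_parquet("s3://example-data-lake/orders/orders.parquet")

# Reading it back (locally, or from any Parquet-aware engine):
back = pd.read_parquet("s3://example-data-lake/orders/orders.parquet")
print(back.groupby("region")["revenue"].sum())
```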

Published Date : Mar 7 2018

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Matthew | PERSON | 0.99+
George Gilbert | PERSON | 0.99+
Lisa Martin | PERSON | 0.99+
Matthew Baird | PERSON | 0.99+
George | PERSON | 0.99+
San Jose | LOCATION | 0.99+
Yahoo | ORGANIZATION | 0.99+
three weeks | QUANTITY | 0.99+
Amazon | ORGANIZATION | 0.99+
25% | QUANTITY | 0.99+
Gardner | PERSON | 0.99+
two approaches | QUANTITY | 0.99+
Oracle | ORGANIZATION | 0.99+
two weeks | QUANTITY | 0.99+
Redshift | TITLE | 0.99+
S3 | TITLE | 0.99+
three million dollars | QUANTITY | 0.99+
two ways | QUANTITY | 0.99+
Silicon Valley | LOCATION | 0.99+
one case | QUANTITY | 0.99+
85% | QUANTITY | 0.99+
last week | DATE | 0.99+
a month | QUANTITY | 0.99+
Century | ORGANIZATION | 0.99+
SiliconANGLE Media | ORGANIZATION | 0.99+
a week | QUANTITY | 0.99+
BigQuery | TITLE | 0.99+
both | QUANTITY | 0.99+
20-terabyte | QUANTITY | 0.99+
Google | ORGANIZATION | 0.99+
a week and a half | QUANTITY | 0.99+
a week later | DATE | 0.99+
Data Lake 2.0 | COMMERCIAL_ITEM | 0.99+
two | QUANTITY | 0.99+
tomorrow morning | DATE | 0.99+
AtScale | ORGANIZATION | 0.99+
Atlas | ORGANIZATION | 0.99+
Bay Area | LOCATION | 0.98+
Lisa | PERSON | 0.98+
ParK | TITLE | 0.98+
Tableau | TITLE | 0.98+
five more followers | QUANTITY | 0.98+
an hour later | DATE | 0.98+
Ranger | ORGANIZATION | 0.98+
Netezza | ORGANIZATION | 0.98+
tonight | DATE | 0.97+
today | DATE | 0.97+
both worlds | QUANTITY | 0.97+
about 8,000 users | QUANTITY | 0.97+
theCUBE | ORGANIZATION | 0.97+
Strata Data Conference | EVENT | 0.97+
one | QUANTITY | 0.97+
Big Data SV 2018 | EVENT | 0.97+
Teradata | ORGANIZATION | 0.96+
AtScale | TITLE | 0.96+
Big Data SV | EVENT | 0.93+
East Coast | LOCATION | 0.93+
Hadoop | TITLE | 0.92+
two different use cases | QUANTITY | 0.92+
day one | QUANTITY | 0.91+
one area | QUANTITY | 0.91+

Scott Gnau, Hortonworks | Big Data SV 2018


 

>> Narrator: Live from San Jose, it's the Cube. Presenting Big Data Silicon Valley. Brought to you by SiliconANGLE Media and its ecosystem partners. >> Welcome back to the Cube's continuing coverage of Big Data SV. >> This is our tenth Big Data event, our fifth year in San Jose. We are down the street from the Strata Data Conference. We invite you to come down and join us, come on down! We are at Forager Tasting Room & Eatery, super cool place. We've got a cocktail event tonight, and an analyst briefing tomorrow morning. We are excited to welcome back to the Cube Scott Gnau, the CTO of Hortonworks. Hey, Scott, welcome back. >> Thanks for having me, and I really love what you've done with the place. I think there's as much energy here as I've seen in the entire show. So, thanks for having me over. >> Yeah! >> We have done a pretty good thing to this place that we're renting for the day. So, thanks for stopping by and talking with George and me. So, in February, Hortonworks announced some news about Hortonworks DataFlow. What was in that announcement? What does that do to help customers simplify data in motion? What industries is it going to be most impactful for? I'm thinking, you know, GDPR is a couple months away; kind of, what's new there? >> Well, yeah, and there are a couple of topics in there, right? So, obviously, we're very committed to, and I think it is one of our unique value propositions, that we're committed to really creating an easy to use data management platform, as it were, for the entire lifecycle of data, from when data is created at the edge, and as data are streaming from one place to another place, and, at rest, analytics get run, and analytics get pushed back out to the edge. So, that entire lifecycle is really the footprint that we're looking at, and when you dig a level into that, obviously, the data in motion piece is hugely important, and so I think one of the things that we've looked at is we don't want to be just a streaming engine or just a tool for creating pipes and data flows and so on. We really want to create that entire experience around what needs to happen for data that's moving, whether it be acquisition at the edge in a protected way with provenance and encryption, whether it be applying streaming analytics as the data are flowing, and everywhere kind of in between, and so that's what HDF represents, and what we released in our latest release, which, to your point, was just a few weeks ago, is a way for our customers to go build their data in motion applications using a very simple drag and drop GUI interface. So, they don't have to understand all of the different animals in the zoo, and the different technologies that are in play. It's like, "I want to do this." Okay, here's a GUI tool; you can have all of the different operators that are represented by the different underlying technologies that we provide as Hortonworks DataFlow, and you can string them together, and then you can make those applications and test those applications. One of the biggest enhancements that we did is we made it very easy, then, once those things are built in a laptop environment or in a dev environment, to be published out to production, or to be published out to other developers who might want to enhance them and so on. So, the idea is to make it consumable inside of an enterprise, and when you think about data in motion and IOT and all those use cases, it's not going to be one department, one organization, or one person that's doing it.
It's going to be a team of people that are distributed, just like the data and the sensors, and, so, being able to have that sharing capability is what we've enhanced in the experience. >> So, you were just saying, before we went live, that you're here having speed dates with customers. What are some of the things... >> It's a little bit more sincere than that, but yeah. >> (laughs) Isn't speed dating sincere? It's 2018, I'm not sure. (Scott laughs) What are some of the things that you're hearing from customers, and how is that helping to drive what's coming out from Hortonworks? >> So, the two things that I'm hearing, right: number one, certainly, is that they really appreciate our approach to the entire lifecycle of data, because customers are really experiencing huge data volume increases, and data just from everywhere, and it's no longer just from the ERP system inside the firewall. It's from third parties, it's from sensors, it's from mobile devices, and, so, they really do appreciate kind of the territory that we cover with the tools and technologies we bring to market, and, so, that's been very rewarding. Clearly, customers who are now well into this path, they're starting to think about, in this new world, data governance, and data governance, I just took all of the energy out of the room, governance, it sounds like, you know, hard. What I mean by data governance, really, is customers need to understand, with all of this diverse, connected data everywhere, in the cloud, on prem, in sensors, third party, partners, is, frankly, they need a trail of breadcrumbs that says: what is it, where'd it come from, who had access to it, and then, what did they do with it? If you start to piece that together, that's what they really need to understand, the data estate that belongs to them, so they can turn that into refined product, and, so, when you then segue into one of your earlier questions, GDPR is, certainly, a triggering point where it's like, okay, the penalties are huge, oh my God, it's a whole new set of regulations that I have to comply with, and when you think about that trail of breadcrumbs that I just described, that actually becomes a roadmap for compliance under regulations like GDPR, where if a European customer calls up and says, "Forget my data," the only way that you can guarantee that you forgot that person's data is to actually understand where it all is, and that requires proper governance, tools, and techniques, and, so, when I say governance, it's, really, not like, you know, the governor and the government, and all that. That's an aspect, but the real, important part is how do I keep all of that connectivity so that I can understand the landscape of data that I've got access to, and I'm hearing a lot of energy around that, and when you think about an IOT kind of world, distributed processing, multiple hybrid cloud footprints, data is just everywhere, and, so, the perimeter is no longer fixed, it's kind of variable, and being able to keep track of that is a very important thing for our customers. >> So, continuing on that theme, Scott. Data lakes seem to be the first major new repository we added after we had data warehouses and data marts, and it looked like the governance solutions were sort of around that perimeter of the data lake. Tell us, you were alluding to, sort of, how many more repositories, whether at rest or in motion, there are for data. Do we have to solve the governance problem end-to-end before we can build meaningful applications?
>> So, I would argue personally, that governance is one of the most strategic things for us as an industry, collectively, to go solve in a universal way, and what I mean by that is, throughout my career, which is probably longer than I'd like to admit, in an EDW centric world, where things were somewhat easier in terms of the perimeter and where the data came from, data sources were much more controlled, typically ERP systems, owned wholly by a company. Even in that era, true data governance, metadata management, and provenance were never really solved adequately. There were 300 different solutions, none of which really won. They were all different, non-compatible, and the problem was easier. In this new world, with connected data, the problem is infinitely more difficult to go solve, and, so, that same kind of approach of 300 different proprietary solutions I don't think is going to work. >> So, tell us, how does that approach have to change, and who can make that change? >> So, one of the things, obviously, that we're driving is we're leveraging our position in the open community to try to use the community to create that common infrastructure, a common set of APIs for metadata management, and, of course, we call that Apache Atlas, and we work with a lot of partners, some of whom are customers, some of whom are other vendors, even some of whom could be considered competitors, to try to drive an Apache open source kind of project to become that standard layer that's common, into which vendors can bring their applications. So, now, if I have a common API for tracking metadata in that trail of breadcrumbs that's commonly understood, I can bring in an application that helps customers go develop the taxonomy of the rules that they want to implement, and, then, that helps visualize all of the other functionality, which is also extremely important, and that's where I think specialization comes into play, but having that common infrastructure, I think, is a really important thing, because that's going to enable data, data lakes, IOT to be trusted, and if it's not trusted, it's not going to be successful. >> Okay, there's a chicken and an egg there, it sounds like, potentially. >> Am I the chicken or the egg? >> Well, you're the CTO. (Lisa laughs) >> Okay. >> The thing I was thinking of was: the broader the scope of trust that you're trying to achieve at first, the more difficult the problem. Do you see customers wanting to pick off one high value application, not necessarily one that's about managing what's in Atlas, in the metadata, so much as they want to do an IOT app, and they'll implement some amount of governance to solve that app? In other words, which comes first? Do they have to do the end-to-end metadata management and governance, or do they pick a problem off first? >> In this case, I think it's chicken or egg. I mean, you could start from either point. I see customers who are implementing applications in the IOT space, and they're saying, "Hey, this requires a new way to think of governance, so I'm going to go and build that out, but I'm going to think about it being pluggable into the next app."
I also see a lot of customers, especially in highly regulated industries, and especially in highly regulated jurisdictions, who are stepping back and saying, "Forget the applications, this is a data opportunity, and, so, I want to go solve my data fabric, and I want to have some consistency across that data fabric, into which I can publish data for specific applications and guarantee that, holistically, I am compliant, and that I'm sitting inside of our corporate mission, and all of those things." >> George: Okay. >> So, one of the things you mention, and we talk about this a lot, is the proliferation of data. There are so many different sources, and companies have an opportunity, you had mentioned the phrase data opportunity, there is massive opportunity there, but you said, you know, from even a GDPR perspective alone, I can't remove the data if I don't know where it is, back to the breadcrumbs. As a marketer, we use terms like "get a 360 degree view of your customer." Is that actually really something that customers can achieve, leveraging data? Can they actually really get, say a retailer, a 360, a complete view of their customer? >> Alright, 358. >> That's pretty good! >> And we're getting there. (Lisa laughs) Yeah, I mean, obviously, the idea is to get a much broader view, and 360 is a marketing term. I'm not a marketing person, >> Yes. >> But it, certainly, creates a much broader view of highly personalized information that helps you interact with your customer better, and, yes, we're seeing customers do that today, and have great success with it, and actually change and build new business models based on that capability, for sure. The folks who've done that have realized that in this new world, the way that that works is you have to have a lot of people have access to a lot of data, and that's scary, because that's not the way it used to be, right? >> Right. >> It used to be you go to the DBA and you ask for access, and then your boss has to sign off and say it's what you asked for. In this world, you need to have access to all of it. So, when you think about this new governance capability, where, as part of the governance integrated with security, personalized information can be encrypted, it can be blurred out, but you still have access to the data to look at the relationships to be found in the data, to build out those sophisticated models. So, that's where not only is it a new opportunity for governance just because of the sources, the variety, the different landscape, but it's, ultimately, very much required, because if you're the CSO, you're not going to give the marketing team access to all of its customer data unless you understand that, right? But it has to be, "I'm just giving it to you, and I know that it's automatically protected," versus, "I'm going to let you ask for it," to be successful. >> Right. >> I guess, following up on that, it sounds like what we were talking about, chicken or egg. Are you seeing an accelerating shift from where data is sort of collected, centrally, from applications, or, what we hear on Amazon, is the amount coming off the edge is accelerating?
>> It is, and I think that that is a big driver of, frankly, faster cloud adoption. You know, the analytic space, particularly, has been a laggard in cloud adoption for many reasons, and we've talked about it previously, but one of the biggest reasons, obviously, is that data has gravity, data movement is expensive, and, so, now, when you think about where data is being created, where it lives, being further out on the edge, and it may live its entire lifecycle in the cloud, you're seeing a reversal of gravity more towards the cloud, and that, again, creates more opportunities in terms of driving a more varied perimeter and just keeping track of where all the assets are. Finally, I think it also leads to this notion of managing the entire lifecycle of data. One of the implications of that is, if data is not going to be centralized, it's going to live in different places, applications have to be portable to move to where the data exists. So, when I think about that landscape of creating ubiquitous data management within Hortonworks' portfolio, that's one of the big values that we can create for our customers. Not only can we be an on-ramp to their hybrid architecture, but as we become that on-ramp, we can also guarantee the portability of the applications that they've built out to those cloud footprints and, ultimately, even out to the edge. >> So, a quick question, then, to clarify on that, or drill down: would that mean you could see scenarios where Hortonworks is managing the distribution of models that do the inferencing on the edge, and you're collecting, bringing back the relevant data, however that's defined, to do the retraining of any models or recreation of new models? >> Absolutely, absolutely. That's one of the key things about the NiFi project in general, and Hortonworks DataFlow specifically: the ability to selectively move data, and the selectivity can be based on analytic models as well. So, the easiest case to think about is self-driving cars. We all understand how that works, right? A self-driving car has cameras, and it's looking at things going on. It's making decisions, locally, based on models that have been delivered, and they have to be done locally, because of latency, right? But, selectively, hey, here's something that I saw as an image I didn't recognize. I need to send that up, so that it can be added to my lexicon of what images are and what action should be taken. So, of course, that's all very futuristic, but we understand how that works, and that has application in things that are very relevant today. Think about jet engines that have diagnostics running. Do I need to send that terabyte of data an hour over an expensive link? No, but I have a model that runs locally that says, "Wow, this thing looks interesting. Let me send a gigabyte now for immediate action." So, that decision making capability is extremely important. >> Well, Scott, thanks so much for taking some time to come chat with us once again on the Cube. We appreciate your insights. >> Appreciate it, time flies. This is great. >> Doesn't it? When you're having fun! >> Yeah. >> Alright, we want to thank you for watching the Cube. I'm Lisa Martin with George Gilbert. We are live at Forager Tasting Room in downtown San Jose at our own event, Big Data SV. We'd love for you to come on down and join us today, tonight, and tomorrow. Stick around, we'll be right back with our next guest after a short break. (techno music) >> Narrator: Since the dawn of the cloud, the Cube
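(A minimal sketch of the selective edge-to-cloud pattern Scott describes above, where a local model decides what is worth sending upstream. The model, threshold, and upload function are hypothetical stand-ins, not the NiFi implementation itself.)

```python
# Hedged sketch: run inference locally; ship only low-confidence or
# anomalous readings upstream instead of the full telemetry stream.
import json
import random  # stand-in for a real sensor feed

CONFIDENCE_THRESHOLD = 0.80

def local_model(reading):
    """Hypothetical on-device model returning a confidence score."""
    return 1.0 - abs(reading["vibration"] - 0.5)

def send_upstream(payload):
    """Stand-in for publishing to the data center (e.g., via MiNiFi/Kafka)."""
    print("UPLOAD:", payload)

for _ in range(1000):
    reading = {"engine_id": 7, "vibration": random.random()}
    confidence = local_model(reading)
    if confidence < CONFIDENCE_THRESHOLD:
        # Unfamiliar pattern: send the detail up for central retraining.
        send_upstream(json.dumps({**reading, "confidence": confidence}))
    # Otherwise act locally and drop the raw data to save bandwidth.
```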

Published Date : Mar 7 2018


Paul Appleby, Kinetica | Big Data SV 2018


 

>> Announcer: From San Jose, it's theCUBE. (upbeat music) Presenting Big Data, Silicon Valley, brought to you by SiliconANGLE Media and its ecosystem partners.

>> Welcome back to theCUBE. We are live on our first day of coverage of our event, Big Data SV. This is our tenth Big Data event. We've done five here in Silicon Valley. We also do them in New York City in the fall. We have a great day of coverage. We're next to where the Strata Data Conference is going on, at Forager Tasting Room & Eatery. Come on down, be part of our audience. We also have a great party tonight where you can network with some of our experts and analysts. And tomorrow morning, we've got a breakfast briefing. I'm Lisa Martin with my co-host, Peter Burris, and we're excited to welcome to theCUBE for the first time the CEO of Kinetica, Paul Appleby. Hey Paul, welcome.

>> Hey, thanks, it's great to be here.

>> We're excited to have you here. I'm a marketer, and terms, I grasp onto them. Kinetica is the insight engine for the extreme data economy. What is the extreme data economy, and what are you guys doing to drive insight from it?

>> Wow, how do I put that in a snapshot? Let me share with you my thoughts on this, because the fundamental principles around data have changed. You know, in the past, our businesses were really validated around data. We reported out how our business performed. We reported to our regulators. Over time, we drove insights from our data. But today, in this kind of extreme data world, in this world of digital business, our businesses need to be powered by data.

>> So let me put this to you: one of the ways that we think about it is that data has become an asset.

>> Paul: Oh yeah.

>> It's become an asset. But now, the business has to define it, care for it, feed it, continue to invest in it, find new ways of using it. Is that kind of what you're suggesting companies think about?

>> Absolutely, that's what we're saying. I mean, if you think about what Angela Merkel said at the World Economic Forum earlier this year, she saw data as the raw material of the 21st century, and talked about Germany fundamentally shifting from being an engineering, manufacturing-centric economy to a data-centric economy. So this is not just about data powering our businesses; this is about data powering our economies.

>> So let me build on that, if I may, because I think it gets to what, in many respects, Kinetica's core value proposition is. And that is that data is a different type of asset. Most assets are characterized by: you apply it here, or you apply it there. You can't apply it in both places at the same time. And it's one of the misnomers of the notion of data as fuel, because fuel is still an asset that has certain specificities: you can't apply it to multiple places.

>> Absolutely.

>> But data, you can, which means that you can copy it, you can share it, you can combine it in interesting ways. But that means that, to use data as an asset, especially given the velocity and the volume that we're talking about, you need new types of technologies that are capable of sustaining the quality of that data while making it possible to share it to all the different applications. Have I got that right? And what does Kinetica do in that regard?

>> You absolutely nailed it, because what you talked about is a shift from predictability associated with data to unpredictability.
We actually don't know the use cases that we're going to leverage our data for moving forward, but we understand how valuable an asset it is. And I'll give you two examples of that. There's a company here, based in the Bay Area, a really cool company called Liquid Robotics. They build these autonomous aquatic robots, which carry a vast array of sensors and are out collecting data. And of course, that data is hugely powerful for oil and gas exploration, for research, for shipping companies, etc. Even homeland security applications. But here's the thing: they were selling the robots, and what they realized over time is that the value of their business wasn't the robots. It was the data. And that one piece of data has a totally different meaning to a shipping company than it does to a fisheries company, but they could sell that exact same piece of data to multiple companies. Now, of course, their business has grown and scaled; I think they were acquired by Boeing. But what you're talking about is exactly where Kinetica sits. It's an engine that allows you to deal with the unpredictability of data, not only the sources of data but the uses of data, and enables you to do that in real time.

>> So Kinetica's technology was actually developed to meet some intelligence needs of the US Army. My dad was a former army ranger, airborne. So tell us a little bit about that and the genesis of the technology.

>> Yeah, it's a fascinating use case if you think about it. We're all concerned, globally, about cyber threats. We're all concerned about terrorist threats. But how do you identify terrorist threats in real time? The only way to do that is to consume vast amounts of data, whether it's drone footage or traffic cameras, whether it's mobile phone data or social data. The ability to stream all of those sources of data and conduct analytics on them in real time was, really, the genesis of this business. It was a research project with the army and the NSA that was aimed at identifying terrorist threats in real time.

>> But at the same time, you not only have to be able to stream all the data in and do analytics on it, you also have to have interfaces and understandable approaches to acquiring the data, because I have some background in that as well, to then be able to target the threat. So you have to be able to get the data in and analyze it, but also get it out to where it needs to be so an action can be taken.

>> Yeah, and there are two big issues there. One issue is the interoperability of the platform and the ability for you to not only consume data in real time from multiple sources, but to push it out to a variety of platforms in real time. That's one thing. The other thing is to understand that in this world we're talking about today, there are multiple personas that want to consume that data, and many of them are not data scientists. They're not IT people; they're business people. They could be executives, or they could be field operatives in the case of intelligence. So you need to be able to push this data out in real time onto platforms that they consume, whether it's via mobile devices or any other device for that matter.

>> But you also have to be able to build applications on it, right?

>> Yeah, absolutely.

>> So how does Kinetica facilitate that process? Because it looks like a database, and it's more than that, but it satisfies some of those conventions, so developers have an affinity for it.
>> Absolutely. So in the first instance, we provide tools ourselves for people to consume that data and to leverage the power of that data in real time, in an incredibly visual way, with a geospatial platform. But we also create the ability to interface with commonly used tools, because if you think about providing some sort of ubiquitous access to the platform, the easiest way to do that is to provide it through tools that people are used to using, whether that's something like Tableau, for example, or Esri, if you want to talk about geospatial data. So in the first instance, it's about providing access, in real time, through platforms that people are used to using. And then, of course, by building our technology in a really open framework with a broadly published set of APIs, we're able to support our customers in building applications on that platform. It could well be applications associated with autonomous vehicles. It could well be applications associated with smart cities. We're doing some incredible things with some of the bigger cities on the planet, leveraging the power of big data to optimize transportation, for example, in the city of London. Those are the sorts of things we're able to do with the platform. So it's not just a database platform or an insights engine for dealing with these complex, vast amounts of data, but also the tools that allow you to visualize and utilize that data.

>> Turn that data into an action.

>> Yeah, because the data is useless until you're doing something with it. And that's really, if you think about the promise of things like the smart grid: collecting all of that data from all of those smart sensors is absolutely useless until you take an action that is meaningful for a consumer, or meaningful in terms of the generation and consumption of power.

>> So Paul, as the CEO, when you're talking to customers: we talk about the chief data officer, chief information officer, chief information security officer, data scientists, engineers; there are just so many stakeholders that need access to the data. As businesses transform, there are new business models that can come into development if, like you were saying, the data is evaluated and it's meaningful. What are the conversations that you're having? I guess I'm curious, maybe, which personas are at the table (Paul laughs) when you're talking about the business value that this technology can deliver?

>> Yeah, that's a really, really good question, because the truth is, there are multiple personas at the table. Now, we in the technology industry are quite often guilty of only talking to the technology personas. But as I've traveled around the world, whether I'm meeting with the world's biggest banks, the world's biggest telcos, or the world's biggest auto manufacturers, the people we meet, more often than not, are the business leaders. And they're looking for ways to solve complex problems. How do you bring the connected car alive? How do you really bring it to life? One car traveling around a city for a full day generates a terabyte of data. So what does that really mean when we start to connect the billions of cars that are in the marketplace in the framework of the connected car, and then, ultimately, in a world of autonomous vehicles? So, for us, we're trying to navigate an interesting path.
We're dragging the narrative out of a purely technology-based narrative of speeds and feeds, algorithms, and APIs, into a narrative about, well, what does it mean for the pharmaceutical industry, for example? Because when you talk to pharmaceutical executives, the holy grail for the pharma industry is: how do we bring new and compelling medicines to market faster? The biggest challenge for them is the cycle time to bring new drugs to market. So we're helping companies like GSK shorten the cycle times to bring drugs to market. Those are the kinds of conversations we're having. It's really about how we're taking data to power a transformational initiative in retail banking, in retail, in telco, in pharma, rather than a conversation about the role of technology. Now, we always need to deal with the technologists. We need to deal with the data scientists and the IT executives, and that's an important part of the conversation. But you will have seen, in recent times, that the conversation we're trying to have is far more of a business conversation.

>> So if I can build on that: do you think, in your experience, and recognizing that you have a data management tool with some other tools that help people use the data that gets into Kinetica, are we going to see the population of data scientists increase fast enough that our executives don't have to become familiar with this new way of thinking, or are executives going to actually adopt some of these new ways of thinking about the problem from a data risk perspective? I know which way I think.

>> Paul: Wow.

>> Which way do you think?

>> It's a loaded question, but I think if we're going to be in a world where business is powered by data, where our strategy is driven by data, where our investment decisions are driven by data, and where the new areas of business that we explore to create new paths to value are driven by data, we have to make data more accessible. And if what you need to get access to the data is a whole team of data scientists, it kind of creates a barrier. I'm not knocking data scientists, but it does create a barrier.

>> It limits the aperture.

>> Absolutely, because every company I talk to says, "Our biggest challenge is, we can't get access to the data scientists that we need." So a big part of our strategy from the get-go was to build a platform with all of these personas in mind. It's built on the common principles of a relational database, around ANSI-standard SQL.

>> Peter: It's recognizable.

>> And it's recognizable, and consistent with the kinds of tools that executives have been using throughout their careers.

>> Last question; we've got about 30 seconds left.

>> Paul: Oh, okay.

>> No pressure. You have said Kinetica's plan is to measure the success of the business by your customers' success.

>> Absolutely.

>> Where are you on that?

>> We've begun that journey. I won't say we're there yet. We announced three weeks ago that we created a customer success organization. We've put about 30% of the company's resources into that customer success organization, and that entire team is measured not on revenue, not on projects delivered on time, but on value delivered to the customer. So we baseline where the customer is at, we agree on what we're looking to achieve with each customer, and we're measuring that team entirely against the delivery of those benefits to the customer. So it's a journey. We're on that journey, but we're committed to it.

>> Exciting.
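An aside on the ANSI SQL point above: if access really is ordinary standard SQL, a familiar query is all it takes, whether it comes from an analyst or a BI tool. The sketch below assumes a DB-API-style Python connection; the connection, table, and column names (vehicle_telemetry, zone_id, speed_kph) are hypothetical stand-ins, not Kinetica's actual client API. (And on the scale being discussed: at one terabyte per car per day, a billion connected cars would produce on the order of a zettabyte of data daily.)

```python
# A minimal sketch, assuming a DB-API 2.0 style connection object ("conn").
# Only the ANSI SQL shape is the point: nothing here requires a data scientist.

def top_congested_zones(conn, city="London"):
    """Return the ten slowest zones over the last 15 minutes."""
    sql = """
        SELECT zone_id,
               COUNT(*)       AS vehicle_pings,
               AVG(speed_kph) AS avg_speed_kph
        FROM vehicle_telemetry
        WHERE city = %s
          AND event_time >= NOW() - INTERVAL '15' MINUTE
        GROUP BY zone_id
        ORDER BY avg_speed_kph ASC
        LIMIT 10
    """
    cur = conn.cursor()
    cur.execute(sql, (city,))
    rows = cur.fetchall()
    cur.close()
    return rows
```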
>> Well, Paul, thank you so much for stopping by theCUBE for the first time. You're now a CUBE alumni.

>> Oh, thank you, I've had a lot of fun.

>> And we want to thank you for watching theCUBE. I'm Lisa Martin, live in San Jose with Peter Burris. We are at the Forager Tasting Room & Eatery. Super cool place. Come on down, hang out with us today. We've got a cocktail party tonight, where you're sure to learn lots of insights from our experts, and a breakfast briefing tomorrow morning. But stick around, we'll be right back with our next guest after a short break.

(CUBE theme music)
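Earlier in this conversation, Paul described two requirements worth separating: consuming many sources through one real-time path, and pushing results out to multiple personas' platforms at once. Here is a toy, single-process model of that fan-in/fan-out shape; the queue names, severity field, and threshold are invented for illustration and stand in for a real streaming engine and a real analytic model.

```python
# A toy fan-in/fan-out sketch: many sources, one analytic pass, many outputs.

import queue
import threading

ingest = queue.Queue()                                           # fan-in: all sources
outputs = {"dashboard": queue.Queue(), "mobile": queue.Queue()}  # fan-out

def analytics_worker():
    while True:
        event = ingest.get()
        if event is None:                  # shutdown sentinel
            break
        if event.get("severity", 0) >= 8:  # stand-in for a real-time model
            for out in outputs.values():   # push to every consumer at once
                out.put(event)
        ingest.task_done()

threading.Thread(target=analytics_worker, daemon=True).start()

# Multiple sources share the same ingest path:
for source in ("drone-7", "traffic-cam-42", "mobile-291"):
    ingest.put({"source": source, "severity": 9})

ingest.join()  # wait until the worker has processed everything
print(outputs["dashboard"].qsize(), "alerts queued for the dashboard")
```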

Published Date : Mar 7 2018

