Peter Smails, Imanis Data | DataWorks Summit 2018

>> Live from San Jose in the heart of Silicon Valley, it's The Cube, covering DataWorks Summit 2018, brought to you by Hortonworks. (upbeat music)

>> Welcome back to The Cube's live coverage of DataWorks here in San Jose, California. I'm your host, Rebecca Knight, along with my co-host, James Kobielus. We're joined by Peter Smails. He is the vice president of marketing at Imanis Data. Thanks so much for coming on The Cube.

>> Thanks for having me, glad to be here.

>> So you've been in the data storage solution industry for a long time, but you're new to Imanis. What made you jump? What was it about Imanis?

>> Yep, so, very easy to answer that. It's a hot market. Essentially, Imanis is an enterprise data management company. The reason I jumped here is because, if I take a small step back and put it in market context, here's what's happening. You've got your traditional application world, right? On-prem, typically RDBMS-based applications. That's the old world. In the new world, everybody's moving to microservices-based applications for IoT, for customer 360, for customer analysis, whatever you want. They're building these new, modern applications, and they're building them not on traditional RDBMSs but on microservices-based architectures built on top of Hadoop or on NoSQL databases. Those applications, as they go mainstream and into production environments, require data management. They require backup. They require backup and recovery. They require disaster recovery. They require archiving, et cetera. They require the whole plethora of data management capabilities. Nobody's touching that market. It's a blue ocean. So that's why I'm here.

>> Imanis, as you were saying, is one of the greatest little companies no one's ever heard of. You've been around five years. (laughter)

>> No, the company is not new. The thing that's exciting as a marketeer is that we're not just out there pitching untested wares. We're getting into big, blue-chip customers that people would die to get into, because we're addressing a problem that's material: as they roll out these new applications, they've got to have data management solutions for them. The company's been around five years, and I've only been on about a month. What that's resulted in is that over the last five years they've had the opportunity to really gestate the platform, gestate the technology, and prove it in real-world scenarios. It's an enterprise product, and you don't build an enterprise product overnight. Now the opportunity for us as a company is that we're doubling down from a marketing standpoint and doubling down from a sales infrastructure standpoint. The timing's right to put this thing on the map and make sure everybody does know exactly what we do, because we're solving a real-world problem.

>> You're backup and restore, but much more. Lay out the broad set of enterprise data management capabilities that Imanis Data currently supports in your product portfolio, and where you're going, how you're evolving what you offer.

>> Yeah, that's great. I love that question. Think of the platform itself as a highly scalable, distributed architecture. We scale in a number of different ways, and I'll come directly to your question.
One is, we're infinitely scalable in terms of computational power, so we're built for big data by definition. Number two, we scale very well from a storage efficiency standpoint, so we can store very large volumes of data, which is a requirement. We also scale from a use case standpoint: we support use cases throughout the data life cycle. The one that gets all the attention is obviously backup and recovery, because you have to protect your data. But if I look at it from a life cycle standpoint, our number one use case is Test/Dev. A lot of these organizations building these new apps want to spin up subsets of their data, because they're supporting things like CI/CD, so they want to be able to do rapid testing and such.

>> DevOps and stuff like that.

>> Yeah, DevOps and so forth. So they need Test/Dev, and we help them automate and orchestrate the Test/Dev process, supporting things like sampling. I may have a one-petabyte dataset; I'm not going to run Test/Dev against all of that. I want to take 10 percent of it and spin that up, and I want to do some masking of personal, PII data. So we can do masking and sampling to support Test/Dev. We do backup and recovery. We do disaster recovery. Some customers, particularly in the big data space, may say, well, I have replicas, so some of this data isn't permanent, it's transient data, but I do care about DR. So DR is a key use case. We also do archiving. If you just think of data through the life cycle, we support all of those. The piece in terms of where we're going, in addition to everything I just mentioned, is that we're the only data management platform that's machine learning-based. Machine learning gets a lot of attention, but we're actually delivering machine learning-enabled capabilities today.

>> And we discussed this before the interview. There's a bit of anomaly detection. How exactly are you using machine learning? What value does it provide to an enterprise data administrator to have ML inside your tool?

>> Inside our platform, great question. Very specifically, the product we're delivering today has a capability called ThreatSense. The number one use case I mentioned is backup and recovery. Within backup and recovery, what ThreatSense will do, with no user intervention whatsoever, is analyze your backups as they go forward and learn what a normal pattern looks like across some 50 different metrics. The details of those I couldn't give you right now, but essentially it's a whole bunch of different metrics we look at to establish what a normal baseline looks like for you, or for you, kind of thing. Great, that's number one. Number two, we then constantly analyze: is anything occurring that knocks things outside of that baseline? Does something fall outside of it and create an anomaly? When it does, we notify the administrators: you might want to look at this, something could have happened. The value, very specifically, is around ransomware. Typically, one of the ways you're going to detect ransomware is that you'll see an anomaly in your backup set, because your data set will change materially. So we will be able to tell you.

>> 'Cause somebody's holding it for ransom is what you're saying.
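The baseline-and-deviation idea described above can be made concrete with a small sketch. To be clear, Imanis Data has not published how ThreatSense works; the metric names, the z-score test, and the threshold below are illustrative assumptions, not the product's actual behavior.

```python
# Minimal illustration of baseline-and-deviation anomaly detection on
# backup metrics. This is NOT Imanis Data's actual ThreatSense algorithm
# (which is not public); it only sketches the idea: learn what "normal"
# looks like per metric from backup history, then flag backups that
# deviate sharply (e.g., a ransomware-driven spike in change rate).

from statistics import mean, stdev

def find_anomalies(history, current, threshold=3.0):
    """Flag metrics in `current` that deviate from the per-metric baseline.

    history  -- list of dicts, one per past backup, e.g.
                {"change_rate": 0.02, "dedup_ratio": 9.8}
    current  -- dict of the same metrics for the newest backup
    Returns (metric, z_score) pairs whose z-score exceeds `threshold`.
    """
    anomalies = []
    for metric, value in current.items():
        samples = [h[metric] for h in history if metric in h]
        if len(samples) < 2:
            continue  # not enough history to establish a baseline
        mu, sigma = mean(samples), stdev(samples)
        if sigma == 0:
            continue  # a flat baseline gives no usable spread
        z = abs(value - mu) / sigma
        if z > threshold:
            anomalies.append((metric, round(z, 1)))
    return anomalies

history = [{"change_rate": 0.02, "dedup_ratio": 9.8},
           {"change_rate": 0.03, "dedup_ratio": 9.6},
           {"change_rate": 0.02, "dedup_ratio": 9.9},
           {"change_rate": 0.025, "dedup_ratio": 9.7}]
latest = {"change_rate": 0.61, "dedup_ratio": 2.1}  # encrypted data deduplicates poorly

for metric, z in find_anomalies(history, latest):
    print(f"anomaly: {metric} is {z} standard deviations from baseline")
```

The ransomware signature in the example (change rate up, dedup ratio down) follows from the fact that encrypted data is high-entropy, so it changes wholesale and deduplicates poorly.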
>> Correct, so something's going to happen in your data pattern.

>> You've lost data that should be there, or whatever it might be.

>> Correct. It could be that you lost data, or your change rate went way up, or something.

>> Yeah, gotcha.

>> There's any number of things that could trigger it, and then we let the administrator know it happened here. Today we don't then turn around and just automatically solve it, but to your point about where we're going: we've already broken the ice on delivering machine learning-enabled data management.

>> That might indicate you want to checkpoint your backups to, like, a few days before this was detected, so at least you know what data is most likely missing. So yeah, I understand.

>> Bingo, that's exactly where we're going with that right now. As you can imagine, having a machine learning-powered data management platform at our core, there are many different ways we can go with it. When do I back up? What data do I back up? How do I create the optimal RTO and RPO? From a storage management standpoint, when do I put what data where? There's the whole library science of data management. The future of data management is machine learning-based. There's too much data and too much complexity for humans alone; you need to bring machine learning into the equation to help you harness the power of your data. We've broken the ice and we've got a long way to go, but we've got the platform to start with, and we've already introduced the first use case around this. You can imagine all the places we can take it going forward.

>> Very exciting.

>> So you're the company that's using machine learning right now. What, in your opinion, will separate the winners from the losers?

>> In terms of vendors, or in terms of the customers?

>> Well, in terms of both.

>> Yeah, let me answer that two ways, sort of inward versus outward, in terms of how we're unique. We are very unique in that we're infinitely scalable. We are a single pane of glass for all of your distributed systems. We are very unique in terms of our multi-stage data reduction. And from a technology differentiation standpoint, we're the only vendor doing machine learning-based data management.

>> Multi-stage data reduction, I want to break that down. What does that actually mean in practice?

>> Sure, we get that question frequently: is it compression, is it deduplication, or is there something else in there?
There are a couple of different things in there, actually. So why does it matter? A lot of customers will ask, well, by definition, NoSQL or Hadoop-based environments are all built on replicas, so why back things up? First of all, replication isn't backup. That's lesson number one. A point-in-time backup is very different from replication; replication replicates bad data just as quickly as it replicates good. And when you back up these very large data sets, you have to be incredibly efficient about how you do it. What we do with multi-stage data reduction is, one, deduplication, variable-length deduplication; we do compression; we do erasure coding; and the other thing we do in there is what we call a global deduplication pool. When we're deduping your data, we're deduping it against a very large data set. This is where size matters: the larger the pool of data I'm actually storing, the higher the percentage of deduplication I can get, because I've got a bigger pool to reduce against. The net result is that we're incredibly efficient at petabyte-scale data management, to the tune of 10X, easily 10X over traditional deduplication, and a multiple over technologies that are more current, if you will. So, back to your question: we're confident we have a very strong head start. Our opportunity now is to drive awareness; that's why we're here. We've got to make sure everybody knows who we are, how we're unique, and how we're different. And you guys are great. Love being on The Cube. From a customer standpoint, and this is sort of a cliche, but it's true: the customers that best harness their data are the ones that are going to win. They're going to be more competitive; they're going to find ways to be differentiated. And the only way they're going to do that is to make the appropriate investments in their data infrastructure, in their data lakes, in their data management tools, so that they can harness all that data.

>> Where do you see the future of your Hortonworks partnership going?

>> So, we support a broad ecosystem, and Hortonworks is just as important as any of our other data source partners. Where we see that unfolding, and we feel we play an important part, let me put it this way: our value in helping Hortonworks is that as more and more organizations go mainstream with these applications, these are not corner cases anymore. This is not in the lab. This is the real deal: mainstream enterprises running business-critical applications. The value we bring is that you're not going to rely on those platforms without an enterprise data management solution that delivers what we deliver. There are all kinds of ways we can go to market together, but net-net, our value is that we provide a very important enterprise data management capability for customers deploying in these business-critical environments.

>> Great.

>> Very good. As more data gets persisted out at edge devices and the Internet of Things and so forth, what are the challenges in terms of protecting that data, backup and restore, deduplication, and so forth? And to what extent is Imanis Data maybe addressing those kinds of more distributed data management requirements going forward? Do you see that on the rise? Are you hearing it from customers, that they want to do more of that, more of an edge-cloud environment? Or is that way too far in the future?

>> I don't think it's way too far in the future, but I do think it works inside out. My position is not that there isn't edge work going on. What I would contend is that the big problem right now, from an enterprise mainstreaming standpoint, is getting your core house in order as you move from a traditional four-wall data center to a hybrid cloud environment. Maybe not quite at the edge yet; it's a combination of how do I leverage on-prem and the cloud, so to speak, and how do I get the core data lake, in the case of Hortonworks, that big data lake, sorted out?
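Returning for a moment to the multi-stage data reduction discussed above, a toy sketch helps show how the stages compose: variable-length chunking, deduplication against a shared pool, then compression. The chunking heuristic and pool structure here are simplifications assumed for illustration; real systems use content-defined chunking such as Rabin fingerprinting, the erasure-coding stage is omitted, and none of this is Imanis Data's actual implementation, which is proprietary.

```python
# Toy sketch of multi-stage data reduction: variable-length chunking,
# dedup against a shared ("global") pool, then compression. Illustrative
# only; not how any specific product implements it.

import hashlib
import zlib

def chunk_boundaries(data, mask=0x0FFF, min_size=512):
    """Yield variable-length chunks, cutting where a rolling byte sum hits
    a bit pattern -- a crude stand-in for Rabin fingerprinting."""
    start, rolling = 0, 0
    for i, byte in enumerate(data):
        rolling = (rolling + byte) & 0xFFFFFFFF
        if i - start >= min_size and (rolling & mask) == mask:
            yield data[start:i + 1]
            start, rolling = i + 1, 0
    if start < len(data):
        yield data[start:]  # tail chunk

class DedupPool:
    """Global pool: the more data it has seen, the more duplicates it catches."""
    def __init__(self):
        self.store = {}  # chunk hash -> compressed chunk bytes

    def ingest(self, data):
        recipe, new_bytes = [], 0
        for chunk in chunk_boundaries(data):
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.store:
                self.store[digest] = zlib.compress(chunk)  # compression stage
                new_bytes += len(self.store[digest])
            recipe.append(digest)  # dedup stage: store a reference, not the data
        return recipe, new_bytes

    def restore(self, recipe):
        return b"".join(zlib.decompress(self.store[d]) for d in recipe)

pool = DedupPool()
backup1 = bytes(range(256)) * 4096        # 1 MiB of repetitive "production" data
recipe1, stored1 = pool.ingest(backup1)
recipe2, stored2 = pool.ingest(backup1)   # identical second backup
assert pool.restore(recipe1) == backup1
print(f"first backup stored {stored1} bytes; second stored {stored2} (all dedup hits)")
```

The "global" part shows up directly in the example: the second, identical backup stores zero new bytes because every chunk already exists in the pool, and in general a larger pool means more dedup hits, which is the size-matters point made in the interview.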
You're touching on, I think, a longer discussion, which is: where is the analysis going on? Where is the data going to persist? Where do you do some of that computational work? You get all this information out at the edge. Does all that information end up going into the data lake? Do you move the storage to where the lake is? Do you start pushing some of the lake functionality out to the edge, where you then have to start doing some of that work locally? So it's a much more complicated discussion.

>> I know we had this discussion over lunch. This may be outside your wheelhouse, but let me just ask it anyway. At Wikibon I cover AI, distributed training, and distributed inference, and we've seen that the edges are capturing the data and, more and more, there's a trend toward performing local training of embedded models from the data they capture. But quite often edge devices don't have a ton of storage, and they're not going to retain that data long. Some of that data will need to be archived, persisted, and managed as a core resource. So we see that kind of requirement, maybe not now, but in a few years' time: distributed training, and persistence and protection of that data, becoming a mainstream enterprise requirement, where AI and machine learning, the whole pipeline, is a concern. Like I said, that's probably outside you guys' wheelhouse, probably outside the realm for your customers. But that kind of thing is coming, as the likes of Hortonworks and IBM and everybody else start to look at it and implement it: containerization of analytics and data management out to all these micro devices.

>> Yes, and I think you're right there. To your point, we're kind of going where the data is, in volume, and it's going in that direction. Frankly, where we see that happening is where the cloud plays a big role as well, because there's edge, but how do you get to the edge? You can get to the edge through the cloud. So again, we run on AWS, we run on GCP, we run on Azure. To be clear, in terms of the data we can protect, we've got a broad portfolio, a broad ecosystem of Hadoop-based big data sources that we support, as well as NoSQL. If they're running on AWS or GCP or Azure, we support ADLS, we support Azure's data lake offerings, HDInsight, a whole bunch of different things, both in the cloud and on-prem. And that's where we're seeing some of that edge work happening.

>> Great. Well, Peter, thank you so much for coming on The Cube. It's always a pleasure to have you on.

>> Yes, thanks for having me, and I look forward to being back sometime soon.

>> We'll have you.

>> Thank you both.

>> When the time is right.

>> Indeed. We will have more from The Cube's live coverage of DataWorks just after this. (upbeat music)
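One more sketch, for the Test/Dev use case raised earlier in the conversation: spin up roughly a 10 percent sample of a production dataset with PII masked before it reaches a test environment. The field list, masking scheme, and sampling approach below are assumptions for illustration, not a description of the Imanis Data product.

```python
# Minimal sketch of the Test/Dev workflow: sample a fraction of production
# records and mask PII before handing the subset to a test environment.
# Field names and the masking scheme are illustrative assumptions.

import hashlib
import random

PII_FIELDS = {"name", "email", "ssn"}  # assumption: fields treated as PII

def mask(value):
    """Deterministic, irreversible masking: same input -> same token,
    so joins across tables still work, but the PII itself is gone."""
    return "masked_" + hashlib.sha256(value.encode()).hexdigest()[:12]

def sample_and_mask(records, fraction=0.10, seed=42):
    rng = random.Random(seed)  # fixed seed -> reproducible test datasets
    subset = [r for r in records if rng.random() < fraction]
    return [{k: mask(v) if k in PII_FIELDS else v for k, v in r.items()}
            for r in subset]

production = [{"name": "Ada Lovelace", "email": "ada@example.com",
               "plan": "enterprise", "usage_gb": 1834},
              {"name": "Alan Turing", "email": "alan@example.com",
               "plan": "pro", "usage_gb": 212}] * 500

test_set = sample_and_mask(production, fraction=0.10)
print(f"{len(test_set)} of {len(production)} records sampled for test/dev")
print(test_set[0])
```

Deterministic masking (a hash rather than random tokens) is one common design choice here: the same input always maps to the same token, so keys still join across sampled tables in the test environment.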

Published Date: Jun 19, 2018
