Jesse Rothstein, ExtraHop | AWS re:Invent 2019

>> Announcer: Live from Las Vegas, it's theCUBE. Covering AWS re:Invent 2019, brought to you by Amazon Web Services, and Intel, along with its ecosystem partners. >> Welcome back, this is theCUBE's seventh year of coverage of the mega AWS re:Invent show, here in Las Vegas. Somewhere between 60 and 65,000, up and down the street. We are here in the Sands Convention Center. I am Stu Miniman, my cohost for this segment is Justin Warren. And happy to welcome back to the program, one of our CUBE alumni Jesse Rothstein, who is the co-founder and CTO of ExtraHop, Jesse, great to see you. >> Thank you for having me again. >> So, we caught up with you at AWS re:Inforce-- >> We did. >> Not that long ago, in Boston. Where, it rains more often in Boston than it does in Vegas and it's raining here in Vegas, which is a little odd. >> Strangely it is raining here in Vegas, but re:Inforce at the end of June in Boston was the first AWS security conference. Great energy, great size, we had a lot of fun at that show. >> Yeah, so Dave Vellante, who was one of the ones at re:Inforce, and he actually came out of the three-hour keynote yesterday with Andy Jassy and said, "I'm a little surprised there wasn't as much security talk." You know, it's not like we can remove security from the discussion of cloud, it is you know one of the top issues here. So I want to get your viewpoint, were we missing something? Is it just there, what grabbed you? >> I noticed this thing as well. I think, perhaps, they're saving some announcements for, you know, re:Inforce coming again in June in Houston this year. There was at least one announcement around IAM Access Analyzer as I recall. But generally the announcements seem to focus in some other areas. You know some big announcements around data warehousing, you know for federated Redshift queries I think. And some big announcements around machine learning tooling, like the SageMaker Studio. But I noticed that as well, not as many security announcements.
>> You never know, Werner still has his keynote tomorrow. So we're sure there'll still be another 50 or 100 announcements before the week is done. ExtraHop also has something new this week, so why don't we make sure-- >> Well first I can assure you that cloud security is not solved. It's not a solved problem, in fact, unfortunately despite record spend year after year after year, we still continue to see record numbers of compromises and data breaches that are published. I think cloud security in particular remains a challenge. There's a lot of energy there and I think a lot of attention, people recognize it's a problem. But we're dealing with massive cyber security skill shortages. It's very hard to find people with the expertise needed to really secure these workloads. We're dealing with more sophisticated attackers. I think in many cases, attackers with nation state sponsorship. Which is scary, you know five or 10 years ago we didn't see that quite as much. More cyber criminals, fewer nation states. And of course, we're seeing an ever increasing attack surface. So ExtraHop's right in the mix here, and we focus on network detection and response. I'm a huge believer in the power of network security, and I'll talk more about that. At re:Inforce last June, we announced ExtraHop Reveal(x) Cloud, which is a SaaS offering using AWS's recent VPC Traffic Mirroring capability. So the idea is, all you do is you mirror a copy of the traffic, using VPC Traffic Mirroring, to our SaaS, and then we provide all of the sophisticated detection, investigation and response capabilities, as a product. So that's hosted, you still do the work of investigating it, but you know we provide the entire offering around that. Very low TCO, very turnkey capabilities. And of course, it wouldn't be a modern day security offering if we didn't leverage very sophisticated machine learning, to detect suspicious behaviors and potential threats. 
But this is something I think we do better than anybody else in the world. >> So walk us through some of what the machine learning actually does. 'Cause I feel that the machine learning and AI is kind of hitting peak hype cycle maybe. >> You know I almost can't say it with a straight face because it's so overused. But, it is absolutely real, that's where the state of the art is. Machine learning allows us to recognize behaviors, and behaviors are very important because we're looking for post-breach behaviors and indicators of compromise. So there are a million ways that you can be breached. The attack surface is absolutely enormous. But there's actually a relatively small number, and a relatively tractable set of post-breach behaviors that attackers will do once you're compromised. And I think more and more organizations are realizing that it's a matter of when and not if. So what we've done is we've built the machine learning behavioral model so that we can detect these suspicious behaviors. In some cases we have an entire team of threat researchers that are simulating attacks, simulating pen testing tools, lateral movement, exfiltration so we can train our models on these behaviors. In some cases, we're looking for very specific indicators of compromise. But in just about all cases, this results in very high quality detections. And because just detections alone are completely insufficient, ExtraHop is built on top of an entire analytics platform, so that you're always one or two clicks away from being able to determine, is this something that requires immediate attention and requires kind of an incident response scenario? One of the capabilities that we announced here at this show, is automated response. So we integrate with the AWS API, so that we can automatically isolate and quarantine a workload that's behaving suspiciously. You know in cyber security, some attacks are low and slow but some are very fast and destructive. 
And for the fast and destructive ones, you move faster than a human's ability to respond, so we need that automated response. And we also announced a continuous packet capture capability for forensics, because sometimes you need the packets. >> That's a response, a lot of different things that we'd actually like to bring the capability a little bit earlier than that so that we don't actually get breached. It's great that we can detect it and say, great we've got the indication of compromise and we can react very, very quickly to that. Are you able to help us get one step ahead of the cyber crimes? >> So I'll actually be a little contrarian on that. I'm going to say that organizations have really been investing in protection and prevention, for the last decade or two. You know this strategy's called defense in depth, and you should do it, everybody should, that's a best practice. But, you know, with defense in depth, you have lots of layers of defense at the perimeters. You know keep the attackers out of the perimeter, gateways, firewalls, proxies. Lots of layers of defense at the end point, you know keep attackers off of my workstations, my instances, my laptops, things like that. But, you know, I think again, organizations have learned that attackers can fire, you know, 1,000 arrows, or 100,000 arrows, or 100 million arrows and only one needs to land. So the pendulum has really swung toward detection and response. How do I know if I'm breached right now? How can I detect it quickly? The industry average dwell time is over three months, which is unacceptably long, and we always hear about cases in the news that are three years or more. And what I like to say is if it were three weeks, that would be too long. If it were three days, that would be too long, if it were three hours, I think you could do a lot of damage in three hours. If you can start getting this down to three minutes, well maybe, you know, we can limit the blast radius in three minutes.
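The automated response Jesse describes, isolating a suspicious workload through the AWS API, could look roughly like the sketch below. This is not ExtraHop's implementation, which the interview does not detail; it assumes one common approach, replacing an EC2 instance's security groups with a single isolation group via boto3's `modify_instance_attribute`. The `RecordingEC2Client` stub, the function name, and the IDs are hypothetical, used so the example runs without AWS credentials.

```python
from typing import Any, Dict, List


class RecordingEC2Client:
    """Hypothetical stand-in for a boto3 EC2 client; it only records calls."""

    def __init__(self) -> None:
        self.calls: List[Dict[str, Any]] = []

    def modify_instance_attribute(self, **kwargs: Any) -> Dict[str, Any]:
        # Mirrors the keyword-argument style of the real boto3 method.
        self.calls.append(kwargs)
        return {}


def quarantine_instance(ec2_client: Any, instance_id: str, quarantine_sg_id: str) -> None:
    """Replace all of an instance's security groups with one isolation group."""
    ec2_client.modify_instance_attribute(
        InstanceId=instance_id,
        Groups=[quarantine_sg_id],  # assumed: a group with no permissive rules
    )


# Dry run against the stub; the IDs below are made up for illustration.
client = RecordingEC2Client()
quarantine_instance(client, "i-0123456789abcdef0", "sg-0aaaabbbbccccdddd")
print(client.calls)
```

With real credentials, passing `boto3.client("ec2")` in place of the stub would issue the actual API call. Keeping the client as a parameter makes the response logic testable without touching a live account.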
>> So Jesse, you brought up the ever growing surface area of attack and one of the big themes we've seen at the show is AWS is pushing the boundaries of where they touch customers. You know I said if Amazon is the everything store, AWS is becoming the everywhere cloud. Outposts, from Amazon's perspective, they said Outposts just extends their security models. I see and hear a lot of the ecosystem talking about how they're leveraging that and integrating with that. Does Outposts or any of their other Edge solutions impact what your customers and your solutions are doing? >> So it's funny you say that, I was wondering that myself. My expectation is that Outposts are a good thing because they have the same security controls that we expect to see in any AWS kind of VPC enabled environment. Where I haven't gotten full clarification is do we have the full capabilities that we expect with VPCs? In particular, you know VPC Traffic Mirroring, which is the capability that was announced at re:Inforce, that I'm so excited about, because it allows us to actually analyze and inspect that traffic. Another capability that I think slipped in under the radar but it was announced yesterday is VPC Ingress Routing. This doesn't really affect ExtraHop that much, but as a network head, I like seeing Amazon enable organizations to kind of make their own choices around how they want to inspect and control traffic. And with VPC Ingress Routing, it actually allows you to run in-line devices between your VPCs, which previously you were unable to do. So I think that one slipped in under the radar, maybe you have to be a network head like me to really appreciate it. But I'm seeing more flexibility and not less and that's something that I'm really pleased with. >> That one thing that we definitely see with cloud is that explosion of customer choice, and all of these different methods that are available. And Amazon just keeps pushing the boundaries on how quickly they can release new features.
What does that mean for ExtraHop in being able to keep up with the pace of change that customers are using all of these different features? >> That's a good question, I think that's just the reality, so I don't think about what it means or doesn't mean, that's just the way it is. In general though, I've seen this trend toward more flexibility. You know VPC Traffic Mirroring, to use that example again, was one of the few examples I could point to a year ago as something really useful and valuable that I could do on-premises, you know for diagnostic purposes, for forensics purposes, that for some reason wasn't available in public cloud, at least not easily. And, you know, with this announcement six months ago, and going to general availability, Amazon finally ticked that one off. And we're starting to see the rest of the public cloud ecosystem move that way as well. So I'm seeing more flexibility, and more control. Maybe that comes with a pace of innovation, but I think that's just the world we live in. >> You do mention that the customers are having to adopt this new regime, of look we need to look at compromise, can we detect if we've been compromised, and can we do it quickly. We have a lot of tools that are now being made available, like Igress Routing, but, sorry Ingress Routing. But what does that mean for customers in changing their mindset? One of the themes that we had from the keynote yesterday was transformation, so do customers need to just transform the way they think about security? >> Yes and no. You know certainly customers who are used to a certain set of on-prem tool set, tool chain can't necessarily just shoehorn that into their public cloud workloads. But on the other hand, I think that public cloud workloads have really suffered from an opacity problem, it's very difficult to see what's going on, you know it's hard to sift through all those logs, it's hard to get the visibility that you expect.
And I think that the cyber security tool set, tool chain, has been pretty fragmented. There are a lot of vulnerability scanners, there are a lot of kind of like API inspectors and recommendation engines. But I think the industry is still really trying to figure out what this means. So I'm seeing a lot of innovation, and I'm seeing kind of a rapid maturing of that kind of cloud security ecosystem. And for products like ExtraHop, I'm just a huge believer in the power of the network for security, because it's got these great properties that other sources of data don't have. It's as close to ground truth as you could possibly get, very hard to tamper with and impossible to turn off. With VPC Traffic Mirroring, we get the full power of network security and it's really designed with the controls and kind of the IAM roles and such that you would expect for these security use cases, which, I just, great, great advance. >> So along the discussion of transformation, one of the things Andy Jassy talked about is the you know, the senior leadership, the CEOs need to be involved. Something we've been saying in the security industry for years. Not only CEOs, the board is you know, talking about this and it's there, so you know, what are you seeing? You stated before that we haven't solved security yet, but so, bring us inside the mindset of your customers today, and what's the angst and you know, where are we making progress? >> That's a very interesting question. I'll probably be a little contrarian here as well, maybe not but I think we see a lot of pressure is regulatory pressure. You know we're seeing a lot of new regulations come out around data privacy and security, GDPR was you know pretty transformative in terms of how organizations thought about that. I also think it's important that there are consequences. I was worried that for a few years data breaches were becoming so commonplace that people were getting kind of desensitized to it.
Like, there was once a time that if, when there was a massive data breach kind of heads would roll. And there was a sense of consequences all the way up into the C-suite. But a few years ago I was starting to get concerned that people were getting a little lackadaisical like, "Oh just another data breach." My perception is that the pendulum's swinging back again. I think for truly massive data breaches, there really is a sense of brand. And I'm seeing the industry starting to demand better privacy. The consumer industry is perhaps leading the way. I think Apple's doing a very good job of actually selling privacy. So when you see the economics, I mean we're, it's a capitalist system. And when you see kind of the market economics align with the incentives, then that's when you actually see change. So I'm very encouraged by the alignment of kind of the market economics for paying greater attention to privacy and security. >> All right, want to give you a final word here, you said you'd like to have some contrarian viewpoints. So you know, the last question is just you know, what would you like to kind of just educate the marketplace on that maybe goes against the common perception when it comes to security in general, maybe network security specifically? >> Well, I'll probably just reiterate what I said earlier. Network security is a fundamental capability, and a fundamental source of data. I think organizations pay a lot of attention to their log files. I think organizations do invest in protection and prevention. But I think the ability to observe all of the network communications, and then the ability to detect suspicious behaviors and potential threats, bring it to your attention, take you through an investigative workflow, make sure that you're one click away from determining you know, whether this requires an actual incident response, and in some cases take an automated response. 
I think that is a very powerful solution and one that drastically increases an organization's cyber security posture. So I would always encourage organizations to invest there regardless of whether it's our solution or somebody else's. I'm a huge believer in the space. >> All right so, Jesse, thank you so much for sharing. We know that the security industry still has lots of work to do. So we look forward to catching ExtraHop soon at another event. And we have lots of work to do to cover all of the angles of this sprawling ecosystem here at AWS re:Invent. For Justin Warren, I'm Stu Miniman, be back with lots more right after this, and thank you for watching theCUBE. (bouncy electronic music)

Published Date : Dec 5 2019

ENTITIES

Entity	Category	Confidence
Jesse Rothstein	PERSON	0.99+
Justin Warren	PERSON	0.99+
Dave Vellante	PERSON	0.99+
Amazon Web Services	ORGANIZATION	0.99+
Andy Jassy	PERSON	0.99+
Stu Miniman	PERSON	0.99+
Boston	LOCATION	0.99+
Vegas	LOCATION	0.99+
AWS	ORGANIZATION	0.99+
Amazon	ORGANIZATION	0.99+
three days	QUANTITY	0.99+
three years	QUANTITY	0.99+
Houston	LOCATION	0.99+
Jesse	PERSON	0.99+
three weeks	QUANTITY	0.99+
100,000 arrows	QUANTITY	0.99+
three hours	QUANTITY	0.99+
Werner	PERSON	0.99+
Apple	ORGANIZATION	0.99+
Las Vegas	LOCATION	0.99+
1,000 arrows	QUANTITY	0.99+
three minutes	QUANTITY	0.99+
June	DATE	0.99+
one	QUANTITY	0.99+
yesterday	DATE	0.99+
50	QUANTITY	0.99+
100 million arrows	QUANTITY	0.99+
Intel	ORGANIZATION	0.99+
Sands Convention Center	LOCATION	0.99+
100 announcements	QUANTITY	0.99+
six months ago	DATE	0.99+
this year	DATE	0.99+
tomorrow	DATE	0.99+
SageMaker Studio	ORGANIZATION	0.99+
a year ago	DATE	0.98+
this week	DATE	0.98+
seventh year	QUANTITY	0.98+
end of June	DATE	0.98+
last June	DATE	0.98+
GDPR	TITLE	0.98+
One	QUANTITY	0.97+
five	DATE	0.97+
ExtraHop	ORGANIZATION	0.97+
first	QUANTITY	0.96+
65,000	QUANTITY	0.96+
one step	QUANTITY	0.95+
10 years ago	DATE	0.95+
last decade	DATE	0.94+
over three months	QUANTITY	0.94+
two clicks	QUANTITY	0.94+
60	QUANTITY	0.93+
today	DATE	0.91+
three-hour keynote	QUANTITY	0.9+
AWS re:Invent show	EVENT	0.87+
Inforce	ORGANIZATION	0.84+
Igress Routing	TITLE	0.82+
few years ago	DATE	0.81+
VPC	TITLE	0.79+
VPC Ingress Routing	TITLE	0.76+
re:Invent 2019	EVENT	0.76+

Dilip Advani, Uila | VTUG Winter Warmer 2018

(lively techno music) >> Announcer: From Gillette Stadium in Foxborough, Massachusetts, it's theCUBE. Covering VTUG Winter Warmer 2018, presented by SiliconANGLE. >> Hi, I'm Stu Miniman, and this is theCUBE's coverage of the VTUG Winter Warmer here in 2018. Happy to welcome to the program, first time guest and first time company on the program, Dilip Advani, who's the vice president of marketing at Uila. Great to see you. >> Thank you Stu. Great to be here. >> All right, so Dilip, first, tell us a little bit about your background and what brought you to Uila. >> Yeah. So again, my background has been on the analysis side and the protocol analysis side. I have been, in the past, focused on the wireless aspects of the business. I have led teams on product strategies and product marketing in my past history. What I have done is, the reason I came to Uila, is because of the rich history, from the founders who have great experience on the deep packet inspection and the protocol analysis side. And they decided to bring this to the virtualization world and that's what got me very interested in Uila. >> Okay. So Uila itself, we've worked with a number of the team. Fluke Networks? Was that where... >> This was, yeah this was from the original AirMagnet Fluke Networks team as well. So this is the team that actually built the world's first analyzer product, which was Net XRay from Cinco Networks. >> Okay, great, tell us the why of Uila, why today, what's different, what's the big problem it's helping us solve. >> Yeah, so before I talk about what Uila does, and then, what role it plays in the industry, I wanted to address one question that people frequently ask us, "What does Uila actually mean?" The joke around the office is that, because the founders like to go to Hawaii, a lot, >> Stu: (laughs) >> That's why they came up with the Hawaiian name. It actually means "lightning in the cloud" in Hawaiian. But there's a deeper meaning to that. 
We, actually, we are the power and the guiding light behind some of the challenges that people have with their cloud environment. So what Uila, If you step back and talk about what Uila as a company does, we are a young and dynamic company based out of the Silicon Valley, and what we do is, we do application-centric infrastructure monitoring. We pinpoint the bottlenecks that may exist on your infrastructure, and we also help users on the hybrid cloud workload migration strategy. >> Yeah, I hear "application-centric," and there's been hardware companies that sometimes use that term, and it really more infrastructure-centric, that applications sit on. So, maybe tell us a little bit about where you sit and what you look at and how much is kind of tied to the application versus the infrastructure. >> Absolutely, right. At the end of the day, everything goes back to the application, all the business service. And obviously, the business service is running on the infrastructure. So we target the IT operations team. We want to make sure that they don't end up being the fall guy, or the team to be blamed for anything and everything that goes wrong with the network. Sometimes it is the infrastructure, but at times it could be the application itself, as well. So, that is where Uila plays a role, to help in that full stack monitoring, to avoid the finger-pointing discussion that takes place between the operations team as well as the application teams, or any other teams within the organization. >> I think that's a great point. It's interesting, when the dev ops wave, some people throw out that term "no ops," it's like, operations is real important. I interviewed Solomon Hykes from Docker, and he said, "The reason we did container wasn't to get away from the operator, it was actually to create tools to help the operator, and it enables the developer and the application side, but ops is still pretty critical." >> Absolutely, absolutely. That's where, I think, everything ends. 
So that's been our focus, to make sure that we provide a solution for that particular team, so that they can help solve any challenges that you may have in your data center. >> Okay, need to understand where this lives, because, today, customers, especially at an event like this, there's virtualization and there's cloud, and there's a huge spectrum of what cloud means to customer. Some of them, cloud is, I'm a small company, maybe it is mostly public cloud. Everybody's doing SaaS. Most companies have some in their on-premises, whatever you want to call that, and heck, there's even the edge stuff, is becoming majorly important, but it's the, everything, whether you call it multi-cloud or hybrid cloud, how do you put that all together? There's lots of challenges there, where do you fit in this overall puzzle? >> Absolutely. In terms of the private cloud, like I mentioned, our main goal is to help you solve the performance bottleneck, whether it's on the application side or the infrastructure side, and help you solve that problem. But what trends we are seeing, is, a majority of the customers, just like the industry in general, is looking towards the hybrid cloud, or multi-cloud, or whatever you want to call that. We are seeing a lot of customers move towards that strategy, but again, they are struggling with defining that strategy. They're struggling with how you get going on this particular path of taking their applications or their business services, which traditionally have stayed in the private data center, and moving it to the public cloud as such. So that's where we've seen organizations struggle with understanding what their current scenario looks like, what their current applications look like, how they're dependent on each other. Again, documentation, obviously, as you know, is the last thing on IT people's minds. Or, if they have a document ready, it's outdated as soon as it's created.
So that's where we've seen a lot of organizations struggle, with getting that visibility into what exists within their environment, as they plan about taking their applications to the hybrid cloud. >> Okay, so Dilip, I just want to make sure I understand. Things like performance management, do you look at both sides of a hybrid, both the public and the private, or is it primarily in the private? >> We look at both sides, on the private side as well as the public side. And on the private side, like I mentioned, not only do we help on the performance monitoring there, but we also help you define your migration strategy. >> Okay, when I think about all those things you were talking about, I'm surprised I haven't heard some mention of machine learning, artificial intelligence, 'cause things are growing, things are changing so fast, there's no way the administrator can do it themselves, what's the secret sauce, where's the software, where do you fit in, or do you just stay away from those buzzwords? >> No, no, no, again, I think everybody likes to use those buzzwords. >> Stu: (Laughs) >> We do the same as well. I think, when you think about artificial intelligence, or machine learning, at the end of the day, it goes back to the predictive analysis capabilities that organizations must have for their data centers, because at the end of the day, it's about being proactive, not just being reactive, to issues that could be occurring on your network. So, mining the data that's being collected on your current environment and using that, by artificial intelligence, or machine learning, to figure out what are the resources that will be needed as they expand their own capacities within their own environment and such. Or, being able to predict that they need to assign certain resources, or they're going to run into a certain issue, if they don't assign certain resources, or they don't do something, which could impact their business performance. 
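The predictive capacity analysis described here can be illustrated with a simple trend fit. The sketch below is not Uila's method, which the interview does not describe; it assumes evenly spaced utilization samples and fits an ordinary least-squares line to project a future value. The function name and the sample figures are hypothetical.

```python
from typing import Sequence


def linear_forecast(samples: Sequence[float], steps_ahead: int) -> float:
    """Project a value `steps_ahead` intervals past the last sample.

    Fits y = intercept + slope * x by ordinary least squares, where x is the
    sample index (0, 1, 2, ...), then evaluates the line at the future index.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(samples) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept + slope * (n - 1 + steps_ahead)


# Made-up weekly storage utilization (percent) for a datastore.
weekly_utilization = [40.0, 45.0, 50.0, 55.0]
print(linear_forecast(weekly_utilization, 2))  # 65.0
```

A straight line is the simplest possible model; real workloads with seasonality or bursts would need something richer, but the idea of extrapolating collected metrics to flag a resource shortfall before it happens is the same.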
>> Okay, Dilip, want to just step back for a second, give us a snapshot of the company. How many people, what can you share about funding, the state of the product, is it, actually GA? >> Yeah, absolutely. Like I mentioned, we are a young and dynamic company located in Silicon Valley. We were founded three or four years ago, we have a product that's shipping, we have lots of customers. In terms of funding, we have gone through Series A round of funding and such. And we have customers across different verticals. Whether it's healthcare, whether it's retail, and whether it's MSB type of customers as well. >> And you're 100 percent a software company, how do people engage? Is there like a free trial demo type thing, or how do people get started? >> Yeah. Again, we're a pure software company, so if you look at how Uila gets installed, we get installed as a guest VM, on top of the hypervisor. So this could be a Hyper-V environment, or it could be a VMware type of an environment. And then what we do is we do deep packet inspection to get the application and the network information. >> You mentioned VMware and Hyper-V, public clouds, which ones? >> Public clouds, AWS, Google cloud, so we are more agnostic on that side. >> Stu: Great. >> So we do deep packet inspection, to get those details, on the application and network side, and then we also talk to vCenter, to get all of the compute and storage statistics. So again, a pure software solution, we do have trials available, we have a 30-day trial available for our software, so in case anybody is interested, they can obviously go to our website, at Uila.com, and then request a trial. We work with the customer to install it, we train the person who's doing the trial, and then, after the trial, we even do data reviews, and show you what issues that may be existing in your network. So like a true performance assessment of your data center. >> Okay, and who's the typical administrator of this?
Is this same person using vCenter admin, or doing their public cloud management? And I'm curious what dynamics you're seeing in the company, when they've got both sides of that, and how that plays? >> Yeah. So typically, we're seeing virtualization engineers, or IT architects, who are using the Uila solution. And the trend we are seeing between the private and the public cloud is that many of the people who had the responsibility on the private side, it's the same group of people who are still responsible for managing the environment on the public cloud side. So it's not only important to make sure the availability of the infrastructure continues, as you go from your private to your public cloud, but also the application and user experience continues, so that's why having the same group of people managing and monitoring is the trend that we are seeing with our customers. >> Okay. Dilip, want to give you the final word. What brings Uila to an event like this? >> Again, this is the first time we've come to VTUG, we have been doing many other community events, in other locations. Uila believes in working with the community, so that's why we've been engaged with the vExperts, as well as the community in general. And we think this is one of the premier events where the right people in the community, in terms of the technical professionals, hang out. So that's why we decided to come to the VTUG event. And I'm pretty sure I will be back for the Summer Slam as well. >> Well, Dilip Advani, really appreciate the updates, and telling our audience a little bit about Uila, it's lightning in the cloud. For some reason we haven't had the CUBE yet in Hawaii, maybe we need to re-change >> Instead of water, we'll have mai tais there. (laughing) >> Absolutely. Lots more coverage here, at the VTUG Winter Warmer 2018, I'm Stu Miniman, you're watching theCUBE. (energetic techno music)

Published Date : Feb 1 2018

ENTITIES

Entity	Category	Confidence
Dilip	PERSON	0.99+
Fluke Networks	ORGANIZATION	0.99+
Dilip Advani	PERSON	0.99+
Silicon Valley	LOCATION	0.99+
Hawaii	LOCATION	0.99+
Cinco Networks	ORGANIZATION	0.99+
100 percent	QUANTITY	0.99+
Stu	PERSON	0.99+
AWS	ORGANIZATION	0.99+
first time	QUANTITY	0.99+
Stu Miniman	PERSON	0.99+
Gillette Stadium	LOCATION	0.99+
AirMagnet	ORGANIZATION	0.99+
Uila	PERSON	0.99+
both sides	QUANTITY	0.99+
both	QUANTITY	0.99+
2018	DATE	0.99+
Summer Slam	EVENT	0.99+
first	QUANTITY	0.99+
one question	QUANTITY	0.98+
today	DATE	0.98+
one	QUANTITY	0.98+
Foxborough, Massachusetts	LOCATION	0.98+
three	DATE	0.98+
VTUG	EVENT	0.97+
Uila	ORGANIZATION	0.96+
SiliconANGLE	ORGANIZATION	0.95+
theCUBE	ORGANIZATION	0.94+
vCenter	TITLE	0.94+
VMware	TITLE	0.93+
Uila.com	ORGANIZATION	0.92+
four years ago	DATE	0.92+
Solomon Hykes	PERSON	0.92+
Uila	TITLE	0.9+
first analyzer product	QUANTITY	0.9+
Series A	OTHER	0.9+
30-day trial	QUANTITY	0.84+
VTUG Winter Warmer 2018	EVENT	0.84+
Uila	LOCATION	0.83+
Net XRay	ORGANIZATION	0.8+
Hawaiian	OTHER	0.8+
VTUG Winter Warmer	EVENT	0.78+
VTUG Winter	EVENT	0.78+
Hyper	TITLE	0.75+
Google cloud	ORGANIZATION	0.73+
vExperts	ORGANIZATION	0.69+
Docker	ORGANIZATION	0.63+
a second	QUANTITY	0.59+
GA	LOCATION	0.52+
Warmer	TITLE	0.52+
premier events	QUANTITY	0.51+
CUBE	ORGANIZATION	0.45+

Zachary Bosin and Anna Simpson | Veritas Vision 2017

>> Announcer: Live from Las Vegas, it's theCube. Covering Veritas Vision 2017. Brought to you by Veritas. >> Welcome back to Las Vegas everybody, this is theCube, the leader in live tech coverage. This is day one of two day coverage of Veritas Vision #VtasVision. My name is Dave Vellante, and I'm here with my co-host Stu Miniman. Zach Bosin is here. He's the director of information governance solutions at Veritas. And Anna Simpson is a distinguished systems engineer at Veritas. Which Anna means you know where all the skeletons are buried and how to put the pieces back together again. Welcome to theCube, thanks for coming on. >> Thank You. >> Thank You. >> Let's start with, we've heard a little bit today about information governance, Zach we'll start with you. It's like every half a decade or so every decade, there's a new thing. And GDPR is now the new thing. What's the state of information governance today? How would you describe it? >> I think the primary problem that organizations are still trying to fight off, is exponential data growth. We release research every year called the Data Genomics Index, and what came back this past year is that data growth has continued to accelerate, as a matter of fact, 49% year over year. So this problem isn't going anywhere and now it's actually being magnified by the fact that data is being stored, not only in the data center on premises, but across the multi-cloud. So information governance, digital compliance is all about trying to understand that data, control that data, put the appropriate policies against it. And that's really what we try to do with helping customers. >> I always wonder how you even measure data. I guess you could measure capacity that leaves the factory. There's so much data that's created that's not even persistent. We don't even know, I think, how fast data is growing. And it feels like, and I wonder if you guys agree or have any data suggestions, it feels like the curve is reshaping. 
I remember when we were talking to McAfee and Brynjolfsson it feels like the curve is just going even more exponential. What's your sense? >> That's typically what we see. And then you have IoT data coming online, faster and faster and it really is a vertical shot up. And all different types and new file types. One of the other really interesting insights, is that unknown file types jumped 30-40%. Things that we don't even recognize with our file analysis tools today, are jumping off the charts. >> It used to be that PST was the little nag, it looks trivial compared to what we face today, Anna. What's your role as a distinguished systems engineer? How do you spend your time? And what are you seeing out there? >> I definitely spend my time dealing with customers around the world. Speaking to them about information governance. Particularly around risk mitigation these days. In terms of the issues we see in information governance, data privacy is a big one. I'm sure you've been hearing about GDPR quite a bit today already. That's definitely a hot topic and something our customers are concerned about. >> Are they ringing you up saying, "Hey, get in here. "I need to talk to you about GDPR?" Or is it more you going in saying, "You ready for GDPR?" How does that conversation go? >> It's definitely a combination between the two. I think there is definitely a lot of denial out there. A lot of people don't understand that it will apply to them. Obviously if they are storing or processing data which belongs to an EU resident, containing their personal data. I think organizations are either in that denial phase or otherwise they're probably too aware, so they've probably started a project, done some assessment, and then they're buried in the panic mode if we have to remediate all these issues before May next year. >> What's the bell curve look like? Let's make it simple. One is, "we got this nailed." That's got to be tiny. 
The fat middle which is "we get it, we know it's coming, "we got to allocate some budget, let's go." Versus kind of clueless. What's the bell curve look like? >> I would say that there's 2% of companies, maybe, that think they have it nailed. >> Definitely in single digits, low single digits. >> I think maybe another 30% at least understand the implications and are trying to at least put a plan in place. And the rest, 66% or so, still aren't very aware of what GDPR means for their business. >> Dave: Wow. >> Can you take us inside? What's Veritas's role in helping customers get ready for GDPR? We talked to one of Veritas's consulting partners today and it's a big issue, it crosses five to ten different budget areas. So what's the piece that Veritas leads and what's the part that you need to pull in other partners for? >> Sure thing. So in terms of our approach, we have what we refer to as a wheel. Which sort of attacks different parts of the GDPR, so various articles step you through the processes you need to be compliant. Things like locating personal data, being able to search that data, minimizing what you have, because GDPR is really dictating you can no longer data hoard, because you can only keep data which has business value. Further downstream it's obviously protecting the data that has business value, and then monitoring that over time. From a Veritas approach perspective, we're tying those articles obviously to some of our products, some of our solutions. There's also definitely a services component around that as well. When you think about e-discovery or regulatory requirements, when the regulators come in, generally they're not necessarily going to be questioning the tools, they're going to be questioning how you're using those tools to be compliant. It is sort of a combination between tools and services. And then we're also partnering with other consulting companies on that process piece, as well. 
>> Zach, at the keynote this morning, there was a lot of discussion about there's dark data out there, and we need to shine a light on it. I have to imagine that's a big piece of this. Why don't you bring us up to speed. What are some of the new products that were announced that help with this whole GDPR problem? >> And to that point, 52% of data is dark, 33% is rot, 15% is mission critical. Today we announced 23 new connectors for the Veritas information map. This is our immersive visual data mapping tool, that really highlights where your stale, orphaned, and non-business critical data is across the entire enterprise. New connectors with Microsoft Azure, Google Cloud storage, Oracle databases, so forth and so on, there's quite a number that we're adding into the fold. That really gives organizations better visibility into where risk may be hiding, and allows you to shine that light and interrogate that data in ways you couldn't do previously because you didn't have those types of insights. >> Also we heard about Risk Analyzer? >> Yes, that's right. We just recently announced the Veritas Risk Analyzer, this is a free online tool, where anyone can go to Veritas.com/riskanalyzer, take a folder of their data, and try out our brand new integrated classification engine. We've got preset policies for GDPR, so you drop in your files, and we'll run the classification in record speed, and it will come back with where PII is, how risky that folder was, tons of great insights. >> So it's identifying the PII, and how much there is, and how siloed it is? Are you measuring that? What are you actually measuring there? >> We're actually giving you a risk score. When we're analyzing risk, you might find one individual piece of PII, or you might find much more dense PII. So depending on the number of files, and the types of files, we'll actually give you a different risk tolerance. 
What we're doing with the Risk Analyzer is giving you a preview, or just a snapshot of the types of capabilities that Veritas can bring to that discussion. >> Who do you typically talk to? Is it the GC, is it the head of compliance, chief risk officer, all of the above? >> Yeah, it's definitely all of the above-- >> Some person who has a combination of those responsibilities, right? >> Yeah, exactly. It's usually, if we're talking GDPR specifically, it's usually information security, compliance, legal, and particularly in organizations now, we're definitely seeing more data privacy officers. And they're the ones that truly understand what these issues are; GDPR or other personal data privacy regulations. >> Let's say I'm the head of compliance security risk information governance, I wear that hat. Say I'm new to the job, and I call you guys in and say, "I need help." Where do I start? Obviously you're going to start with some kind of assessment Maybe you have a partner to help you do that, I can run my little risk analyzer, sort of leech in machine, and that's good but that's just scratching the surface. I know I have a problem. Where do we start? What are the critical elements? And how long is it going to take me to get me where I need to be? >> I think visibility is obviously the first step, which Zach already spoke to. You really have to be able to understand what you have to then be able to make some educated decisions about that. Generally that's where we see the gap in most organizations today. And that's particularly around unstructured data. Because if it's structured, generally you have some sort of search tools that you can quickly identify what is within there. >> To add on to that, you actually have 24 hours. We can bring back one hundred million items using the information map, so you get a really clean snapshot in just one day to start to understand where some of that risk may be hiding. >> Let's unpack that a little bit. 
You're surveying all my data stores, and that's because you see that because you've got the back-up data, is that right? >> The backup data is one portion of it. The rest is really coming from these 23 new connectors into those different data stores and extracting and sweeping out that metadata, which allows us to make more impactful decisions about where we think personal data may be, and then you can take further downstream actions using the rest of our tool kit. >> And what about distributed data on laptops, mobile devices, IoT devices, is that part of the scope, or is that coming down the road, or is it a problem to be solved? >> It's a little out of scope for what we do. On the laptop/desktop side of things, we do have the e-discovery platform, formerly known as Clearwell, which does have the ability to go out and search those types of devices and then you could be doing some downstream review of that data, or potentially moving it elsewhere. It's definitely a place we don't really play right now. I don't know if you had other comments? >> You got to start somewhere. Start within your enterprise. This has always been a challenge. We were talking off camera about FRCP and email archiving. I always thought the backup ... The backup company was in a good spot. They analyzed that data. But then there's the but. Even these are backed up, kind of, laptops and mobile devices. Do you see the risk and exposures in PII really at the corporate level, or are attorneys going to go after the processes around distributed data, and devices, and the like? >> I think anything is probably fair game at this point given that GDPR isn't being enforced yet. We'll have to see how that plays out. I think the biggest gap right now, or the biggest pain point for organizations, is unstructured data. It kind of becomes a dumping ground and people come and go from organizations, and you just have no visibility into the data that's being stored there. 
And generally people like to store things on corporate networks because it gets backed up, because it doesn't get deleted, and it's usually things that probably should not be stored there. >> If I think back to 2006, 2007 time frame with Federal Rules of Civil Procedure, which basically said that electronic information is now admissible. And it was a high profile case, I don't want to name the name because I'll get it wrong, but they couldn't produce the data in court, the judge penalized them, but then they came back and said, "We found some more data. "We found some more data. "We found some more data." Just an embarrassment. It was a one hundred million dollar fine. That hit the press. So what organizations did, and I'm sure Anna you could fill in the gaps, they basically said, "Listen, "it's an impossible problem so we're going to go after "email archiving. "We're going to put the finger in the dike there, "and try to figure the rest of this stuff out later." What happened is plaintiffs' attorneys would go after their processes and procedures, and attack those. And if you didn't have those in place, you were really in big trouble. So what people did is try to put those in place. With GDPR, I'm not sure that's going to fly. It's almost binary. If somebody says, "I want you to delete my data," you can't prove it, I guess that's process-wise, you're in trouble, in theory. We'll see how it holds up and what the fines look like, but it sounds like it's substantially more onerous, from what we understand. Is that right? >> Yes, I would 100% agree. From an e-discovery standpoint, there's proportionality and what's reasonable relative to the cost of the discovery and things like that. I actually don't think that that is going to come into play with GDPR because the fines are so substantial. I don't know what would be considered unreasonable to go out and locate data. >> Zach you have to help us end this on an up note. 
(group laughs) >> Dave: Wait, I wanted to keep going in to the abyss. (group laughs) We've talked about the exponential growth of data, and big data was supposed to be that bit-flip ... of turned it for, "Oh my God, I need to store it "and do everything, I need to be able to harness it "and take advantage of it" Is GDPR an opportunity for customers, to not only get their arms around information, but extract new value from it? >> Absolutely. It's all about good data hygiene. It's about good information governance. It's about understanding where your most valuable assets are, focusing on those assets, and getting the most value you can from them. Get rid of the junk, you don't need that. It's just going to get you into trouble and that's what Veritas can help you do. >> So a lot of unknowns. I guess the message is, get your house in order, call some experts. I'd call a lot of experts, obviously Veritas. We had PWC on earlier today, and a number of folks in your ecosystem I'm sure can help. Guys, thanks very much for coming on theCube and scaring the crap out of us. (group laughs) >> Thanks a lot. >> Alright, keep it right there buddy, we'll be back for our wrap, right after this short break. (light electronic music)

Published Date : Sep 20 2017

SUMMARY :

Brought to you by Veritas. and how to put the pieces back together again. And GDPR is now the new thing. is that data growth has continued to accelerate, And it feels like, and I wonder if you guys agree And then you have IoT data coming online, faster and faster And what are you seeing out there? In terms of the issues we see in information governance, "I need to talk you about GDPR?" It's definitely a combination between the two. What's the bell curve look like? that think they have it nailed. And the rest, 66% or so, still aren't very aware that you need to pull in other partners for? the processes you need to be compliant. into where risk may be hiding, and allows you to shine so you drop in your files, and we'll run the classification So depending on the number of files, and the types of files, And they're the ones that truly understand Say I'm new to the job, and I call you guys in and say, You really have to be able to understand what you have To add on to that, you actually have 24 hours. and that's because you see that may be, and then you can take further downstream actions the ability to go out and search those types of devices and the like? or the biggest pain point for organizations, And if you didn't have those in place, I actually don't think that that is going to come into play Zach you have to help us end this on an up note. "and do everything, I need to be able to harness it Get rid of the junk, you don't need that. I guess the message is, get your house in order, Alright, keep it right there buddy, we'll be back

ENTITIES

Entity	Category	Confidence
Dave Vellante	PERSON	0.99+
Zach Bosin	PERSON	0.99+
Dave	PERSON	0.99+
Anna Simpson	PERSON	0.99+
Veritas	ORGANIZATION	0.99+
Anna	PERSON	0.99+
Microsoft	ORGANIZATION	0.99+
five	QUANTITY	0.99+
49%	QUANTITY	0.99+
24 hours	QUANTITY	0.99+
100%	QUANTITY	0.99+
Zach	PERSON	0.99+
23 new connectors	QUANTITY	0.99+
GDPR	TITLE	0.99+
Stu Miniman	PERSON	0.99+
Las Vegas	LOCATION	0.99+
one hundred million dollar	QUANTITY	0.99+
two	QUANTITY	0.99+
15%	QUANTITY	0.99+
66%	QUANTITY	0.99+
ten	QUANTITY	0.99+
2%	QUANTITY	0.99+
52%	QUANTITY	0.99+
Zachary Bosin	PERSON	0.99+
Federal Rules of Civil Procedure	TITLE	0.99+
Oracle	ORGANIZATION	0.99+
30%	QUANTITY	0.99+
May next year	DATE	0.99+
first step	QUANTITY	0.99+
one day	QUANTITY	0.98+
two day	QUANTITY	0.98+
one	QUANTITY	0.98+
One	QUANTITY	0.98+
2017	DATE	0.98+
Today	DATE	0.98+
one hundred million items	QUANTITY	0.98+
33%	QUANTITY	0.97+
today	DATE	0.97+
30-40%	QUANTITY	0.96+
McAfee	ORGANIZATION	0.95+
Veritas Vision	ORGANIZATION	0.94+
earlier today	DATE	0.92+
2007	DATE	0.91+
Veritas Risk Analyzer	TITLE	0.91+
Risk	TITLE	0.91+
past year	DATE	0.91+
one individual piece	QUANTITY	0.89+
PWC	ORGANIZATION	0.88+
this morning	DATE	0.88+
Data Genomics Index	OTHER	0.84+
Google	ORGANIZATION	0.84+
half a decade	QUANTITY	0.83+
Brynjolfsson	PERSON	0.83+
Clearwell	ORGANIZATION	0.83+
single	QUANTITY	0.83+
2006,	DATE	0.77+
EU	ORGANIZATION	0.77+

Ash Munshi, Pepperdata - #SparkSummit - #theCUBE

(upbeat music) >> Announcer: Live from San Francisco, it's theCUBE, covering Spark Summit 2017, brought to you by Databricks. >> Welcome back to theCUBE, it's day two at the Spark Summit 2017. I'm David Goad and here with George Gilbert from Wikibon, George. >> George: Good to be here. >> Alright and the guest of honor of course, is Ash Munshi, who is the CEO of Pepperdata. Ash, welcome to the show. >> Thank you very much, thank you. >> Well you have an interesting background, I want you to just tell us real quick here, not give the whole bio, but you got a great background in machine learning, you were an early user of Spark, tell us a little bit about your experience. >> So I'm actually a mathematician originally, a theoretician who worked for IBM Research, and then subsequently Larry Ellison at Oracle, and a number of other places. But most recently I was CTO at Yahoo, and then subsequent to that I did a bunch of startups, that involved different types of machine learning, and also just in general, sort of a lot of big data infrastructure stuff. >> And go back to 2012 with Spark right? You had an interesting development. >> Right, so 2011, 2012, when Spark was still early, we were actually building a recommendation system, based on user-generated reviews. That was a project that was done with Nando de Freitas, who is now at DeepMind, and Peter Cnudde, who's one of the key guys that runs infrastructure at Yahoo. We started that company, and we were one of the early users of Spark, and what we found was, that we were analyzing all the reviews at Amazon. So Amazon allows you to crawl all of their reviews, and we basically had natural language processing, that would allow us to analyze all those reviews. When we were doing sort of MapReduce stuff, it was taking us a huge number of nodes, and 24 hours to actually go do analysis. And then we had this little project called Spark, out of AMPLab, and we decided to spin it up, and see what we could do. 
It had lots of issues at that time, but we were able to actually spin it up on to, I think it was in the order of 100,000 nodes, and we were able to take our times for running our algorithms from you know, sort of tens of hours, down to sort of an hour or two, so it was a significant improvement in performance. And that's when we realized that, you know, this is going to be something that's going to be really important once this set of issues, where it, once it was going to get mature enough to make happen, and I'm glad to see that it's actually happened now, and it's actually taken over the world. >> Yeah that little project became a big deal, didn't it? >> It became a big deal, and now everybody's taking advantage of the same thing. >> Well bring us to the present here. We'll talk about Pepperdata and what you do, and then George is going to ask a little bit more about some of the solutions that you have. >> Perfect, so Pepperdata was a company founded by two gentlemen, Sean Suchter and Chad Carson. Sean used to run Yahoo Search, and one of the first guys who actually helped develop Hadoop next to Eric14 and that team. And then Chad was one of the first guys who actually figured out how to monetize clicks, and was the data science guy around the whole thing. So those are the two guys that actually started the company. I joined the company last July as CEO, and you know, what we've done recently, is we've sort of expanded our focus of the company to addressing DevOps for big data. And the reason why DevOps for big data is important, is because what's happened in the last few years, is people have gone from experimenting with big data, to taking big data into production, and now they're actually starting to figure out how to actually make it so that it actually runs properly, and scales, and does all the other kinds of things that are there, right? 
So, it's that transition that's actually happened, so, "Hey, we ran it in production, "and it didn't quite work the way we wanted to, "now we actually have to make it work correctly." That's where we sort of fit in, and that's where DevOps comes in, right? DevOps comes in when you're actually trying to make production systems that are going to perform in the right way. And the reason for DevOps is it shortens the cycle between developers and operators, right? So the tighter the loop, the faster you can get solutions out, because business users are actually wanting that to happen. That's where we're squarely focused, is how do we make that work? How do we make that work correctly for big data? And the difference between, sort of classic DevOps and DevOps for big data, is that you're now dealing with not just, you know, a set of computers solving an isolated sort of problem. You're dealing with thousands of machines that are solving one problem, and the amount of data is significantly larger. So the classical methodologies that you have, while, you know, agile and all that still works, the tools don't work to actually figure out what you can do with DevOps, and that's where we come in. We've got a set of tools that are focused on performance effectively, 'cause that's the big difference between distributed systems performance I should say, that's the big difference between that, and sort of classic even scaled out computing, right? So if you've got web servers, yes performance is important, and you need data for those, but that can actually be sharded nicely. This is one system working on one problem, right? Or a set of systems working on one problem. That's much harder, it's a different set of problems, and we help solve those problems. >> Yeah, and George you look like you're itching to dig into this, feel free. 
(exclaims loudly) >> Well so, it was, so one of the big announcements at the show, and the sort of the headline announcement today, was Spark Serverless, like so it's not just someone running Spark in the cloud sort of as a managed service, it's up there as a, you know, sort of SaaS application. And you could call it platform as a service, but it's basically a service where, you know, the infrastructure is invisible. Now, for all those customers who are running their own clusters, which is pretty much everyone I would imagine at this point, how far can you take them in hiding much of the overhead of running those clusters? And by the overhead I mean, you know, primarily performance and maximizing, you know, sort of maximizing resource efficiency. >> So, you have to actually sort of double-click on to the kind of resources that we're talking about here, right? So there's the number of nodes that you're going to need to actually do the computation. There is, you know, the amount of disk storage and stuff that you're going to need, what type of CPUs you're going to need. All of that stuff is sort of part of the costing if you will, of running an infrastructure. If somebody hides all that stuff, and makes it so that it's economical, then you know, that's a great thing, right? And if it can actually be made so that it works for huge installations, and hides it appropriately so I don't pay too much of a tax, that's a wonderful thing to do. But we have, our customers are enterprises, typically Fortune 200 enterprises, and they have both a mixture of cloud-based stuff, where they actually want to control everything about what's going on, and then they have infrastructure internally, which by definition they control everything that's going on, and for them we're very, very applicable. I don't know how applicable we'd be in this, sort of new world as a service that grows and shrinks. 
I can certainly imagine that whoever provides that service would embed us, to be able to use the stuff more efficiently. >> No, you answered my question, which is, for the people who aren't getting the turnkey you know, sort of SaaS solution, and they need help managing, you know, what's a fairly involved stack, they would turn to you? >> Ash: Yes. >> Okay. >> Can I ask you about the specific products? >> George: Oh yes. >> I saw you at the booth, and I saw you were announcing a couple of things. Well what is new-- >> Ash: Correct. >> With the show? >> Correct, so at the show we announced Code Analyzer for Apache Spark, and what that allows people to do, is really understand where performance issues are actually happening in their code. So, one of the wonderful things about Spark, compared to MapReduce, is that it abstracts the paradigm that you actually write against, right? So that's a wonderful thing, 'cause it makes it easier to write code. The problem when we abstract, is what does that abstraction do down in the hardware, and where am I losing performance? And being able to give that information back to the user. So you know, in Spark, you have jobs that can run in parallel. So an app consists of jobs, jobs can run in parallel, and each one of these things can consume resources, CPU, memory, and you see that through sort of garbage collection, or disk or network, and what you want to find out, is which one of these parallel tasks was dominating the CPU? Why was it dominating the CPU? Which one actually caused the garbage collector to actually go crazy at some point? While the Spark UI provides some of that information, what it doesn't do, is give you a time series view of what's going on. So it's sort of a blow-by-blow view of what's going on. By imposing the time series view on sort of an enhanced version of the Spark UI, you now have much better visibility about which offending stages are causing the issue. 
And the nice thing about that is, once you know that, you know exactly which piece of code that you actually want to go and look at. So classic example would be, you might have two stages that are running in parallel. The Spark UI will tell you that it's stage three that's causing the problem, but if you look at the time series, you'll find out that stage two actually runs longer, and that's the one that's pegging the CPU. And you can see that because we have the time series, but you couldn't see that any other way. >> So you have a code analyzer and also the app profiler. >> So the app profiler is the other product that we announced a few months ago. We announced that I guess about three months ago or so. And the app profiler, what it does, is it actually looks after the run is done, it actually looks at all the data that the run produces, so the Spark history server produces, and then it actually goes back and analyzes that and says, "Well you know what? "Your executors here are not working as efficiently, "these are the executors "that aren't working as efficiently." It might be using too much memory or whatever, and then it allows the developer to basically be able to click on it and say, "Explain to me why that's happening?" And then it gives you a little, you know, a little fix-it if you will. It's like, if this is happening, you probably want to do these things, in order to improve performance. So, what's happening with our customers, is our customers are asking developers to run the application profiler first, before they actually put stuff on production. Because if the application profiler comes back and says, "Everything is green." That there's no critical issues there. Then they're saying, "Okay fine, put it on my cluster, "on the production cluster, "but don't do it ahead of time." The application profiler, to be clear, is actually based on some work that, an open source project called Dr. Elephant, which comes out of LinkedIn. 
And now we're working very closely together to make sure that we actually can advance the set of heuristics that we have, that will allow developers to understand and diagnose more and more complex problems. >> The Spark community has the best code names ever. Dr. Elephant, I've never heard of that one before. (laughter) >> Well Dr. Elephant, actually, is not just the Spark community, it's actually also part of the MapReduce community, right? >> David: Ah, okay. >> So yeah, I mean remember Hadoop? >> David: Yes. >> The elephant thing, so Dr. Elephant, and you know. >> Well let's talk about where things are going next, George? >> So, you know, one of the things we hear all the time from customers and vendors, is, "How are we going to deal with this new era "of distributed computing?" You know, where we've got the cloud, on-prem, edge, and like so, for the first question, let's leave out the edge and say, you've got your Fortune 200 client, they have, you know, production clusters or even if it's just one on-prem, but they also want to work in the cloud, whether it's for elastics stuff, or just for, they're gathering a lot of data there. How can you help them manage both, you know, environments? >> Right, so I think there's a bunch of times still, before we get into most customers actually facing that problem. What we see today is, that a lot of the Fortune 200, or our customers, I shouldn't say a lot of the Fortune 200, a lot of our customers have significant, you know, deployments internally on-prem. They do experimentation on the cloud, right? The current infrastructure for managing all these, and sort of orchestrating all this stuff, is typically YARN. What we're seeing, is that more than likely they're going to wind up, or at least our intelligence tells us that it's going to wind up being Kubernetes that's actually going to wind up managing that. So, what will happen is-- >> George: Both on-prem and-- >> Well let me get to that, alright? >> George: Okay. 
>> So, I think YARN will be replaced certainly on-prem with Kubernetes, because then you can do multi data center, and things of that sort. The nice thing about Kubernetes, is it in fact can span the cloud as well. So, Kubernetes as an infrastructure, is certainly capable of being able to both handle a multi data center deployment on-prem, along with whatever actually happens on the cloud. There is infrastructure available to do that. It's very immature, most of the customers aren't anywhere close to being able to do that, and I would say even before Kubernetes gets accepted within the environment, it's probably 18 months, and there's probably another 18 months to two years, before we start facing this hybrid cloud, on-prem kind of problem. So we're a few years out I think. >> So, would, for those of us including our viewers, you know, who know the acronym, and know that it's a, you know, scheduler slash cluster manager, resource manager, would that give you enough of a control plane and knowledge of sort of the resources out there, for you to be able to either instrument or deploy an instrument to all the clusters (mumbles). >> So we are actually leading the effort right now for big data on Kubernetes. So there is a group of, there's a small group working. It's Google, us, Red Hat, Palantir, Bloomberg now has joined the group as well. We are actually today talking about our effort on getting HDFS working on Kubernetes, so we see the writing on the wall. We clearly are positioning ourselves to be a player in that particular space, so we think we'll be ready and able to take that challenge on. >> Ash, this is great stuff, we've just got about a minute before the break, so I wanted to ask you just a final question. You've been in the Spark community for a while, so what of their open source tools should we be keeping our eyes out for? >> Kubernetes. >> David: That's the one? >> To me that is the killer that's coming next. >> David: Alright. 
>> I think that's going to make life, it's going to unify the microservices architecture, plus the sort of multi data center and everything else. I think it's really, really good. Borg works, it's been working for a long time. >> David: Alright, and I want to thank you for that little Pepper pen that I got over at your booth, as the coolest-- >> Come and get more. >> Gadget here. >> We also have Pepper sauce. >> Oh, of course. (laughter) Well there sir-- >> It's our sauce. >> There's the hot news from-- >> Ash: There you go. >> Pepperdata, Ash Munshi. Thank you so much for being on the show, we appreciate it. >> Ash: My pleasure, thank you very much. >> And thank you for watching theCUBE. We're going to be back with more guests, including Ali Ghodsi, CEO of Databricks, coming up next. (upbeat music) (ocean roaring)

Published Date : Jun 7 2017


ENTITIES

Entity	Category	Confidence
David Goad	PERSON	0.99+
Ash Munshi	PERSON	0.99+
George	PERSON	0.99+
Ali Ghodsi	PERSON	0.99+
Larry Ellison	PERSON	0.99+
George Gilbert	PERSON	0.99+
Google	ORGANIZATION	0.99+
Sean Suchter	PERSON	0.99+
David	PERSON	0.99+
Sean	PERSON	0.99+
Ash	PERSON	0.99+
Red Hat	ORGANIZATION	0.99+
Oracle	ORGANIZATION	0.99+
Yahoo	ORGANIZATION	0.99+
Peter Cnudde	PERSON	0.99+
2011	DATE	0.99+
DeepMind	ORGANIZATION	0.99+
Bloomberg	ORGANIZATION	0.99+
San Francisco	LOCATION	0.99+
two guys	QUANTITY	0.99+
Pepperdata	ORGANIZATION	0.99+
24 hours	QUANTITY	0.99+
first question	QUANTITY	0.99+
Spark UI	TITLE	0.99+
Amazon	ORGANIZATION	0.99+
DevOps	TITLE	0.99+
2012	DATE	0.99+
Chad Carson	PERSON	0.99+
two years	QUANTITY	0.99+
18 months	QUANTITY	0.99+
one	QUANTITY	0.99+
two	QUANTITY	0.99+
one problem	QUANTITY	0.99+
last July	DATE	0.99+
Databricks	ORGANIZATION	0.99+
LinkedIn	ORGANIZATION	0.99+
Spark Summit 2017	EVENT	0.99+
Code Analyzer	TITLE	0.99+
Spark	TITLE	0.98+
100,000 nodes	QUANTITY	0.98+
today	DATE	0.98+
Palantir	ORGANIZATION	0.98+
an hour	QUANTITY	0.98+
IBM Research	ORGANIZATION	0.98+
Both	QUANTITY	0.98+
two gentlemen	QUANTITY	0.98+
Chad	PERSON	0.98+
two stages	QUANTITY	0.98+
first guys	QUANTITY	0.98+
both	QUANTITY	0.97+
thousands of machines	QUANTITY	0.97+
each one	QUANTITY	0.97+
tens of hours	QUANTITY	0.95+
Kubernetes	ORGANIZATION	0.95+
MapReduce	TITLE	0.95+
Yahoo Search	ORGANIZATION	0.94+
