Tammy Butow & Alberto Farronato, Gremlin CUBE Conversation, April 2020

>> Narrator: From theCUBE studios in Palo Alto and Boston, connecting with thought leaders all around the world, this is a CUBE Conversation. >> Hello everyone, welcome to the CUBE Conversation here in Palo Alto, in our studios of theCUBE. I'm John Furrier, your host. We're here during the crisis of COVID-19 doing remote interviews. I come into the studio, we've got a quarantine crew here, getting the interviews, getting the stories out there, and of course, the story we're going to continue to talk about is the impact of COVID-19, and how we're all getting back to work, either working at home or working remotely and virtually certainly. But as things start to change, we're going to start to see events, mostly digital events, and we're here to talk about an event that's coming up called the Failover Conference from Gremlin, which has now gone digital because it's on April 21st. But I think what's important about this conversation that I want to get into is not only to talk about the event that's coming up, but to talk about the scale problems that are being highlighted by this change in work environment, working at home. We've been talking about the at-scale problems that we're seeing, whether it's a flood or surge of traffic, and the chaos that's ensuing across the world with this pandemic. So I'm excited, I have two great guests: Alberto Farronato, senior vice president of marketing at Gremlin, and Tammy Butow, principal site reliability engineer, or SRE. Guys, thanks for coming on. Appreciate it, thank you. >> Thanks. >> Thanks for having me. >> Alberto, I want to get to you first. We've known each other before. You've been in this industry. We've all been talking about cloud native, cloud scale for some time. It's kind of inside the ropes, it's inside baseball. Tammy, you're a site reliability engineer. Everyone knows Google, knows how cloud works. This is large-scale stuff.
Now with COVID-19, we're starting to see the average person, my brother, my sister, our family members and people around the world go, "Oh my God, this is really high impact." This change of behavior, this surge, whether it's traffic on the internet or work-at-home tools that are inadequate, you start to see (laughs) the statistical things that were planned for not working well, and this actually maps to the things that we've been talking about in our industry. Alberto, you've been on this. How are you guys doing? >> Yeah. >> And what's your take on this situation we're in right now? >> Yeah, we're doing pretty well as a company. We were born as a distributed organization to begin with, so for us working in a distributed environment from all over the world is common practice day-to-day. Personally, I'm originally from Italy; my parents, my family, are in Milan and Bergamo of all places, so I have to follow the news with extra care, and it becomes so much clearer nowadays that technology is not just a powerful tool to enable our businesses but is also so critical for our day-to-day life, and thanks to video calls, I can easily talk to my family back there every day. So that's really important. So yes, we've been talking for a long time, as you mentioned, about complex systems at scale and reliability, often in the context of mission-critical applications, but more and more, these systems need to be reliable also when it comes to back-office systems that enable people to continue to work on a daily basis. >> Yeah, well, our hearts go out to your family and your friends in Italy, and I hope everyone stays safe there. (speaks faintly) A tough situation, and it continues to be a challenge. Tammy, I want to get your thoughts. How's life going for you? You're a site reliability engineer. What you deal with on the tech side is now (laughs) happening in the real world. It's mind-blowing to me that we're seeing these things happen; it's a paradigm that needs attention.
How do you look at it as an SRE, dealing with this mostly on the tech side and now seeing it play out in real life? >> It's been such an interesting situation, obviously really terrible for everybody to have to go through and deal with. One of the things that I specialize in as a site reliability engineer is incident management, and so for example, I previously worked at Dropbox where I was the incident manager on call for 500 million customers; it's like a 24/7 shift. With these large-scale incidents, you really need to be able to act fast. There are two very important metrics that we track and care about as site reliability engineers. The first one is mean time to detection. How fast can you detect that something is happening? Obviously, if you detect an issue faster, then you've got a better chance of making the impact lower, so you can contain the blast radius. I like to explain it to people like this: if you have a fire in your sauce bin in your kitchen, and you put it out, that's way better than waiting until your entire house is on fire. And the other metric is mean time to resolution. So how long does it take you to recover from the situation? So yeah, this is a large-scale global incident right now that we're in. >> Yeah, I know you guys do a lot of talk about chaos theory, and that applies. A lot of math involved, we all know that, but I think we need to look at the real world. This is now going to be table stakes and there's now a line in the sand here, pre-pandemic, post-pandemic, and I think you guys have an interesting company, Gremlin, in the sense that this is a complex system, and if you think about the world we're going to be living in, whether it's digital events, and you guys have one coming up, or how to work at home or tools that humans are going to be using, it's going to be working with systems, right?
So you have this new paradigm going to be upon us pretty quickly, and it's not just buying software mechanisms or software, it's a complex system, it's distributed computing, it's an operating system. I mean, this is kind of the world. Can you guys talk about the Gremlin situation, of how you guys are attacking these new problems and these new opportunities that are emerging? >> Sure, I can talk about that. So yeah, one of the things I've always specialized in over the last ten years is chaos engineering. And so the idea of chaos engineering is that you're injecting failure on purpose to uncover weaknesses. So that's really important in distributed systems, with distributed cloud computing, all these different services that you're kind of putting together. But the idea is, if you can inject failure, you can actually figure out what happens when you inject that small failure, and then you can actually go ahead and fix it. One of the things I like to say to people is, focus on what your top five critical systems are. Let's fix those first. Don't go for low-hanging fruit. Fix the biggest problems first, get rid of the biggest amount of pain that you have as a company, and then you can go ahead and actually... If you think about the Pareto principle, the 80/20 rule, if you fix 20% of your biggest problems, you'll actually solve 80% of your issues. That always works. It's something that I've done while working at the National Australia Bank doing chaos engineering, also at Gremlin and at Dropbox, and I help a lot of our customers do that too. >> Alberto, talk about the mindset involved. It's almost counterintuitive. Whoa! Whoa! Risk! The biggest systems. >> Yeah. >> I don't want to touch those. They're working fine right now. And then these problems just gestate, they kind of hang around, like the bin in the kitchen fire: this is okay, I don't want to touch it. The house is still working. So this is kind of a new mindset. Could you talk about what your take is on that?
Is the industry there? I mean, it was kind of a corner case. You had Netflix, you had Chaos Monkey in those days, and now it's a DevOps practice for a lot of folks, and you guys are involved in that. What's the appetite and what's the progress of chaos engineering in the mainstream? >> Yeah, it's interesting that you mentioned DevOps, and recently Gartner came up with a new, revisited DevOps framework that has chaos engineering in the middle of the lifecycle management of your application. And the reality is that systems have become so complex in infrastructure, with so many layers of abstraction. You have hundreds of services if you're doing microservices, but even if you're not doing microservices, you have so many applications connected to each other, building really complex workflows and automation flows. It's impossible for traditional QA to really understand where the vulnerabilities are in terms of resiliency, in terms of quality. Too often the production environment is also too different from the staging environment, and so you need a fundamentally different approach to go and find where your weaknesses are and find them before they happen, before you end up finding yourself in a situation like the one we're in today and you are not prepared. And so, so much of what we talk about is giving people a tool and a methodology to go and find these vulnerabilities. It's not so much about creating chaos; it's about managing the chaos that is built into our current systems and exposing those vulnerabilities before they create problems. And so that's a very scientific methodology and tooling that we bring to market and we help customers with. >> Tammy, I want to get your thoughts on something.
We used to riff a lot, in our 10th year of theCUBE; we've had a lot of conversations, we've riffed over the years, but you know, when the surge of Amazon Web Services came out it was pretty obvious that cloud's amazing, and look at the startups that were born. You mentioned Dropbox, you worked there. These companies, all these born-in-the-cloud, hyperscale companies built from scratch: great way to scale up. And we used to joke about Google, people would say, "I would like a cloud like Google," but no one has Google's use cases. And Google really pioneered the SRE concept, and you've got to give 'em a lot of props for that. But now we're kind of getting to a world where it's becoming Google-like. There's more scale now than ever before. It's not a corner case, it's becoming more popular and more of a preferred architecture, this large scale. What's your assessment of the mainstream enterprises? How far along are they in your mind? Are they there with chaos? Are they close? Are they doing it? How does someone develop an SRE practice to get the Google-like scale? 'Cause Google has an amazing network, they've got large-scale cloud, they have SREs, they've been doing it for years. How does a company that's transforming their IT (laughs) have SREs? >> That's a great question. I get asked this a lot as well. One of our goals at Gremlin is to help make the internet more reliable for everybody: everyone using the internet, all of the engineers who are trying to build reliable services. And so I'm often asked by companies all over the world, how do we create an SRE practice and how do we practice chaos engineering? And you can actually get started rolling out your SRE program. Based on my experience, I've done it. So when I worked at Dropbox, I worked with a lot of people who had been at Google, they'd been at YouTube, they were there when SRE was rolled out across those companies, and then they brought those learnings to Dropbox, and I learned from them.
But also the interesting thing is if you look at enterprise companies, so large banks. Say for example, I worked at the National Australia Bank for six years, and we actually did a lot of work that I would consider chaos engineering and SRE practices. So for example, we would do large-scale disaster recovery, and that's where you'd fail over an entire data center to a secret data center in an unknown location, and the reason is 'cause you're checking to make sure that everything operates okay if there's a nuclear blast. That's actually what you have to do, and you have to do that practice every quarter. But if you think about it, it's not very good to only do it once a quarter. You really want to be practicing chaos engineering and injecting failure on purpose. I think actually, I prefer to do it three times a week, so I do it a lot. But I'm also someone who likes to work out a lot and be fit all the time, so I know that if you do something regularly, you get great results. So that's what I always tell everyone. >> Yeah, get the reps in, as we say, get stronger, get the muscle memory. >> Yep, exactly. >> Guys, talk about the event that's coming up. You've got an event that was scheduled as a physical event, and you were right in planning mode, and then the crisis hits. You're going digital, going virtual, it's really digital, but it's digital. It's on the internet. So how are you guys thinking about this? I know it's out there. It's April 21st. Can you share some specifics around the event? Who should be attending and how do they get involved online? >> Yeah, the event really came together about a month ago, when we started to see all the cancellations happening across the industry because of COVID-19. We were extremely engaged in the community and we have a lot of talks, and we were seeing a lot of conferences just dropping, and so speakers losing their opportunity to really share their knowledge with respect to how you do reliability and topics that we focus on.
And so we quickly pivoted as a company and created a new online event to give everyone in the community the opportunity to just fail over to a new event, as the conference name says, and have those speakers who have lost their speaking slots have a new opportunity to go share their knowledge. And so that came together really quickly. We shared the idea with a dozen of our partners and everyone liked it, and all of a sudden this thing took off like crazy, and in just a month we are approaching 4,000 registrations, and we have over 30 partners signed up and supporting the initiative. A lot of past partners as well covering the event. So it was impressive to see the amount of interest that we were able to generate in such a short amount of time. And really, this is a conference for anybody who is interested in resiliency. If you want to learn from the best how to build business continuity across systems, people and processes, this is a great opportunity at no cost, really. It's a free conference. >> And the target persona and the audience you want to have attend is what? SREs or folks doing architectural work? What's the target >> Yeah >> person to attend? >> Architects, SREs, developers, business leaders who care about the quality and the reliability of their applications, who need to help create a framework and a mindset for their organizations that speaks to what Tammy was saying a minute ago. Having that constant practice on a daily basis about going and finding how to improve things. >> You know, Tammy, we've been going to physical events with theCUBE and extracting the signal from the noise and distributing it digitally for 10 years, and I've got to ask you, because now that those events have gone away, you talk about chaos and injecting failure. Doing these digital events is not as easy as just live streaming; it's hard to replicate the value of a physical event, years of experience and standards, roles and responsibilities, to digital.
A different consumption environment, it's asynchronous, you're trying to create a synchronous environment. It's its own complex system, so I think a lot of people are experimenting and learning (laughs) from these events, because it's pretty chaotic. So, I'd love to get your thoughts on how you look at these digital events as a chaos engineer. How should people be looking at these events? How are you guys looking at... I mean, obviously you want to get the program going, get people out there, get the content, but to iterate on this, how do you view this? >> It is really different. So I actually like to compare it to fire drills in SRE. So often what you do there is you actually create a fake incident or a fake issue, so you're just saying, "Let's have a fire drill." Similar to when you're in a building and you have a fire drill that goes off and you have wardens and everything and you all have to go outside. So we can do that in this new world that we're all in all of a sudden. A lot of people have never run an online event and now all of a sudden they have to. So what I would say is, do a fire drill. Run a fake one before you do the actual one to make sure that everything does work okay. My other tip is make sure that you have backup plans. Backup plans on backup plans on backup plans. As an SRE, I always have at least three to five backup plans. I'm not just saying plan A and plan B, but there's also a C, D, and E, and I think that's very important. And even when you're considering technology, one of the things we say with chaos engineering is, if you're using one service, inject failure and make sure that you can fail over to a different, alternative service in case something goes wrong. >> Yeah, hence the Failover Conference, which is the name of the conference. (chuckles) >> Exactly! >> Yeah, well, we certainly are going to be sending a digital reporter there, virtually. If you need any backup plans, obviously we have the remote interviews here.
If you need any help, let us know, really appreciate it. Great to see you guys. And thanks for sharing. Any final thoughts on the conference? What happens when we get through the other side of this? I'll give you guys a final word. We'll start with Alberto, with you first. >> Yeah, I think when we are on the other side of this, we'll understand even more the importance of effective resilience, architecting and testing. As a provider of tools and methodologies for that, we think we will be able to help customers make a significant leap forward on that side. And the conference is just super exciting. I think it's going to be a great event. I encourage everyone to participate. We have a tremendous lineup of speakers who have incredible reputations in their fields, so I'm really happy and excited about the work that the team has been able to do with our partners to put together this type of event. >> Okay, Tammy. >> Yeah, for me, I'm actually going to be doing the opening keynote for the conference, and the topic that I'm speaking about is that reliability matters more now than ever. And I'll be sharing some bizarre, weird incidents that I have worked on and experienced myself, really critical, strange issues that have come up. But yeah, I'm really looking forward to sharing that with everybody else, so please come along, it's free. You can join from your own home and we can all be there together to support each other. >> You've got great community support and there's a lot of partners, press, media, ecosystem and customers, so congratulations, Gremlin, on having a conference on April 21st called the Failover Conference. theCUBE and SiliconANGLE will have a digital reporter there who will be covering the news. Thanks for coming on and sharing. I appreciate the time. I'm John Furrier in the Palo Alto studio with a remote interview with Gremlin around their Failover Conference, April 21st.
It's really demonstrating, in my opinion, the at-scale problems that we've been working on in the industry, now more applicable than ever before as we get post-pandemic with COVID-19. Thanks for watching. We'll be back. (calm music)
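
Two ideas from the conversation lend themselves to a small illustration: the two incident metrics Tammy describes (mean time to detection and mean time to resolution), and her advice to inject failure into one service and confirm you can fail over to an alternative. The Python sketch below is a minimal, hypothetical example of both. Every name in it (`mean_minutes`, `make_backend`, `call_with_failover`, the backend names) is invented for illustration, the incident timestamps are made-up sample data, and MTTR is assumed here to be measured from the start of the incident. It is not Gremlin's API or anything described in the interview.

```python
from datetime import datetime, timedelta
from statistics import mean


def mean_minutes(incidents, start_key, end_key):
    """Average gap, in minutes, between two timestamps across incidents."""
    return mean(
        (incident[end_key] - incident[start_key]).total_seconds() / 60
        for incident in incidents
    )


class InjectedFailure(Exception):
    """Raised by a backend that the drill has deliberately broken."""


def make_backend(name, broken=False):
    """Build a hypothetical service call; broken=True simulates injected failure."""
    def call():
        if broken:
            raise InjectedFailure(name)
        return f"ok from {name}"
    return call


def call_with_failover(backends):
    """Try plan A, then B, C, ... and return the first successful response."""
    failed = []
    for backend in backends:
        try:
            return backend()
        except InjectedFailure as exc:
            failed.append(str(exc))
    raise RuntimeError(f"all backends failed: {failed}")


if __name__ == "__main__":
    # Made-up sample incidents, for illustration only.
    t0 = datetime(2020, 1, 1, 9, 0)
    incidents = [
        {"started": t0,
         "detected": t0 + timedelta(minutes=4),
         "resolved": t0 + timedelta(minutes=34)},
        {"started": t0,
         "detected": t0 + timedelta(minutes=6),
         "resolved": t0 + timedelta(minutes=26)},
    ]
    print("MTTD (min):", mean_minutes(incidents, "started", "detected"))
    print("MTTR (min):", mean_minutes(incidents, "started", "resolved"))

    # Fire drill: break the primary on purpose and check that a backup answers.
    drill = [
        make_backend("primary", broken=True),
        make_backend("backup-b"),
        make_backend("backup-c"),
    ]
    print(call_with_failover(drill))
```

Run as a script, this prints an MTTD of 5.0 minutes and an MTTR of 30.0 minutes for the sample data, followed by `ok from backup-b`, showing that the drill's injected failure in the primary was absorbed by the first backup plan.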

Published Date : Apr 8 2020

ENTITIES

Entity	Category	Confidence
Tammy	PERSON	0.99+
Alberto Fernando	PERSON	0.99+
Alberto	PERSON	0.99+
80%	QUANTITY	0.99+
John Furrier	PERSON	0.99+
Italy	LOCATION	0.99+
20%	QUANTITY	0.99+
Milan	LOCATION	0.99+
Palo Alto	LOCATION	0.99+
April 21st	DATE	0.99+
4,000 registrations	QUANTITY	0.99+
Google	ORGANIZATION	0.99+
Bergamo	LOCATION	0.99+
six years	QUANTITY	0.99+
Dropbox	ORGANIZATION	0.99+
National Australia Bank	ORGANIZATION	0.99+
Alberto Farronato	PERSON	0.99+
COVID-19	OTHER	0.99+
10 years	QUANTITY	0.99+
Amazon	ORGANIZATION	0.99+
April 2020	DATE	0.99+
Tammy Butow	PERSON	0.99+
Gremlin	PERSON	0.99+
One	QUANTITY	0.99+
Boston	LOCATION	0.99+
over 30 partners	QUANTITY	0.99+
Gartner	ORGANIZATION	0.99+
10th unit	QUANTITY	0.99+
YouTube	ORGANIZATION	0.99+
theCUBE	ORGANIZATION	0.99+
first	QUANTITY	0.98+
Netflix	ORGANIZATION	0.98+
today	DATE	0.98+
one service	QUANTITY	0.97+
once a quarter	QUANTITY	0.97+
one	QUANTITY	0.97+
Gremlin	ORGANIZATION	0.97+
SiliconANGLE	ORGANIZATION	0.96+
Failover Conference	EVENT	0.96+
500 million customers	QUANTITY	0.96+
TheCUBE	ORGANIZATION	0.96+
hundreds of services	QUANTITY	0.95+
Gremlin	LOCATION	0.95+
first one	QUANTITY	0.95+
three times a week	QUANTITY	0.95+
five backup plans	QUANTITY	0.94+
two very important metrics	QUANTITY	0.94+
a month ago	DATE	0.94+
five critical systems	QUANTITY	0.93+
a month	QUANTITY	0.92+
a dozen	QUANTITY	0.89+
Googles	ORGANIZATION	0.88+
theCUBE Conversation	EVENT	0.88+
SRE	ORGANIZATION	0.83+
DevOps	TITLE	0.83+
two two great guests	QUANTITY	0.82+
CUBE	COMMERCIAL_ITEM	0.82+
pandemic	EVENT	0.81+

UNLISTED FOR REVIEW Tammy Butow & Alberto Farronato, Gremlin | CUBE Conversation, April 2020

from the cube studios in Palo Alto in Boston connecting with thought leaders all around the world this is a cube conversation hello everyone welcome to the cube conversation here in Palo Alto our studios of the cube I'm showing for your host we're here during the crisis of Cove in nineteen doing remote interviews I come into the studio we've got a quarantine crew or here getting the interviews getting the stories out there and of course the story we continue to talk about is the impact of Kovan 19 and how we're all getting back to work either working at home or working remotely and virtually certainly but as things start to change we can start to see events mostly digital events and we're here to talk about an event that's coming up called the failover conference from gremlin which is now gone digital because it's April 21st but I think what's important about this conversation that I want to get into is not only talk about the event that's coming up but talk about these scale problems that are being highlighted by this change in work environment working at home we've been talking about the at scale problems that we're seeing whether it's a flood of surge of traffic and the chaos that's ensuing across the world with this pandemic so I'm excited have two great guests Alberto Ferran auto senior vice president marketing gremlin and Tammy Bhutto principal site reliability engineer or SRE guys thanks for coming on appreciate it thank you Thank You Alberto I want to get to you first you know we've known each other before you've been in this industry we all we've been all been talking about the cloud native cloud scale for some time it's kind of inside the ropes it's inside baseball Tami your site reliability engineer everyone knows Google knows how well cloud works this is large-scale stuff now with The Cove in 19 we're starting to see the average person my brother my sister our family members and people around the world go oh my god this is really a high impact this 
change of behavior the surge of you know whether whether it's traffic on the internet or work at home tools that are inadequate you start to see these statistical things that were planned for not working well and this actually Maps the things that we've been talking about it in our industry Alberto you've been on this how you guys doing and what's your what's your take on this situation we're in right now yeah yeah we're we're doing pretty well as a company we were born as a distributed organization to begin with so for us working in a distributed environment from all over the world is is common practice day-to-day personally you know I'm originally from Italy my parents my family is Milan and Bergen audible places so I have to follow the news with extra care and so much in me it becomes so much clearer nowadays that technology is not just a powerful tool to enable our businesses but it also is so critical for our day-to-day life and thanks to you know video calls I can easily talk to my family back there every day Wow so that's that's really important so yes we've been talking for a long time as you mentioned about complex systems at scale and reliability often in the context of mission-critical applications but more and more these systems need to be reliable also when it comes to back office systems that enable people to continue to work on a daily basis yeah well our hearts go out to your family and your friends in Italy and hope everyone's stay safe there no that was a tough situation continues to be a challenge Tammy I want to get your thoughts how is life going for you you're a sight reliable engineer what you deal with on the tech side is now happening in the real world it's it's almost it's mind-blowing and to me that we're seeing these these things happen it's it's a paradigm that needs attention and whew look at it as a sre dealing a most from a tech side now seeing it play out in real life it's such an interesting situation really terrible so one of the 
things that I specialize in as a site reliability engineer is incident management and so for example I previously worked at Dropbox where I was you know the incident manager on call for 500 million customers you know it's like 24/7 and these large-scale incidents you really need to be able to act fast there are two very important metrics that we track and care about as a site reliability engineer the first one is mean time to detection how fast can you detect what something is happening obviously if you detect an issue faster and you've got a better chance of making the impact lower so you can contain the blast radius I like to explain it to people like if you have a fire in your sauce bin in your kitchen and you put it out that's way better than waiting until your entire house is on fire and the other metric is mean time to resolution so how long does it take you to recover from the situation so yeah this is a large-scale global incident right now that we're in yeah I know you guys do a lot of talk about chaos theory and that applies a lot of math involved we all know that but I think when you go look at the real world this is gonna be table stakes and you know there's now a line in the sand here you know pre-pandemic post pandemic and i think you guys have an interesting company gremlin in the sense that this is this is a complex system and if you think about the world we're going to be living in whether it's digital events that you guys are have one coming up or how to work at home or tools that humans are going to be using it's going to be working with systems right so you have this new paradigm gonna be upon us pretty quickly and it's not just buying software mechanisms or software it's a complex system it's distributed computing and operating so I mean this is kind of the world can you guys talk about the gremlin situation of how you guys are attacking these new problems and these new opportunities that are emerging one of the things that I've always 
specialized in over the last 10 years is chaos engineering and so the idea of chaos engineering is that you're injecting failure on purpose to uncover weaknesses so that's really important in distributed systems with distributed you know cloud computing all these different services that you're kind of putting together but the idea is if you can inject failure you can actually figure out what happens when I inject that small failure and then you can actually go ahead and fix it one of the things I like to say to people is you know focus on what your top 5 critical systems are let's fix those first don't go for low-hanging fruit fix the biggest problems first get rid of the biggest amount of pain that you have as a company and then you can go ahead and like actually if you think about Pareto principle the 80/20 rule if you fix 20% of your biggest problems you actually solve 80% of your issues that always works something that I've done while working at National Australia Bank doing chaos engineering also what gremlin at Dropbox and I help a lot of our customers do that to albariño talk about the mindset involved it's almost counterintuitive whoa-oh-oh risk the biggest system and I don't want to touch those there working fine right now and then these problems just gestate they kind of hang around to the bin in the kitchen fire you know mist okay I don't want to touch it the house is still working so this is kind of a new mindset could you talk about what your take is on that is the industry there I mean oh it was a kind of a corner case you know you had Netflix you had the chaos monkey those days and then now it's the DevOps practice for a lot of folks you guys are involved in that what's the what's the appetite what's the progress of chaos engineering and mainstream yeah it's interesting that you mentioned DevOps and you know recently Gartner came up with a new revisited devil scream work that has chaos engineering in the middle of the lifecycle of your application 
and the reality is that systems have become so complex in infrastructure so many layers of abstractions you have hundreds of services if you're doing micro services but even if you're not doing micro services you have so many applications connected to each other build really complex workflows and automation flows it's impossible for traditional QA to really understand well the vulnerability are in terms of resiliency in terms of quality too often the production environment is also too different from the staging environment and so you need a fundamentally different approach to go and find where your weaknesses are and find them before they happen before you end up finding yourself in a situation like the one we're in today and you're not prepared and so much of what we talk about is giving it >> and the methodology for people to go and find these vulnerabilities not so much about creating cause chaos but it's about managing sales that is built into our current system and exposing those vulnerabilities before they create problem and so that's a very scientific methodology and and and tooling that we would bring to market and we help customers with Tammy I want to get your thoughts on so you know we used to riff a lot of to our 10th you know cube we've had a lot of conversation we've ripped over the over the years but you know when the surge of Amazon Web Services came out as pretty obvious the clouds amazing and look at the startups that were born you mentioned Dropbox you work there these comings and all these born in the cloud these hyper scale comes built from scratch great way to scale up and we used to joke about Google people say I would like a cloud like Google but no one has Google's use cases and Google really pioneered the sre concept and you gotta give them a lot of props for that but now we're kind of getting to a world where it's becoming Google like there's more scale now than ever before it's not a corner case it's becoming more popular and more of a 
preferred architecture, this large scale. What's your assessment of the mainstream enterprises? How far are they, in your mind? Are they there with chaos engineering? Are they close? How are they doing? How does someone develop an SRE practice to get to Google-like scale? Because Google has an amazing network, they've got large-scale cloud, they have SREs, they've been doing it for years. How does a company that's transforming their IT have that expertise? >> It's a great question. I get asked this a lot as well. One of our goals at Gremlin is to help make the internet more reliable for everybody: everyone using the internet, all of the engineers who are trying to build reliable services. And so I'm often asked by, you know, companies all over the world, how do we create an SRE practice, and how do we practice chaos engineering? And so, how you can get started actually rolling out your SRE program, based on my experiences, I've done it. So when I worked at Dropbox, I worked with a lot of people who had been at Google, they'd been at YouTube, they were there when SRE was rolled out across those companies, and then they brought those learnings to Dropbox, and I learned from them. But also, the interesting thing is, if you look at enterprise companies, so large banks, say, for example, I worked at National Australia Bank for six years, we actually did a lot of work that I would consider chaos engineering and SRE practices. So, for example, we would do large-scale disaster recovery, and that's where you fail over an entire data center to a secret data center in an unknown location. And the reason is because you're checking to make sure that everything operates okay if there's a nuclear blast. That's actually what you have to do, and you have to do that practice every quarter. But if you think about it, it's not very good to only do it once a quarter. You really want to be practicing chaos engineering and injecting failure regularly. I actually prefer to do it three times a week, so I
do it a lot, but I'm also someone who likes to work out a lot and be fit all the time, so I know that if you do something regularly you get great results. So that's what I always tell folks. >> Yeah, get the reps in, as we say. You know, get stronger, build the muscle memory. Guys, talk about the event that's coming up. You've got an event that was scheduled as a physical event, and then you were right in the planning mode, and then the crisis hit. You're going digital, going virtual. It's really digital, but it's digital that's on the internet. So how are you guys thinking about this? I know it's out there, it's April 21st. Can you share some specifics around the event? Who should be attending, and how do they get involved online? >> Yeah. The event really came together about a month ago, when we started to see all the cancellations happening across the industry because of COVID-19. We are extremely engaged in the community, and we give a lot of talks, and we were seeing a lot of conferences just dropping, and so speakers losing their opportunity to share their knowledge with respect to how you do reliability and the topics that we focus on. And so we quickly pivoted as a company and created a new online event to give everyone in the community the opportunity to, you know, fail over to a new event, as the conference name says, and have those speakers who have lost their speaking slots have a new opportunity to go share their knowledge. And so that came together really quickly. We shared the idea with a dozen of our partners and everyone liked it, and all of a sudden this thing took off like crazy. In just a month we are approaching, you know, four thousand registrations. We have over 30 partners signed up and supporting the initiative, and a lot of press partners as well covering the event. So it was impressive to see the amount of interest that we were able to generate in such a short amount of time. And really, this is a conference for anybody who is interested
in resilience. If you want to learn from the best how to build business continuity across systems, people, and processes, this is a great opportunity at no cost; it's a free conference. >> And the target persona, the audience you want to have attend, is what? SREs or folks doing architectural work? What's the target? >> Yes, the target attendees are architects, SREs, developers, business leaders who care about the quality and reliability of their applications, who need to help create a framework and a mindset for their organization that speaks to what Tammy was saying a minute ago: having that constant practice, on a daily basis, of going and finding how to improve things. >> You know, Tammy, we've been going to physical events with theCUBE and extracting the signal from the noise and distributing it digitally for ten years. And I've got to ask you, because now that those events have gone away, you talk about chaos and injecting failure. Doing these digital events is not as easy as just live streaming. It's hard to replicate the value of a physical event, with years of experience and standards, roles and responsibilities, in digital. There are different consumption environments, it's asynchronous, you're trying to create a synchronous environment. It's its own complex system. So I think a lot of people are experimenting and learning from these events, because it's pretty chaotic. So I'd love to get your thoughts on how you look at these digital events as a chaos engineer. How should people be looking at these events? How are you looking at it? You know, you also want to get the program going, get people out there, get the content, but you have to iterate on this. How do you view this? >> It is really different. So I actually like to compare it to fire drills in SRE. Often what you do there is you actually create a fake incident or a fake issue. So, you know, you're saying, let's have a fire drill, similar to, you know, when you're in a building and you have a fire drill that goes
off, you have wardens and everything, and you all have to go outside. So we can do that in this new world that we're all in. All of a sudden, you know, a lot of people have never run an online event, and now all of a sudden they have to. So what I would say is, do a fire drill. Run, you know, a fake one before you do the actual one, to make sure that everything does work okay. My other tip is, make sure that you have backup plans: backup plans on backup plans on backup plans. As an SRE I always have at least three to five backup plans. I'm not just saying plan A and plan B; there's also a C, D, and E, and I think that's very important. And, you know, even when you're considering technology, one of the things we say with chaos engineering is, if you're using one service, inject failure and make sure that you can fail over to a different alternative service in case something goes wrong. >> Yeah, hence the Failover Conference, which is the name of the conference. >> Yeah. >> Well, we certainly are going to be sending a digital reporter there virtually. If you need any backup plans, obviously we have the remote interviews here; if you need any help, let us know. Really appreciate it. All right, great to see you guys, and thanks for sharing. Any final thoughts on the conference? What happens when we get through to the other side of this? I'll give you guys a final word. We'll start with Alberto, with you first. >> Yeah, I think when we are on the other side of this, we will understand even more the importance of effective resilience architecting and testing. I think, you know, as a provider of tools and methodologies for that, we think we will be able to help customers do a significant leap forward on that side. And the conference is just super exciting. I think it's going to be a great event, and I encourage everyone to participate. We have a tremendous lineup of speakers that have incredible reputations in their fields, so I'm really happy and excited about the work that the team has
been able to do with our partners to put together this type of event. >> Okay, Tammy. >> Yes, I'm actually going to be doing the opening keynote for the conference, and the topic that I'm speaking about is that reliability matters more now than ever. I'll be sharing some, you know, bizarre, weird incidents that I've worked on myself, that I've experienced, you know, really critical, strange issues that have come up. But yeah, I'm really looking forward to sharing that with everybody else. So please come along. It's free, you can join from your own home, and we can all be there together to support each other. >> You've got great community support, and there's a lot of partners, press, media, an ecosystem, and customers. So congratulations. Gremlin is having a conference on April 21st called the Failover Conference. theCUBE and SiliconANGLE will have a digital reporter there covering the news. Thanks for coming on and sharing, and I appreciate the time. I'm John Furrier. We're here in the Palo Alto studios with a remote interview with Gremlin around their Failover Conference, April 21st. It's really demonstrating, in my opinion, that the at-scale problems we've been working on in the industry are now more applicable than ever before as we get post-pandemic with COVID-19. Thanks for watching. We'll be back. (music)
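
Editor's illustration: Tammy's advice about keeping several backup plans and injecting failure to confirm that failover really works can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not Gremlin's tooling or API: the `Service` class, the `InjectedFailure` exception, the `call_with_failover` helper, and the plan names are all hypothetical and exist only for this example.

```python
import random


class InjectedFailure(Exception):
    """Raised by the hypothetical fault injector to simulate an outage."""


class Service:
    """A stand-in for one provider; failure_rate is the injected failure probability."""

    def __init__(self, name, failure_rate=0.0, rng=None):
        self.name = name
        self.failure_rate = failure_rate
        self._rng = rng or random.Random(0)

    def call(self, request):
        # random() returns a value in [0.0, 1.0), so a rate of 1.0 always fails
        # and a rate of 0.0 never fails.
        if self._rng.random() < self.failure_rate:
            raise InjectedFailure(f"{self.name} is down (injected)")
        return f"{self.name} handled {request}"


def call_with_failover(services, request):
    """Try plan A, then B, then C, and return the first successful response."""
    errors = []
    for service in services:
        try:
            return service.call(request)
        except InjectedFailure as exc:
            errors.append(str(exc))
    raise RuntimeError("all backup plans failed: " + "; ".join(errors))


# Fire drill: force plans A and B to fail and check that plan C takes over.
plans = [
    Service("plan-a", failure_rate=1.0),
    Service("plan-b", failure_rate=1.0),
    Service("plan-c", failure_rate=0.0),
]
result = call_with_failover(plans, "stream-keynote")
print(result)
```

Running the drill with the first two plans forced to fail shows whether the last plan really carries the request, which is the point of rehearsing before the real event.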

Published Date : Apr 7 2020



Tammy Bryant | PagerDuty Summit 2020

>> Presenter: From around the globe, it's theCUBE, with digital coverage of PagerDuty Summit 2020. Brought to you by PagerDuty. >> Welcome to this CUBE conversation. I'm Lisa Martin, today talking with Tammy Bryant, who is a CUBE alumna, the principal site reliability engineer at Gremlin and the co-founder and CTO of the Girl Geek Academy. Tammy, it's great to have you on the program again. >> Hi Lisa, thanks so much for having me again. It's great to be here. >> So one of the things I saw in your background is 10-plus years of technical expertise in SRE and chaos engineering, and I thought, chaos engineering, I feel like I'm living in chaos right now. What is chaos engineering, and why do you break things on purpose? >> Yep. So the idea of chaos engineering is that we're breaking systems, but in a thoughtful, controlled way, to identify weaknesses in systems. So that's really what it's all about. The idea there is, you know, when you're doing really complicated work with technical systems, so, for example, distributed systems, and say, for example, you're working at a bank, it's tough to be able to pinpoint the exact failure mode that could cause a really large outage for your customers. And that's what chaos engineering is all about. You inject the failure proactively to identify the issues, and then you fix them before they actually cause really big problems for customers. And you do it during the middle of the day, you know, when you're feeling great, instead of being paged in the middle of the night for an incident that's actually causing your customers pain and making you lose a lot of money. So that's what chaos engineering really is. >> Are you seeing, in the last six months, since the world is so different, an increase in customers? Now with, for example, brick-and-mortar stores shut down and everything having to convert to digital if it wasn't already, is there an increase in demand for chaos engineering services? >> Yeah, definitely.
So a lot of people are asking, what is chaos engineering, how can I use it, will it help me reduce my incidents? And definitely, because there are a lot of new services that have been rolled out recently, say, for example, curbside pickup. That's a whole new thing that had to be created really recently to be able to handle a large amount of load. And you know, people show up, they want to get their product really fast, 'cause they want to be able to just get back home quickly. And that's something that we've been working on with our customers, to make sure that the curbside pickup experience is really great. The other interesting thing that we've been working on because of the pandemic is making sure that banks are really reliable, and that customers are able to get access to their money when they need it, and are able to see that information too. And you can imagine that when you're in lockdown, and you can only leave your house for maybe an hour a day, you need to be able to quickly get access to your money to buy food, and we've seen some big incidents recently where that hasn't been the case. >> And I can imagine. I mean, just thinking of what happened with everything six months ago, and how consumers were demanding, right? We expect to get whatever we want, whether it's something we buy on Amazon, something that we stream on Netflix, or whatnot. We have this expectation that we can almost get it in real time. But there was, you know, a delay a few months ago, and there still is to some degree. But companies like Amazon and Netflix, I can imagine, really must have a big focus on chaos engineering, to test these things regularly. And now they have proved, I would imagine, to some degree, that with chaos engineering they're built to withstand that. >> Yes, exactly.
So our founders at Gremlin came from Netflix and Amazon. Our CEO had worked at both, where he'd done chaos engineering, and that's actually why he decided to create Gremlin. It's the first company in the world to offer chaos engineering as a service. And you know, obviously, when you're working somewhere like Netflix, you know the whole product: you have to be able to get access to that movie, that TV show, right in that moment. And also customers expect to be able to see that on, for example, their PlayStation in their living room, and it should work, and they're paying for a subscription. So, to be able to keep them on that subscription, you need to offer a great service. Same thing with Amazon. You know, at Amazon.com they've done a lot of chaos engineering work over many years now to be able to make sure that everything is available. And it's not just that the entire amazon.com is up and running. It's also, for example, that when you go and look at a page, the recommendation service works too, and they're able to show you, hey, here's some other things that you might like to buy at this time. And as a consumer, I love that, 'cause it helps me save time and effort and even money as well, 'cause it's giving you some good advice. So that's the type of thing we do. >> Exactly. So when you're working with customers, I'd love to understand just a little bit from the conversational standpoint: is chaos engineering now kind of at the C-level, or is it still sort of within the engineering folks? 'Cause looking at this as a make or break, knowing that, for example, there's Netflix, there's Hulu, there's Disney Plus, there's Apple TV Plus, if we don't get something that we're looking for right away, there's Prime, we're going to go to another streaming service.
So are you starting to see an increase in demand from companies that know, we have competition right behind us, we've got to be able to set up the infrastructure and ensure that it is reliable, now more than ever? >> Yeah, exactly. That's really, really important. I'm seeing a lot of executives. I mean, I've seen that since the beginning, really, since I first started working at Gremlin. I would often be invited by executives to come and give talks within their company, to help the teams learn about chaos engineering, and I love doing that. It's really great. So I'd be invited by C-levels or VPs from different departments. And I often get people adding me on LinkedIn from all over the world who are in leadership roles, because really, you know, they're responsible for making sure that their companies can hit those critical metrics and that they're able to achieve their really demanding business goals, and then they're trying to help their teams be able to achieve that too. So I've actually been so pleased to see that as well. It is really cool to have an executive reach out and say, hey, I'm thinking of helping my team, I'd like to get them introduced to you, can you come and just teach them about this topic? And I love being able to do that. It's really positive, and it's the right way to improve. >> It is, and I think nowadays, with reliability being more important than ever, you know, we talk to leaders from every industry, and there are certain things right now that are going to be shaping the winners and the losers of tomorrow. And it sounds to me like chaos engineering is one of those things that's going to be fundamental for any type of business to not just survive these times, but to thrive going forward. >> Yes, I definitely think so. I mean, obviously, people can easily just go to a different URL and try and use a different service.
And you know, we're seeing failure now across so many different industries. We didn't see that before. But for example, you know, I'm sure you've seen in the news or heard from friends and family about schools now being completely online, and then kids can't actually access their classes, their resources, what they need to learn every day. So that really just shows you how much it's impacting us as a society. We really know that the internet is critical. It's amazing that we have the internet, how lucky we are to have this, but it needs to work for us to actually be able to get value out of it. And that's what chaos engineering is all about. You know, we're able to make sure that everything is reliable, so it's up and running. And we do that by looking at things like redundancy. So we'll do failover work, where we completely shut down an application or service and make sure it gracefully fails over. We also do a lot of dependency failure work, where you're actually looking to say, this is the critical path of this service. And a lot of people don't think about this, but the critical path really starts at sign-in. So you need to make sure that login and sign-in work really well. It's not just about the experience once you've signed in; that has to work well all the way through. So actually, if you have a good understanding of user experience, it helps you create a much better pathway and understand those critical pieces that the customer needs to be able to do to have a great experience. And I care a lot about that. Whenever I go and work somewhere, I always read customer tickets, I always try and understand what the customer pain points are. And I love listening to customers and then just solving their problems. The last thing I want them to do is, you know, be complaining or be really annoyed on Twitter because something just isn't working when they need it to be working. And it is really critical these days.
The internet is a really serious part of our day-to-day life. >> Oh, it's a lifeline. I mean, for some folks, it's the only way that they're connecting with the outside world, through the internet. I had a friend whose son, on his first day of college a couple of weeks ago, freshman year, first class, couldn't get into Zoom. And that's a stressful situation. But I imagine too, and I know you're going to be speaking at the PagerDuty Summit, that more folks need to understand what this is. And I can tell you have a real, authentic passion for it. Talk to us about what you're going to be talking about at the PagerDuty Summit. >> Sure thing. I'm really excited to be speaking at PagerDuty Summit very soon. My talk is called Building and Scaling SRE Teams, so site reliability engineering teams. And this is something that I've done previously. I've built out the SRE teams at Dropbox for both databases and storage, so block storage, and then I also led the code workflows team. And that's for, you know, over 500 million users, people accessing the critical data that they store on Dropbox all the time. You know, folks use Dropbox in so many different ways. Maybe it's really famous musicians who are trying to create an amazing new album, or maybe it's a lawyer preparing for a court case, and they need to be able to access their documents. So those are a lot of customer stories that would come up over time. And prior to that, I worked at the National Australia Bank as well, leading teams too, and obviously people care about their money if they can't access their money, if there are incorrect transactions, if there are missing transactions, you know, duplicate transactions. Maybe people don't mind so much if you get, like, a double deposit, but it's still not good from the bank's perspective. So there's all types of different chaos that can happen.
And I found it to be really interesting to be able to dive into that and make sure that you can make improvements. And I love that it makes customers happier. And also, it helps you improve your company as a whole. So it's a really good thing to be able to do. And with my talk, I'm going to talk to folks about, you know, not only why it's important to build out a reliability practice at your organization. You know, back in the day, people used to go, why would you need a security team? Why would we need that? Now everybody has a security team, everyone has a chief security officer as well. But why don't we focus on reliability? We see incidents out in the news all the time, but for some reason we don't have the chief reliability officer. I think that's definitely going to be something that will appear in the future, just like the chief security officer role. But that's what I'm going to talk about there: how you can find site reliability engineers. I'll share a few of my secrets. I won't give any spoilers out, but there's actually quite a few places that you can find amazing people. There's even a school that you can hire them from, which I've done in the past. And then I'll talk to you about how you can interview them to make sure that you get the best people on your team. There are a number of things that I think are very important to interview for. And then once you've got those folks on your team, I'll talk to you about how you can make sure that they're successful: how to set them up for success and make sure that they're aligned to not only your business goals, but also your core values as a company, which is really important too. >> Yeah, that's fantastic. It's very well rounded. I'm curious, what are some of the characteristics that you think are really critical for someone to become a successful SRE? >> Yeah, so there's a few key things that I look for.
One thing is somebody who is really good at troubleshooting, so they need to be comfortable with complexity, ambiguity, and open-ended challenges and problems, and also thrive in those types of environments. Because often you're seeing something that you've never seen happen before, and also you're working with really complicated systems. So you just need to be able to feel good in that moment. And you can test for that during an interview question on troubleshooting and debugging. So that's something that I'll go into in more detail, but that's definitely the first characteristic. The other thing, of course, is you want to have someone who is good at being able to build solutions. So they can code, they understand automation, they can figure out, how can I take this pain point, this problem, and how can I automate it and then scale this out and make it available for everyone across my organization? So someone who has that mindset of building tools for others, and often they are internal tools, because maybe you're building a tool that helps everybody know who's on call for every single critical service at the company, and also every non-critical service, and they can identify that in a minute or less, maybe even just in a few seconds, and then they can quickly get that person involved if anything needs to be escalated to them, via, for example, a tool like PagerDuty. That's really what you want. You want them to be able to think, how can I just make this efficient? How can I make sure that we can get really great results? And yeah, I think they also just need to be really personable too, and work well in a really complicated organizational structure. Because usually they have to work with the engineering team, and the finance team to understand the revenue impact.
They need to be able to work with the PR team and the social media team if there are incidents, and then they need to provide information about when this incident is going to be resolved, and how they can update VIP customers. They need to talk to the sales team, because what happens if you're giving a demonstration, and then somehow there's an issue or failure that happens, an incident, and then in the middle of your very important sales demo you're not able to actually deliver it? That can happen a lot too. So there are a lot of very important key skills. >> Sounds like it's a really cross-functional role, pivotal to an organization, that needs to understand how these different functions not only operate, but also operate together. Is that somebody that you think has certain types of previous work experience? Is this something that you talk to the Girl Geek Academy girls about? How do they get into it? I'm curious what the career path is. >> Yeah, it's interesting. I find a lot of SREs often come from a few different backgrounds. One is they came through the world of Linux and understanding systems, and just being really interested in that: deep diving into the kernel, understanding how to improve performance of systems. The other side is maybe they came from a coding background, where they were actually building applications and features. I started off actually on that side, but I also had a passion for Linux, and then I sort of spread over into the other side and was able to learn both. And then often, you know, it's someone who's comfortable with being on call and handling incidents. But it is a lot of skills. That's actually something that I often talk to folks about, and they ask me, how can I become a great SRE? There's so many things I need to learn. And I just say, you know, take it slow, try and gradually increase your number of skills.
People often say that there's some curve for SREs, where you have the operations side on one side, and then the coding side on the other. And often the best person sits right in the middle, where they have both ops and engineering skills. But it's really hard to find those people. It's okay if you have someone that's really deep, has amazing knowledge of Linux and scaling systems and internet management, and then you can pair them up with a really amazing programmer who's great at software engineering and software architecture. That's okay too. >> We've been hearing for a long time about this sort of negative unemployment with respect to cybersecurity professionals. Are you guys falling into that same category as well with SRE? Or is it somehow different, or do you just know this is exactly what we're looking for? We want to go out there, and even in the Girl Geek Academy, maybe help girls learn how to be able to find what I imagine are a lot of opportunities. >> Yeah, there are so many opportunities for this. So it's definitely an opportunity, because what I see is there's not enough SREs. So tons of companies all over the world will actually ping me and say, hey, Tammy, how do I hire SREs? That's why I decided to give this talk, because I wanted to package that up and just share that information as to how you can do it. And also, maybe you can't find the SREs because they don't exist, but you can help retrain your team. So you can have an engineer learn the skills that are required to be an SRE, that's totally possible too, and maybe move them over to become an SRE. With Girl Geek Academy, one of the things that I've done is run hackathons and workshops and just online training sessions to help girls learn these new skills. So that's exactly what our mission is: to teach 1 million girls technical skills by 2025.
And I love to do mentoring at scale, which is why it's been really cool to be able to do it online and through these workshops and remote hackathons. And I definitely love to do something where I also work with some of our customers, actually, and run an event. I did one a while back, it was really cool. We were able to have all of the girls come in and be at the customer's office and actually learn skills with the customer, which was really fun. And it helps them actually think, hey, I could work here one day, that would be really amazing. And I'm going to do that again in November. And it's kind of fun too. We can do things like have, you know, a dad and mom and daughter day, where you actually bring your daughter to work and help her learn technical skills. That's really fun, because they get to see what you do and they understand it more and see how cool chaos engineering really is. Then they think, oh, wow, you're so awesome, this is great. >> I love it, that's fantastic. Well, like I said before, your passion for it is really there. What I think is really interesting is how you're talking about chaos engineering, and just the word in and of itself, chaos. But you paint it in such a positive light: it's critical, business critical, but there are also all the opportunities that businesses have to learn and fine-tune. Such an interesting conversation. Tammy, we'll have you back on the program, but I thank you so much for joining me today. And those folks lucky enough to be attending the PagerDuty Summit are going to get to learn a lot from you. Thank you. >> Thanks so much for having me, Lisa. >> For Tammy Bryant, I'm Lisa Martin. You're watching this CUBE conversation. (upbeat music)
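
Editor's illustration: the dependency failure work described in this conversation (for example, a page that should still load when the recommendation service is unavailable) can be rehearsed with a small test. The sketch below is only an assumption-laden example in Python, not how Amazon, Netflix, or Gremlin implement it: `fetch_recommendations`, `render_product_page`, and the `dependency_up` switch are hypothetical names invented for illustration.

```python
def fetch_recommendations(user_id, dependency_up=True):
    """Hypothetical non-critical dependency; dependency_up=False injects an outage."""
    if not dependency_up:
        raise ConnectionError("recommendation service unavailable (injected)")
    return ["item-1", "item-2"]


def render_product_page(user_id, dependency_up=True):
    """Build the page; the critical content must survive a recommendation outage."""
    page = {"product": "example-product", "recommendations": [], "degraded": False}
    try:
        page["recommendations"] = fetch_recommendations(user_id, dependency_up)
    except ConnectionError:
        # Graceful degradation: keep serving the page without the optional section.
        page["degraded"] = True
    return page


# Healthy run, then the same request with the dependency failure injected.
healthy = render_product_page("user-1")
degraded = render_product_page("user-1", dependency_up=False)
print(healthy["recommendations"], degraded["degraded"])
```

The design choice here is that only the optional section is wrapped in the failure handler, so an outage in a non-critical dependency never blocks the critical path.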

Published Date : Sep 10 2020

Nenshad Bardoliwalla & Pranav Rastogi | BigData NYC 2017

>> Announcer: Live from Midtown Manhattan, it's theCUBE. Covering Big Data New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. >> OK, welcome back everyone, we're here in New York City. It's theCUBE's exclusive coverage of Big Data NYC, in conjunction with Strata Data going on right around the corner. It's our third day talking to all the influencers, CEOs, entrepreneurs, people making it happen in the Big Data world. I'm John Furrier, co-host of theCUBE, with my co-host here Jim Kobielus, who is the Lead Analyst at Wikibon Big Data. Nenshad Bardoliwalla. >> Bar-do-li-walla. >> Bardo. >> Nenshad Bardoliwalla. >> That guy. >> Okay, done. Of Paxata, Co-Founder & Chief Product Officer. It's a tongue twister, third day, being from Jersey, it's hard with our accent, but thanks for being patient with me. >> Happy to be here. >> Pranav Rastogi, Product Manager, Microsoft Azure. Guys, welcome back to theCUBE, good to see you. I apologize for that, third day blues here. So Paxata, we had your partner on, Prakash. >> Prakash. >> Prakash. Really a success story, you guys have done really well. Launching on theCUBE, it's been fun to watch you guys go from launching to the success. Obviously your relationship with Microsoft is super important. Talk about the relationship, because I think this is where people can really start connecting the dots. >> Sure, maybe I'll start and I'll be happy to get Pranav's point of view as well. Obviously Microsoft is one of the leading brands in the world, and there are many aspects of the way that Microsoft has thought about their product development journey that have really been critical to the way that we have thought about Paxata as well. If you look at the number one tool that's used by analysts the world over, it's Microsoft Excel. Right, there isn't even anything that's a close second.
And if you look at the evolution of what Microsoft has done in many layers of the stack, whether it's the end user computing paradigm that Excel provides to the world, or whether it's all of their recent innovation in both hybrid cloud technologies as well as the big data technologies that Pranav is part of managing, we just see a very strong synergy in trying to enable business consumers to take advantage of these big data technologies in a hybrid cloud environment. So there's a very natural resonance between the 2 companies. We're very privileged to have Microsoft Ventures as an investor in Paxata, and so the opportunity for us to work with one of the great brands of all time in our industry was really a privilege for us. >> Yeah, and that's the corporate side, so that wasn't actually part of it. So it's a different part of Microsoft, which is great. You also have a business opportunity with them. >> Nenshad: We do. >> Obviously the data science problem that we're seeing is that they need to get the data faster. All that prep work seems to be the big issue. >> It does, and maybe we can get Pranav's point of view from the Microsoft angle. >> Yeah, so to sort of continue what Nenshad was saying, you know, data prep in general is sort of a key core competence which is problematic for lots of users, especially around the knowledge that you need to have in terms of the different tools you can use. Folks who are very proficient will do ETL or data preparation-like scenarios using one of the computing engines like Hive or Spark. That's good, but there's this big audience out there who like an Excel-like interface, which is easy to use, a very visually rich graphical interface where you can drag and drop and click through. And the idea behind all of this is how quickly can I get insights from my data. Because in the big data space, it's volume, variety and velocity. So data is coming at a very fast rate. It's changing, it's growing.
And if you spend a lot of time just doing data prep, you're losing the value of data, or the value of data would change over time. So what we're trying to do with sort of enabling Paxata on HDInsight is enabling these users to use Paxata and get insights from data faster by solving key problems of doing data prep. >> So data democracy is a term that we've been kicking around, and you guys have been talking about it as well. What does it actually mean? Because what we've been teasing out the first two days here at theCUBE and BigData NYC is, it's clear the community aspect of data is growing, almost on a similar path as you're seeing with open source software. That genie's out of the bottle. Open source software, tier one, it won, it's only growing exponentially. That same paradigm is moving into the data world, where the collaboration is super important. In this data democracy, what does that actually mean and how does that relate to you guys? >> So the perspective we have starts with something that one of our customers said, which is that there is no democracy without certain degrees of governance. We all live in a democracy. And yet we still have rules that we have to abide by. There are still policies that society needs to follow in order for us to be successful citizens. So when a lot of folks hear the term democracy, they really think of the wild wild west, you know. And a lot of the analytic work in the enterprise does have that flavor to it, right? People download stuff to their desktop, they do a little bit of massaging of the data. They email that to their friend, their friend then makes some changes, and next thing you know we have what some folks affectionately call spreadmart hell. But if you really want to democratize the technology, you have to wrap not only the user experience, like Pranav described, into something that's consumable by a very large number of folks in the enterprise.
You have to wrap that with the governance and collaboration capabilities so that multiple people can work off the same data set. So that you can apply the permissions: who is allowed to share with each other, and under what circumstances are they allowed to share? Under what circumstances are you allowed to promote data from one environment to another? It may be okay for someone like me to work in a sandbox, but I cannot push that to a database or HDFS or Azure Blob storage unless I actually have the right permissions to do so. So I think what you're seeing is that, in general, technology always goes on this trend towards democratization. Whether it's the phone, whether it's the television, whether it's the personal computer, and the same thing is happening with data technologies and certainly companies like. >> Well, Pranav, we were talking about this when you were on theCUBE yesterday. And I want to get your thoughts on this. The old way to solve the governance problem was to put data in silos. That was easy, I'll just put it in a silo and take care of it, and access control was different. But now the value of the data is about cross-pollinating and making it freely available, horizontally scalable, so that it can be used. But at the same time you need to have a new governance paradigm. So, you've got to democratize the data by making it available, addressable and usable for apps. At the same time there's also the concern of how do you make sure it doesn't get in the wrong hands, and so on and so forth. >> Yeah, and what is also very sort of common regarding open source projects in the cloud is, how do you ensure that the user authorized to access this open source project or run it has the right credentials, is authorized and stuff. So, the benefit that you sort of get in the cloud is there's a centralized authentication system. There's Azure Active Directory, so you know most enterprises would have Active Directory users.
Who are then authorized to either access maybe this cluster, or maybe this workload, and they can run this job, and that sort of further goes down to the data layer as well. Where we have active policies which then describe what user can access what files and what folders. So if you think about the end-to-end scenario, there is authentication and authorization happening for the entire system, for what user can access what data. And part of what Paxata brings into the picture is, how do you visualize this governance flow as data is coming from various sources, how do you make sure that the person who has access to data does have access to the data, and the one who doesn't cannot access the data. >> Is that the problem with data prep, is it just that piece of it? What is the big problem with data prep? I mean, everyone keeps coming back to the same problem. What is causing all this data prep? >> People not buying Paxata, it's very simple. >> That's a good one. Check out Paxata, they're going to solve your problems, go. But seriously, there seems to be the same hole people keep digging themselves into. They gather their stuff, then next thing they're in the same hole, they've got to prepare all this stuff. >> I think the previous paradigms for doing data preparation tie exactly to the data democracy themes that we're talking about here. If you only have a very siloed group of people in the organization with very deep technical skills, but who don't have the business context for what they're actually trying to accomplish, you have this impedance mismatch in the organization between the people who know what they want and the people who have the tools to do it. So what we've tried to do, and again, you know, taking a page out of the way that Microsoft has approached solving these problems, you know, both in the past and in the present.
Is to say, look, we can actually take the tools that once were only in the hands of the, you know, shamans who know how to utter the right incantations, and instead move that into the hands of the common folk who actually. >> The users. >> The users themselves, who know what they want to do with the data. Who understand what those data elements mean. So if you were to ask the Paxata point of view, why have we had these data prep problems? Because we've separated the people who had the tools from the people who knew what they wanted to do with it. >> So it sounds to me, correct me if this is the wrong term, that what you offer in your partnership is basically a broad curational environment for knowledge workers. You know, to sift and sort and annotate shared data, with the lineage of the data preserved in essentially a system of record that can follow the data throughout its natural life. Is that a fair characterization? >> Pranav: I would think so, yeah. >> You mentioned, Pranav, the whole issue of how one visualizes or should visualize this entire chain of custody, as it were, for the data. Is there any special visualization paradigm that you guys offer? Now Microsoft, you've made a fairly significant investment in graph technology throughout your portfolio. I was at Build back in May, and Satya and the others just went to town on all things to do with Microsoft Graph. Will that technology somehow, at some point, now or in the future, be reflected in this overall capability that you've established here with your partner Paxata? >> I am not sure. So far, I think what you've talked about is some Graph capabilities introduced from the Microsoft Graph, that's sort of one extreme. The other side of Graph exists today, where as a developer you can do some Graph-based queries. So you can go to Cosmos DB, which has a Gremlin API for Graph-based queries, so I don't know how. >> I'll get right to the question. What are the benefits of Paxata with HDInsight?
How does that, just quickly, explain for the audience. What is that solution, what are the benefits? >> So the solution is you get a one-click install of Paxata on HDInsight, and the benefit is for a user persona who's not, sort of, used to big data or Hadoop. They can use a very familiar GUI-based experience to get their insights from data faster, without having any knowledge of how Spark works or Hadoop works. >> And what does the Microsoft relationship bring to the table for Paxata? >> So I think it's a couple of things. One is Azure is clearly growing at an extremely fast pace. And a lot of the enterprise customers that we work with are moving many of their workloads to Azure and these cloud-based environments. Especially for us, there's the unique value proposition of a partner who truly understands the hybrid nature of the world. The idea that everything is going to move to the cloud or everything is going to stay on premise is too simplistic. Microsoft understood that from day one. That data would be in any and all of those different places. And they've provided enabling technologies for vendors like us. >> I'll just say it, maybe you're too coy to say it, but the bottom line is you have an Excel-like interface. They have Office 365, their users are going to instantly love that interface because it's an easy-to-use, Excel-like interface. It's not the Excel interface per se. >> Similar. >> Metaphor, graphical user interface. >> Yes it is. >> It's clean and it's targeted at the analyst role or user. >> That's right. >> That's going to resonate in their install base. >> And combined with a lot of these new capabilities that Microsoft is rolling out from a big data perspective. So HDInsight has a very rich portfolio of runtime engines and capabilities. They're introducing new data storage layers, whether it's ADLS or Azure Blob storage, so it's really a nice way of us working together to extract and unlock a lot of the value that Microsoft.
>> So, here's the tough question for you. Open source projects, I see Microsoft, the comments were that hell froze over because Linux is now part of their DNA, which was a comment I saw at the event this week in Orlando, but they're really getting behind open source. From Open Compute, it's just clearly new DNA. They're into it. How are you guys working together in open source, and what's the impact to developers? Because now that's only one cloud, there's other clouds out there, so data's going to be an important part of it. So open source, together, you guys working together on that, and what's the role for the data? >> From an open source perspective, Microsoft plays a big role in embracing open source technologies and making sure that they run reliably in the cloud. And part of that value prop that we provide in sort of Azure HDInsight is being sure that you can run these open source big data workloads reliably in the cloud. So you can run open source like Apache Spark, Hive, Storm, Kafka, R Server. And the hard part about running open source technology in the cloud is how do you fine tune it, and how do you configure it, how do you run it reliably. And that's sort of what we bring in from a cloud perspective. And we also contribute back to the community based on sort of what we learned by running these workloads in the cloud. And we believe, you know, in the broader ecosystem customers will sort of have a mixture of these combinations in their solution. They'll be using some of the Microsoft solutions, some open source solutions, some solutions from the ecosystem. That's how we see our customers' solutions sort of being built today. >> What's the big advantage you guys have at Paxata? What's the key differentiator for why someone should work with you guys? Is it the automation? What's the key secret sauce for you guys? >> I think it's a couple of dimensions. One is I think we have come the closest in the industry to getting a user experience that matches the Excel target user.
A lot of folks are attempting to do the same, but the feedback we consistently get is that when the Excel user uses our solution they just, they get it. >> Was there a design criteria, was that from the beginning how you were going to do this? >> From day one. >> So you engineer everything to make it as simple as, like, Excel. >> We want people to use our system. They shouldn't be coding, they shouldn't be writing scripts. They just need to be able. >> Good Excel, you just do good macros though. >> That's right. >> So simple things like that, right. >> But the second is being able to interact with the data at scale. There are a lot of solutions out there that make the mistake, in our opinion, of sampling very tiny amounts of data and then asking you to draw inferences and then publish that to batch jobs. Our whole approach is to smash the batch paradigm and actually bring as much into the interactive world as possible. So end users can actually point and click on 100 million rows of data, instead of the million that you would get in Excel, and get an instantaneous response. Versus designing a job in a batch paradigm and then pushing it through the batch. >> So it's interactive data profiling over vast corpuses of data in the cloud. >> Nenshad: Correct. >> Nenshad Bardoliwalla, thanks for coming on theCUBE, appreciate it. Congratulations on Paxata and Microsoft Azure, great to have you. Good job on everything you do with Azure. I want to give you guys props, with seeing the growth in the market, and the investment's been going well, congratulations. Thanks for sharing, keep coverage here in BigData NYC, more coming after this short break.

Published Date : Sep 28 2017
