Deepu Kumar, Tony Abrozie, Ashlee Lane | AWS Executive Summit 2022
>> Now welcome back to theCUBE as we continue our coverage here at AWS re:Invent 2022, coming to you from the Venetian in Las Vegas. Tens of thousands of attendees, and that exhibit hall is full. Let me tell you, it's been something else. We're here in the Executive Summit, sponsored by Accenture, and we're going to talk about Baptist Health and what's going on with that organization down in South Florida. To do that, I have Tony Abrozie, who's the SVP and Chief Digital and Information Officer; Ashlee Lane, the managing director of the Accenture Healthcare Practice; and on the far end, Deepu Kumar, who is the VP and CTO of Baptist Health South Florida. One and all, welcome. First off, let's just talk about Baptist Health and the size of your footprint. One and a half million patient visits a year; not a small number. >> That was probably last year's number, but okay. >> Right, but not a small number. Tell us about your footprint and, I guess, the client base that you're serving. >> Absolutely. So we are the largest healthcare system in South Florida: 11 hospitals, soon to be 12, and as you said, it's probably about 1.8 million patient visits by now. We're supporting a lot of other units, and we're focusing on the four southern counties of South Florida. >> So Dade, Broward, down that way. Got it. Now let's get to your migration, your cloud transformation, which we're talking about a lot this week. What's been your overarching goal as you worked with Accenture and developed a game plan going forward? What was the motivation to say, this is the direction we're going to go, and this is how we're going to get there? >> Perfect. So Baptist started a digital transformation initiative before I came, about three years ago.
The board and the executive steering committee decided that this was going to be very important to help our patients and consumers. So I was brought in for that digital transformation. And by the way, digital transformation is kind of an umbrella; it's really business transformation with digital technologies. So that's basically where we started, in terms of consumer focus and patient focus. And digital is a big word that encompasses a lot of things. Cloud is one of them, of course, and AI and ML and all the things we're here for at this event. We started that journey about two years ago, and obviously cloud is very important. AWS is our main cloud provider, and clearly with AWS, or any cloud provider, it's not just the infrastructure they're providing; it's the whole ecosystem that provides value back into our transformation. And then somebody, I think Adam this morning at the keynote, said this is a team sport. So with this big transformation, we need all the help, all the minds and hands, that we can get. And that's where Accenture has been invaluable over the last two years. >> Yeah, so as a team sport then, Deepu, you've got external stakeholders; we talked about patients, right? Internally, you've got a whole different set of constituents, but it takes that team; you all have to work together. What kind of conversations or actions have you had with different departments and different sectors of the healthcare business, as Baptist Health sees it, in order to bring them along too? Because this is kind of a shocking turn for them too, right, in how they're going to be doing business. >> Mostly, from an end user perspective, this is something they don't care much about: where the infrastructure is hosted or how the services are provided.
As long as the capabilities function in a better way, they're not worried about where the hosting is. So what we focus on is how it's going to be a better experience from their perspective, right? How is it going to mean better responsiveness, availability, and stability overall? That's been the mode of communication. Other than that, from a hosting and service perspective, the clientele doesn't care about the infrastructure or the security as much as the technology and digital teams themselves do. >> But you know, some of us are resistant to change, right? We're old dogs; we don't like new tricks, and change can be a little daunting sometimes. So even though it's about my ease of use and my efficiency and how I can save my time, if I'm used to doing something a certain way, and that's worked fine for me, and here come Tony and Deepu... >> The troublemakers. >> And they're stirring my pot. So how do you, say you were giving advice to somebody watching this: okay, you've got internal, I wouldn't say battles, but discussions to be held. How did you navigate through that? >> Yeah, no, absolutely. Baptist has been a very well run system, very successful, for 60-something-odd years. Clearly that conversation did come up: why should we change? But you always start with: this is what we think is going to happen in the future; these are the changes that very likely will happen. One is consumer expectations, in terms of their ability to have access to information, get access to care, and be in control of the process and their health and well-being, and everything else that happens in the market. So you start with that, and clearly there are a lot of signs that point to quite a lot of change in the ecosystem.
And therefore, from there, the conversation is: how do we now meet that challenge, so to speak, that we all face in healthcare? >> And then from there, you design a vision of where we want to be in terms of that digital transformation, and how we get there. And once that is well explained and evangelized, and that's part of our jobs, with the help of our colleagues who have been doing this with others, then comes what I call tell and show. We say, okay, on this road we're going to start with this; it's a small thing, and we're going to show you how it works in terms of the process, right? And then as you go along and deliver some things, people understand more, they're on board more, and they're ready for more. So it's iterative, from small to larger. >> The proof is always in the pudding, right? If you can show somebody. So Ashlee, obviously we know about Accenture's role, but in terms of almost what Tony was just saying, that you have to show people that it works: how do you interface with a client? When you're talking about these new approaches and you're suggesting changes and making these maybe rather dramatic proposals to how they do things internally, from Accenture's perspective, how do you make it happen? How do you bring the client along, in this case Baptist? >> Well, in this case, with Tony and Deepu, they had been on this journey already at another client, right? So they came to Baptist having done a similar journey previously. And so it wasn't really about convincing. >> Also with Accenture's help. >> Also with Accenture's help, correct. But it wasn't about telling Tony and Deepu how to do this or anything like that, because they were by far the experts and have the experience behind it.
It was really: how do we make sure that we're providing the right team and the right skills to match what they wanted to do and their aspirations? So we brought the healthcare knowledge along with the AWS knowledge and the architects, and we said, let's look at the roadmap and make sure we have the right team, moving at the right pace, testing everything out, and working with all the different vendors. In the provider world specifically, there are a lot of different vendors and applications provided to them; it's not a lot of custom applications or anything like that. So there was a lot of working with third parties, and we really had to align with them and with Baptist to make sure we were moving together at speed. >> Yeah, we've heard about transformation quite a bit; Tony, you brought it up a little bit ago. Deepu, if you had to define transformation in this case, how big of a change is that? How would you describe it when you say, we're going to transform our healthcare business? There are a lot of things that come to my mind, but how do you define it when you're talking to the folks you've got to bring along on this journey? >> So there's the transformation umbrella, and it comprises two or three things. As Tony said, there is this big digital transformation that everybody's talking about. Then there is the technology transformation that powers the digital transformation, and the business transformation that's the outcome of the digital transformation. So we started focusing on all three areas. To get the right digital experience for the consumers, we have to transform the way we operate healthcare in its current state.
It's a lot of manual processes, a lot of antiquated processes, so to speak. So we had to go and reassess some of that and work with the respective business stakeholders to streamline those, because it's not about putting a digital solution out there on top of the antiquated processes; the outcome is not what you expect when you do that. So from that perspective, it has been heavy lifting in terms of how we transform the operations, or the processes that facilitate some of the outcomes. >> How do you know it's working? >> Well, to add to what Deepu was saying, I think we are fortunate in that there are a lot of folks inside Baptist who have been wanting this, and they're instrumental to this. So this is not a two-man show; it really is a team sport, again. In terms of how we know it works: when we define what we want to do, there is some level of precision along the way, in those iterations, about what we want to do next, right? So whatever we introduce, let's say a fluid check-in for a patient for an appointment, we measure that, and then we measure the next one, and then we zoom out, look at the whole journey, and ask: is this better? Is this better for the consumer? Do they like it better? We measure that. And is it better for the operations? But this is the interesting thing: it's always a balance of how much you can change. We want to improve the consumer experience, but as Deepu said, there's a lot to be changed in the operations; how much do you do at the same time? That's where we have to do the prioritization. The interesting thing is that a lot of times, especially with self-service for consumers, there are a lot of benefits for the operations as well. And that's where we're in it together, and we measure. >> Don't give me too much control, though.
I'm going to leave the heavy lifting for you. >> Absolutely, absolutely right. Thank you. >> So, just real quick, Ashlee, maybe you can shine some light on the relationship and on next steps. You're on this path, things are going well, you've got expansion plans, and you want to bring in other services, other systems. Where do you want to take them, in the big picture, in terms of capabilities? >> Well, they've been doing a fantastic job, just being one of the first to actually say, hey, we're going to go make an investment in the cloud and digital transformation. So it's really looking at what the next problems are that we need to solve, whether it's patient care, diagnosis, how we're doing research, or the next realm of how we're going to use data to improve patient care. I think we're getting the foundation and the basics laid out right now, and then it's really: what's the next thing, and how can we improve patient care and the access that patients have? >> Well, it sure sounds like you have a winning combination, so keep the team together. >> Absolutely. >> Teamwork makes the dream work. >> Absolutely, it does. If you look at the healthcare industry as a whole, not just Baptist, because Baptist is forward thinking, but the entire industry, there's a lot of catching up to do compared to what everybody else is doing and what consumers are expecting of an entity, right? But then once we catch up, there's a lot of other things we're going to have to move on and innovate for: problems that maybe we don't even know we have or will have right now. So plenty of work to do. >> Which is job security for everybody, right? >> Yes. >> Listen, thanks for sharing the story. Continued success.
I wish you that, and I appreciate the time and expertise here today. Thanks for being with us. >> Thank you. >> We'll be back with more. You're watching theCUBE here at the Executive Summit, sponsored by Accenture. And theCUBE, as I love to remind you, is the leader in tech coverage.
Jim Walker, Cockroach Labs | DockerCon 2021
(bright upbeat music) >> Hello, and welcome back to the DockerCon 2021 virtual coverage. I'm John Furrier, host of theCUBE, here in Palo Alto with a remote interview with a great guest, CUBE alumni Jim Walker, VP of Product Marketing at Cockroach Labs. Jim, great to see you, remotely coming into theCUBE. Normally we're in person, and soon we'll be back in real life. Great to see you. >> Great to see you as well, John. I miss you; I miss seeing you live and in person. So this will have to do, I guess, right? >> We had the first multi-cloud event in New York City. You guys had, I think, one of the last events that was going on toward the end of the year before the pandemic hit. So a lot's happened with Cockroach Labs over the past few years: accelerated growth, funding, amazing stuff. And here at DockerCon, containerization of the world, containers everywhere and in all places: hybrid, pure cloud, edge. Give us the update on what's going on with Cockroach Labs, and then we'll get into what's going on at DockerCon. >> Yeah, Cockroach Labs, this has been a pretty fun ride. I mean, I think about two and a half years now, and John, it's been phenomenal as the world kind of wakes up to distributed systems and the containerization of everything. I'm happy we're at DockerCon talking about containerization, because I think it has radically changed the way we think about software, but more importantly, it's starting to take hold. A lot of people would say, oh, it's already taken hold, but start to think about these modern applications that are depending on data, and what containerization means for the database. Well, Cockroach has got a pretty good story. Gosh, before ESCAPE, I think the last time I talked to you, I was at CoreOS, and we were playing the whole Kubernetes game, and I remember Alex Polvi talking about GIFEE: Google infrastructure for everyone, or for everyone else, I should say.
And I think that's what we've seen happen with the infrastructure layer, but that last layer of infrastructure is the database. I really feel like the database is the dividing line between the business logic and infrastructure. And it's really exciting to see just massive, huge customers come to Cockroach to rethink what the database means in cloud, right? What does the database mean when we move to distributed systems, and that sort of thing? So momentum has been building here. We are upwards of, oh gosh, over 300 paying customers now, thousands of Cockroach customers in the wild out there, but we're seeing this huge, massive attraction to CockroachCloud, which is a great name, come on, Johnny, you've got to say, right? And our database as a service. So getting that out there and seeing the uptake has just been phenomenal over the past couple of years. >> Yeah, and you've got to love the Cockroach name: survive nuclear war and winter, all that good stuff, as they say. But really, it's kind of an interesting play on words, because one of the trends we've been talking about, and you and I have been telling this for years with our CUBE coverage around Amazon Web Services, was very clear about a decade ago: there wasn't going to be one database to rule the world. There were going to be many, many databases. And as you started getting into these cloud native deployments at scale, use your database of choice was the developer ethos; just whatever it takes to get the job done. Now you start integrating this in a horizontally scalable way with the cloud, and you have new kinds of scale, cloud scale. And it kind of changed the game on the always-on availability question, which is: how do I get high availability? How do I keep things running?
And that is the number one developer challenge, whether it's infrastructure as code or security shifting left: it all comes down to making sure stuff's running at scale and secure. Talk about that. >> Yeah, absolutely, and it's interesting. It's been, like I said, this journey and this arc toward distributed systems and truly the delivery of what people want in the cloud. It's been a long arc and a long journey, and I think we're getting to the point where people are starting to bake resilience and scale into their applications, and that's the modern approach. Look, people take legacy databases today, lift and shift, move them into the cloud, and try to run them there, but those databases just weren't built for that infrastructure. There's a fundamentally different approach when you talk about cloud. It's one of the reasons why, John, in your early conversations with the AWS team, it was like: how do we give resilient, ubiquitous, always-on, scalable infrastructure to people? Well, that's great for those layers, but when you start to get into the software running on these things, it isn't lift and shift, and it's not even move and improve. You can't just take a legacy system and change one piece of it to make it take advantage of the scale, the resilience, and the ubiquity of the cloud, because there are very explicit challenges. For us, it's about re-architect and rebuild. Let's tear the database down, rethink it, and build it from the ground up to be cloud native. And I think the technologies that have done that, that were built from scratch to be cloud native, are the ones we're going to be talking about three years from now. I mean, this comes back again to the genesis of what we did: Google Cloud Spanner.
The Spanner white paper, and what Google did: they didn't use an existing database, because they needed something for a transactional, relational database. They hired a bunch of really incredible engineers, right? They had Jeff Dean and Sanjay Ghemawat over there, designing and doing all these cool things. They built, and I think that's what we're seeing, and to me that's the exciting part about data in the cloud as we move forward. >> Yeah, and I think the Google cloud infrastructure mindset, and it's the same for Amazon, is: I want all the scale, but I don't want to take 10 years to get it; I want it now. Which I love, and I want to get back to that in a second, but I want to ask you specifically about this definition of containerization of the database. I've heard that kicked around; love the concept, and I kind of understand what it means, but I want you to define it for us. What does it mean when someone says containerizing the database? >> Yeah, I mean, simply: put the database in a container and run it. That's maybe step one; that's kind of lift and shift. Let's put it in a container and run it somewhere. And that's not that hard to do; I think I could do that. I haven't coded in a long time, but I think I could figure that out. It's when you start to actually have multiple instances of a container, right? That's where things get really, really tricky. Now we're talking about true distributed systems. How do you coordinate data? How do you balance data across multiple instances of a database? How do you actually have failover, so that if one node goes down, a bunch of them are still available? How do you guarantee transactional consistency? You can't just have four instances of a database, all with the same information in them, John, without any sort of coordination, right?
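The coordination Jim is describing is what consensus protocols such as Raft provide. As a toy illustration only (this is not CockroachDB's actual implementation; every name here is hypothetical), a majority-quorum rule is the core trick: any two majorities of the same replica set overlap in at least one node, so two conflicting writes can never both commit unseen.

```python
# Toy majority-quorum sketch: a write commits only when a majority of
# replicas acknowledge it. Any two majorities of the same replica set
# overlap in at least one node, which is why conflicting writes cannot
# both commit without at least one replica observing the conflict.

def majority(n: int) -> int:
    """Smallest ack count that is a strict majority of n replicas."""
    return n // 2 + 1

class Replica:
    def __init__(self) -> None:
        self.log: list = []

    def ack(self, value) -> bool:
        self.log.append(value)
        return True

def replicated_write(replicas, value) -> bool:
    """Commit `value` only if a majority of replicas acknowledge it."""
    acks = sum(1 for r in replicas if r.ack(value))
    return acks >= majority(len(replicas))

replicas = [Replica() for _ in range(4)]
print(majority(4))                          # 3: any two groups of 3 overlap
print(replicated_write(replicas, "txn-1"))  # True: 4 acks >= 3
```

Real systems layer leader election, log replication, and failure handling on top of this overlap property; the sketch only shows why a majority is the magic threshold.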
Like, you hit one node and you hit another one at the same time: which transaction wins? And so there are concepts in distributed systems around this; there's this thing called the CAP theorem: consistency, availability, and partition tolerance. Actually understanding how these things work, especially for data in distributed systems, to make sure it's going to be consistent and available and going to scale, those things are not simple to solve. And again, it comes back to this: I don't think you can do it with a legacy database. You kind of have to re-architect, and it comes down to where data is stored, how it's replicated, and ultimately where it's physically located. When you deploy a database, you think about the logical model, right? You think about tables, normalization, and referential integrity. But the physical location is extremely important as we move to containerized and distributed systems, especially around data. >> Well, you guys are here at DockerCon 2021, Cockroach Labs, good success. Love the architectural flexibility you guys offer, and again, bringing that scale, like you mentioned, is an awesome value proposition, especially if people want to just program the infrastructure. What's going on at DockerCon specifically? A lot of talk about developer productivity, a lot of talk about collaboration and trust with containers, and a big story around security. What's your angle here at DockerCon this year? What's the big reveal? What's the top conversation? >> Yeah, look at where we are: a containerized database, and we're an incredibly great choice for developers. For us, look, there are certain developer communities that are important on this planet, John, and this is one of them, right? I don't know a developer who doesn't have that little whale up in their status bar, right?
And for us, you know me, man, I believe in this tech, and I believe it's something that's going to greatly simplify our lives over the next two to three to 10 to 15 years. For us, it's about awareness. I think once people see Cockroach, they're like, oh my God, how did I ever think differently? And so for us, it's about moving in that direction. But ultimately, our vision, where we want to be, is to abstract the database to a SQL API in the cloud. We want to make it so simple: I just have this REST interface, there are endpoints all over the planet, and as a developer, I never have to worry about scale. I never have to worry about DR, right? It's always going to be on. And most importantly, I don't have to worry about low latency access to data, no matter where I am on the planet. I can give every user sub-50-millisecond, or sub-20-millisecond, access to data. And that is the true delivery of the cloud, right? I think that's what the developer wants out of the cloud: they want to code against a service, and it's got to be consumption based and secure, and I don't want to have to pay for stuff I'm not using, and all those things. So for us, that's what we're building toward, and interacting in this environment is critical for us, because I think that's where the audience is. >> I want to get your thoughts on the personas you've had success with. There are the classic software developers, which this show is full of: DockerCon is full of developers; KubeCon is a lot of operators and some devs, but mostly cloud native operations. So you've got to hit the developers, who really care about building fast, building at scale, and lasting, with security. And you've had success with architects: the classic cloud architecture crowd, which is now distributed computing; we get that.
But the third area I would call the role that both the architects and the developers have had to take on: being the DevOps person, which then becomes the SRE in the group, right? Most startups have developers doing DevOps natively, within every role; they're the same people provisioning. But as you get larger, in an enterprise, the DevOps role, whether it's a team or a group, takes on this SRE, site reliability engineer, function. This is a new dynamic that brings engineering and coding together. It's not so much an ops person; it's much more like an engineering developer. Why is that role so important? We're seeing more of it in dev teams, right? An SRE person or a DevOps person inside teams, not a department. >> Yeah, look, John, we employ an army of SREs that manage and maintain our CockroachCloud, which is CockroachDB as a service, right? How do you deliver a world-class experience for somebody adopting a managed service, a database such as ours? So for us, yeah, SREs are extremely important; we have a personal opinion on this. But more importantly, if you look at Cockroach and the architecture of what we built: I think Kelsey Hightower at one point, and I'm probably going to mess this up, but there was a tweet he wrote, something like, CockroachDB is to Spanner as Kubernetes is to Borg. And if you think about that, that's exactly what this is, and we built a database that was actually amenable to the SRE, right? This is exactly what they want. They want it to scale up and down; they want it to just survive things; they want to be able to script this thing and basically script the world. That's how they want to manage and maintain. And so for us, our initial audience was definitely architects and operators, the KubeCon crowd, and they're like, wow, this is cool.
This is architected just like Kubernetes. In fact, etcd is a key piece of Kubernetes, and we contribute back to etcd through our Raft implementation, so there's a lot of the same tech here. What we've realized, though, John, with databases is interesting: the architect is choosing a database sometimes, but more often than not, a developer is choosing that database. They go out, they find a database, and they just start building; that's what happens. So we made a very critical decision early on: this database is wire compatible with Postgres, and it speaks the Postgres SQL syntax, which, if you look at some of the other solutions that are trying to do these things, is really difficult to bolt on at the end. It was a critical decision, to make sure the ORMs and all the tools people would use and expect of Postgres just work from a developer point of view, while we simplify and automate and give the SREs the kind of platform they need as well. And so the last year and a half has really been about building the right tooling for the developer crowd too, and we've pushed really far in that world. >> Talk about the aspect of scale for, say, a startup, because you made a great example of Borg to Kubernetes; Borg was Google's internal Kubernetes-like thing. And you guys, with Spanner, which everyone knows was a great product Google had, are almost the commercial version of that for the world. Some people will say, and I just want to challenge you on this and get your thoughts: I'm not Google, I'll never be Google, I don't need that scale. How do you address that? Because some people might dismiss the notion of using it on those grounds. How do you respond? >> Yeah, John, we get this all the time. Like, I'm not global, my application's not global, I don't need this.
I don't need a tank, right? I just need to walk down the road, you know what I mean? And so, the funny thing is, even if you're in a single region and you're building a simple application, does it need to be always on? Does it need to be available? Can it survive the failure of a server or a rack or an AZ? It doesn't have to survive the failure of a region, but I tell you what, if you're successful, you're going to want to start actually deploying this thing across multiple regions so you can survive a backhoe hitting a cable and the entire east coast going out, right? And with Cockroach, it's real easy to do that. It's four little SQL commands and I have a database that's going to span all those regions, right? And I think that's important, but more importantly, think about scale. When a developer wants to scale, typically it's like, okay, I'm going to spin up Postgres and I'm going to keep increasing my instance size. So I'm going to scale vertically until I run out of room, and then I'm going to have to start sharding this database. And when you start doing that, it adds this kind of application complexity that nobody really wants to deal with. So forget it, just let the database deal with all that. So we find this thing extremely useful for the single developer in a very small application, but the beautiful thing is, if you want to go global, great, just keep adding nodes. When that application does take off and it's the next breakthrough thing, this database is going to grow with you. So it's good enough to start small, but it'll scale fast, it'll go global if you want to. You have that option, I guess, right? >> I mean, why wouldn't you want optionality on this at all? So clearly a good point.
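Jim's point about manual sharding is worth unpacking. Once a single Postgres instance tops out vertically, the application itself has to route every query to the right shard. The sketch below, with invented shard names and keys rather than anything from CockroachDB, shows the kind of plumbing that creeps into application code, which a distributed SQL database absorbs into the storage layer instead.

```python
from hashlib import sha256

SHARDS = ["pg-shard-0", "pg-shard-1", "pg-shard-2"]  # hypothetical shard names

def shard_for(user_id: str) -> str:
    """Deterministically map a key to one shard (hash-mod routing)."""
    digest = int(sha256(user_id.encode()).hexdigest(), 16)
    return SHARDS[digest % len(SHARDS)]

# Every read and write in the app now needs this routing step first.
order_shard = shard_for("customer-42")
```

And that is just the easy part: cross-shard transactions, resharding, and hot-key rebalancing all land on the application team from there, which is exactly the complexity Jim argues the database should own.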
Let me ask you a question. Take me through a use case where, with Cockroach, some scenario develops nicely, where you can point to the visibility of the use case for the developer and how it played out, and then contrast that with a scenario that doesn't go well, where they got hung up and it went sideways. >> Yeah, Cockroach was built for transactional workloads. That's what we are: we are optimized for the speed of light and consistent transactions. That's what we do, and we do it very well. At least I think so, right? But my favorite customer of all of ours is DoorDash, and about a year ago DoorDash came to us and said, look, we have a transactional database that can't handle the write volume we're getting, and it falls over. And they had significant challenges, and if you think about DoorDash and DoorDash's business, they're looking at an IPO in the summer, and going through that, you can't have any issues. The system's got to be up and running, right? And so for them, it was, we need something that's reliable. We need something that's not going to come down. We need something that's going to scale and handle burst and these sorts of things. And their business is big. Their business is not just, let me deliver food all the time. It's deliver anything, be that intermediary between a good and somebody's front door. That's what DoorDash wants to be. And for us, yeah, their transactions and that backend transactional system are built on Cockroach. And that was one year ago. They needed to get experience, and once they did, they started to see that this was very, very valuable for lots of different workloads they had. So anywhere there's any sort of transactional workload, be it metadata, be it any sort of inventory or transaction stuff that we see in companies, that's where people are coming to us.
And it's these traditional relational workloads that have been wrapped up in transactional relational databases that weren't built for the cloud. So I think what you're seeing is that's the other shoe to drop. We've seen this happen: you're watching Databricks, you're watching Snowflake kind of do this whole data cloud, and the analytical side, John, that's been around for a long time, and there's that move to the cloud. That same thing that happened for OLAP has got to happen for OLTP. Where we don't do well is when somebody thinks that we're an analytics database. That's not what we're built for, right? We're optimized for transactions, and I think you're going to continue to see these two sides of the world, especially in cloud, especially because of the way our global systems are going to work. You don't want to do analytics across multiple regions, it doesn't make sense, right? And so that's why you're going to see the continued kind of two markets, OLAP and OLTP, going on, and we're squaring away that OLTP side of the world. >> Yeah, talking about the transaction processing side of it, when you start to change to a distributed architecture that goes from core to edge, core on premises to edge, edge being intelligent edge, industrial edge, whatever, you're going to have more action happening. And you're seeing Kubernetes already kind of talking about this, and with the containers you've got, so you've got kind of two dynamics. How does that change the nature of, and the level and volume of, transactions? >> Well, it's interesting, John. I mean, if you look at something like Kubernetes, it's still really difficult to do multi-region or multicloud Kubernetes, right? This is one of those things where, as you start to move Kubernetes to the edge, you're still kind of managing all these different things. And I think it's not the volumes, it's the operational nightmare of that. For us, you federate at the data layer.
Like, I could deploy Cockroach across multiple Kubernetes clusters today and you're going to have one single logical database running across those. In fact, you can deploy Cockroach today on top of the three public cloud providers: I can have nodes in AWS, I could have nodes in GCP, I could have nodes running on VMs in my data center. Any one of those nodes can service requests, and it's going to look like a single logical database. Now that to me, when we talked about multicloud a year and a half ago or whenever that was, John, that's an actual multicloud application, delivering data so that you don't have to actually deal with that in your application layer, right? You can do that down in the guts of the database itself. And so I think it's going to be interesting, the way these things get consumed and the way we think about where data lives and where our compute lives. I think that's part of what you're thinking about too. >> Yeah, so while I've got you here, one of the things on my mind that I think people want clarification on, real quick: take a minute to explain CockroachDB and CockroachCloud. They're different products; you've brought them both up. What's the difference for the developers watching? What's the difference between the two, and when do I need to know the difference? >> So to me, they're really one, because CockroachCloud is CockroachDB as a service. It's our offering that makes it a world-class, easy-to-consume experience of working with CockroachDB, where we take on all the hardware, we take on the SRE role, we make sure it's up and running, right? You're getting a connection string and coding against it. And that side of our world is really all about this kind of highly evolved database and delivering it as a service, and it's actually CockroachDB you're using.
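The "single logical database across clouds" picture Jim describes can be caricatured in a few lines. This is a toy illustration, not Cockroach internals: in the real system the shared state is Raft-replicated ranges with consensus on every write, and the node names below are invented.

```python
class Node:
    """One database node; all nodes expose the same logical keyspace."""
    def __init__(self, name: str, shared_store: dict):
        self.name = name
        self.store = shared_store  # stand-in for Raft-replicated ranges

cluster_state: dict = {}  # the single logical keyspace
nodes = [
    Node("aws-us-east", cluster_state),
    Node("gcp-europe-west", cluster_state),
    Node("onprem-vm-1", cluster_state),
]

nodes[0].store["order:7"] = "shipped"  # write through the AWS node
result = nodes[2].store["order:7"]     # read through the on-prem node
```

The point of the caricature is the contract, not the mechanism: a write accepted by any node is visible through every other node, so the application never has to know which cloud it is talking to.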
Where it gets really interesting, John, is the next generation of what we're building: the serverless version of our database, where this is just an API in the cloud. We're going to have one instance of Cockroach with a multi-tenant database in there, and any developer can spin up on that. And to me, that gets to be a really interesting world when the world turns serverless, and we're running our compute in Lambda and we're doing all these great things, right? Or we're using Cloud Run in Google, right? But what's the corresponding database to actually deal with that? And that to me is a fundamentally different database, 'cause what is scale in the serverless world? It's autonomous, right? What's scale in the current Cockroach world? You kind of keep adding nodes to it, you manage it, you deal with that, right? What does resilience mean in a serverless world? It's just, yeah, it's there all the time. What's important is latency when you get to serverless: where are these things deployed? And I think to me, the interesting part of the two sides of our world is what we're doing with serverless and how we actually expose the core value of CockroachDB in that way. >> Yeah, and I think that's one of the things that is the Nirvana or the holy grail of infrastructure as code: making it, I won't say irrelevant, but invisible. If you're really dealing with a database thing, hey, I'm just scaling and coding and the database stuff is just working with my compute, whatever. That's serverless, and you mentioned Lambda, that's the action, because you don't want to be naming files and deciding what the database is; just having it happen is more productivity for the developers. That kind of circles back to the whole productivity message for the developers. So I totally get that, I think that's a great vision. The question I have for you, Jim, is the big story here is developer simplicity.
How are you guys making it easier to just deploy? >> John, it's just an extension of the last part of the conversation. I don't want a developer to ever have to worry about a database. That's the vision Spencer and Peter and Ben have. It's how do I make the database so simple? It's a SQL API in the cloud. It's like a REST interface: I code against it, I run queries against it, I never have to worry about scaling the thing. I never have to worry about creating active and passive, primary and secondary. All the DevOps side of it, all this operations stuff, it's just kind of done in the background, dude. And we can build it, and it's actually there now, where we have it in beta. What's the role of the cost-based optimizer in this new world we've had in databases? How are you actually ensuring data is located close to users? We're automating that so that when John's in Australia doing a show, his data is going to follow him there, so he has fast access to it, right? And that's the kind of stuff we're talking about, the next generation of infrastructure, John. We're not building for today. Look, Cockroach Labs is not building for 2021. Sure, we have something that's great today, but we're building something for '22 and '23 and '24, right? What do we need to be as an extremely productive set of engineers? That's what we think about all day. How do we make data easy for the developer? >> Well, Jim, great to have you on, VP of Product Marketing at Cockroach Labs. We've known each other for a long time. I've got to ask you, while I have you here, a final question: you and I have chatted about the many waves in open source and in the computer industry. What's your take on where we are now? And I see you're looking at it from the Cockroach Labs perspective, which is large-scale distributed computing, kind of the new side of history, the right side of history, cloud native. Where are we right now?
Compare and contrast, for the folks watching who are trying to understand the importance of where we are in the industry: where are we, and what's your take? >> Yeah, John, I feel fortunate to be in a company such as this one, and the past couple I've been around, and I feel like we are in the middle of a transformation, and it's just like the early days of this next generation. And I think we're seeing it in a lot of ways in infrastructure, for sure, but we're starting to see it creep up into the application layer. And for me, it is so incredibly exciting. Remember when cloud was this thing that people were like, oh boy, maybe I'll do it? Now anything net new is going to be on cloud, right? We don't even think twice about it, and the coming nature of cloud native and the technologies that are coming are going to be really interesting. I think the other piece that's really interesting, John, is the changing role of open source in this whole game, because I think of open source as code, consumption, and community, right? I think about those, and then there's the license, of course. A lot of people get wrapped up in the licensing. Consumption has changed, John. Back when we were talking Hadoop, consumption was like, oh, it's free, I get this thing, I can just download it and use it. Well, consumption over the past three years, everybody wants everything as a service, and they're ready to pay. For us, how do we bring free back to the service? And that's what we're doing. That's why I am so incredibly excited to go through this kind of bringing back free beer to open source. I think that's going to be great, 'cause if I can give you a database free up to five gig or 10 gig, man, and it's available all over the planet and fully featured, that's coming. That's bringing our community and our code, which is all open source, and this consumption model back.
And I'm super excited about that. >> Yeah, free beer, who doesn't like free beer? Of course developers love free beer, and a great t-shirt too, one that's soft. Make sure you get that, get the soft one. >> You just don't want the free puppy, you know what I mean? It's like, yeah, that sounds painful. >> Well, Jim, great to see you remotely. Can't wait to see you in person at the next event. And we've got the fall window coming up; we'll see some events. I think KubeCon in LA is going to be in person, and re:Invent and Databricks for sure we'll be in person. I know that for a fact, we'll be there. So we'll see you in person, and congratulations on the work at Cockroach Labs. >> Thanks, John, great to see you again. >> All right, this has been theCUBE's coverage of DockerCon 2021. I'm John Furrier, your host of theCUBE. Thanks for watching.
Tyler Duncan, Dell & Ed Watson, OSIsoft | PI World 2018
>> [Announcer] From San Francisco, it's theCUBE covering OSIsoft PIWORLD 2018, brought to you by OSIsoft. >> Hey, welcome back, everybody, Jeff Frick here with theCUBE. We're in downtown San Francisco at OSIsoft PIWorld 2018. They've been doing it for like 28 years, it's amazing. We've never been here before, it's our first time, and really these guys are all about OT, operational technology. We talk about IoT and industrial IoT; they're doing it here. They're doing it for real, and they've been doing it for decades, so we're excited to have our next two guests. Tyler Duncan, he's a Technologist from Dell. Tyler, great to see you.
The nasty thing is we want to move it out of those clean, pristine data centers and get it out to the edge, to the nasty oil fields and the nasty wind turbine fields and crazy turbines and these things. So, Edge: what's special about the Edge? What are you guys doing to take care of the special things on the Edge? >> Well, a couple things. I think being out there in the nasty environments is where the money is. So, trying to collect data from the remote assets that really aren't connected right now. In terms of the Edge, you have a variety of small gateways you can collect the data with, but what we see now is a move toward more compute at the Edge, and that's where Dell comes in. >> Yeah, so I'm part of Dell's Extreme Scale Infrastructure group, ESI, and specifically I'm part of our modular data center team. What that means is that for us, we are helping to deploy compute out at the Edge and also at the core. But the challenge at the Edge is, you mentioned the kind of dirty area, well, we can actually change that environment so that it's not a dirty environment anymore. It's a different set of challenges. It may be more that it's remote, it's lights out, I don't have people there to maintain it, things like that. So it's not necessarily that it's dirty or ruggedized or that it's high temperature or extreme environments, it just may be remote.
You know, we all want to keep everything; it's probably a little bit more practical if you're keepin' it back in the data center versus tryin' to store it at the Edge. So how are you looking at some of these factors in designing these solutions? >> [Ed] Well, Jeff, those are good points. And where OSIsoft PI comes in for the modular data center is to collect all the power, cooling, and IT data, aggregate it, send to the Cloud what needs to be sent to the Cloud, but enable Dell and their customers to make decisions right there on the Edge. So, whether you're using modular data centers for Telecom cell towers or autonomous vehicles for AR/VR, what we provide for Dell is a way to manage those modular data centers, and when you're talking geographically dispersed modular data centers, it can be a real challenge. >> Yeah, and I think to add to that, when we start lookin' at the Edge and the data that's there, I look at it as kind of two different purposes. There's the question of why that compute is there in the first place. We're not defining that, we're just trying to enable our customers to be able to deploy compute however they need. Now when we start looking at our control system and the software monitoring analytics, absolutely. And what we are doing is we want to make sure that when we are capturing that data, we are capturing the right amount of data, but we're also creating the right tools and hooks in place in order to be able to update those data models as time goes on. >> [Jeff] Right. >> So that we don't have to worry about if we got it right on day one. It's updateable, and we know that the right solution and data for one customer are not necessarily the right data for the next customer. >> [Jeff] Right. >> So we're not going to make the assumption that we have it all figured out. We're just trying to design the solution so that it's flexible enough to allow customers to do whatever they need to do.
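The split Ed describes, keep latency-sensitive decisions local and send the cloud only an aggregate, can be sketched roughly as follows. The field names, threshold, and readings here are invented for illustration; PI's actual aggregation and asset framework are far richer than this.

```python
from statistics import mean

readings_c = [24.1, 24.3, 31.8, 24.0, 24.2]  # invented cooling-loop temps at the edge

# Latency-sensitive decision made locally: flag out-of-band readings immediately.
alerts = [t for t in readings_c if t > 30.0]

# Cloud-bound payload: a compact aggregate instead of every raw sample.
payload = {
    "avg_c": round(mean(readings_c), 2),
    "max_c": max(readings_c),
    "alert_count": len(alerts),
}
```

Full-resolution data stays where it is cheap to act on, and the wide-area link carries only what the central side actually needs, which is the essence of the edge/cloud division of labor described above.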
>> I'm just curious, in terms of, it's obviously important enough to give you guys your own name, Extreme Scale. What is Extreme Scale? 'Cause you said it isn't necessarily because it's dirty and hardened and kind of environmental. What makes an Extreme Scale opportunity for you, that maybe some of your cohorts will bring you guys into? >> Yeah, so I think the Extreme Scale part of it is just doing the right engineering effort to provide the right solution for a customer, as opposed to something that is more product-based that is bought off of dell.com. >> [Jeff] Okay. >> Everything we do is solution-based, and so it's listening to the customer, what their challenges are, and trying to, again, provide that right solution. There are probably different levels of what's the right level of customization based off of how much that customer is buying. And sometimes that is adding things, sometimes it's taking things away, sometimes it's the remote location or sometimes it's a traditional data center. So our Extreme Scale infrastructure encompasses a lot of different verticals-- >> And are most of the solutions that you develop very customer-specific, or do you come up with solutions that are more industry-specific versus customer-specific? >> Yeah, I would say everything we do is very customer-specific. That's what our branch of Dell does. That said, as we start looking at more of what we're calling the Edge, I think there are things that have to have a little more of a blend of that kind of product analysis, or that look from a product side. I no longer know that I'm deploying 40 megawatts in a particular location on the map; instead I'm deploying 10,000 locations all over the world and I need a solution that works in all of those. It has to be a little more product-based in some of those, but still customized for our customers. >> And Jeff, we talked a little bit about scale.
It's one thing to have scale in a data center. It's another thing to have scale across the globe. And this is where PI excels, in that ability to manage that scale. >> Right, and then how exciting is it for you guys? You've been at it a while, but it's not that long that we've had things like Hadoop, and we've had things like Flink, and we've had things like Spark, kind of these new-age applications for streaming data. But you guys were extracting value from these systems and making course corrections 30 years ago. So how are some of these new technologies impacting your ability to deliver value to your customers? >> Well, I think the ecosystem itself is very good, because it allows customers to collect data in a way that they want to. Our ability to enable our customers to take data out of PI and put it into Hadoop, or put it into a data lake or SAP HANA, really adds significant value in today's ecosystem. >> It's pretty interesting, because I look around the room at all your sponsors: a lot of familiar names, a lot of new names as well. In our world, in the IT space that we cover, it's funny, we've never been here before, and we cover a lot of big shows like Dell Technology World. So you guys have been doing your thing; has an ecosystem always been important for OSIsoft? It's very, very important for all the tech companies we cover; has it always been important for you, or is it a relatively new development? >> I think it's always been important. I think it's more so now. No one company can do it all. We provide the data infrastructure and then allow our partners and clients to build solutions on top of it. And I think that's what sustains us through the years. >> Final thoughts on what's going on here today and over the last couple of days. Any surprises, hall chatter you can share, things you weren't expecting or that really validate what's going on in this space? A lot of activity going on; I love all the signs over the building.
This is the infrastructure that makes the rest of the world go, whether it's power, transportation, what do we have behind us? Distribution. I mean, it's really pretty phenomenal, the industries you guys cover. >> Yeah, and you know, a lot of the sessions are videotaped, so you can see Tyler from last year when he gave a presentation. This year eBay and PayPal are giving presentations. And it's just a very exciting time in the data center industry. >> And I'll say on our side, maybe not as much of a surprise, but also hearing the customer feedback on things that Dell and OSIsoft have partnered on. We worked together on things like a Redfish connector in order to be able, from an agnostic standpoint, to pull data from any server that's out there, regardless of brand; we're in full support of that. And to be able to do that in an automatic way with their connector, so that whenever I go and search my range of IP addresses, it finds all the devices, brings all that data in, organizes it, and makes it ready for me to use. That's a big thing and that's... They've been doing connectors for a while, but being able to do that for servers is new. If I have 100,000 servers, I can't manually go get all those and bring them in. >> Right, right. >> So, being able to do that in an automatic way is a great enablement for the Edge. >> Yeah, it's a really refreshing kind of point of view. We usually look at it from the other side, from IT really starting to get together with the OT. Coming at it from the OT side, where you have such an established customer base, such an established history and solution set, and then marrying that back to the IT and some of the newer things that are happening, that's exciting times. >> Yeah, absolutely. >> Yeah. >> Well, thanks for spending a few minutes with us. And congratulations on the success of the show. >> Thank you. >> Thank you.
>> Alright, he's Tyler, he's Ed, I'm Jeff. You're watching theCUBE from downtown San Francisco at OSIsoft PI WORLD 2018, thanks for watching. (light techno music)
Tyler Duncan, Dell & Ed Watson, OSIsoft | PI World 2018
>> Announcer: From San Francisco, it's theCUBE covering OSIsoft PIWORLD 2018, brought to you by OSIsoft. >> Hey, welcome back, everybody, Jeff Frick here with theCUBE, we're in downtown San Francisco at the OSIsoft PIWorld 2018. They've been doing it for like 28 years, it's amazing. We've never been here before, it's our first time and really these guys are all about OT, operational transactions. We talk about IoT and industrial IoT, they're doing it here. They're doing it for real and they've been doing it for decades so we're excited to have our next two guests. Tyler Duncan, he's a Technologist from Dell, Tyler, great to see you. >> Hi, thank you. >> He's joined by Ed Watson, the global account manager for channels for Osisoft. Or OSIsoft, excuse me. >> Glad to be here. Thanks, Jeff. >> I assume Dell's one of your accounts. >> Dell is one of my accounts as well as Nokia so-- >> Oh, very good. >> So there's a big nexus there. >> Yep, and we're looking forward to Dell Technology World next week, I think. >> Next week, yeah. >> I think it's the first Dell Technology not Dell EMC World with-- >> That's right. >> I don't know how many people are going to be there, 50,000 or something? >> There'll be a lot. >> There'll be a lot. (laughs) But that's all right, but we're here today... >> Yeah. >> And we're talking about industrial IoT and really what OSIsoft's been doing for a number of years, but what's interesting to me is from the IT side, we kind of look at industrial IoT as just kind of getting here and it's still kind of a new opportunity and looking at things like 5G and looking at things like IPE, ya know, all these sensors are now going to have IP connections on them. So, there's a whole new opportunity to marry the IT and the OT together. 
The nasty thing is we want to move it out of those clean pristine data centers and get it out to the edge of the nasty oil fields and the nasty wind turbine fields and crazy turbines and these things, so, Edge, what's special about the Edge? What are you guys doing to take care of the special things on the Edge? >> Well, a couple things, I think being out there in the nasty environments is where the money is. So, trying to collect data from the remote assets that really aren't connected right now. In terms of the Edge, you have a variety of small gateways that you can collect the data but what we see now is a move toward more compute at the Edge and that's where Dell comes in. >> Yeah, so I'm part of Dell's Extreme Scale Infrastructure group, ESI, and specifically I'm part of our modular data center team. What that means is that for us we are helping to deploy compute out at the Edge and also at the core, but the challenges at the Edge is, you mentioned the kind of the dirty area, well, we can actually change that environment so that it's not a dirty environment anymore. It's a different set of challenges. It may be more that it's remote, it's lights out, I don't have people there to maintain it, things like that, so it's not necessarily that it's dirty or ruggedized or that it's high temperature or extreme environments, it just may be remote. >> Right, there's always this kind of balance in terms of, I assume it's all application specific as to what can you process there, what do you have to send back to process, there's always this nasty thing called latency and the speed of light that just gets in the way all the time. So, how are you redesigning systems? How are you thinking about how much compute and storage do you put out on the Edge? How do you break up what you send back to central processing? How much do you have to keep? 
You know we all want to keep everything, it's probably a little bit more practical if you're keepin' it back in the data center versus you're tryin' to store it at the Edge. So how are you looking at some of these factors in designing these solutions? >> Ed: Well, Jeff, those are good points. And where OSIsoft PI comes in, for the modular data center, is to collect all the power, cooling, and IT data, aggregate it, send to the Cloud what needs to be sent to the Cloud, but enable Dell and their customers to make decisions right there on the Edge. So, if you're using modular data center or Telecom for cell towers or autonomous vehicles for AR VR, what we provide for Dell is a way to manage those modular data centers and when you're talking geographically dispersed modular data centers, it can be a real challenge. >> Yeah, and I think to add to that, there's, when we start lookin' at the Edge and the data that's there, I look at it as kind of two different purposes. There's one of why is that compute there in the first place. We're not defining that, we're just trying to enable our customers to be able to deploy compute however they need. Now when we start looking at our control system and the software monitoring analytics, absolutely. And what we are doing is we want to make sure that when we are capturing that data, we are capturing the right amount of data, but we're also creating the right tools and hooks in place in order to be able to update those data models as time goes on. >> Jeff: Right. >> So, that we don't have to worry about if we got it right on day one. It's updateable and we know that the right solution for one customer and the right data is not necessarily the right data for the next customer. >> Jeff: Right. >> So we're not going to make the assumptions that we have it all figured out. We're just trying to design the solution so that it's flexible enough to allow customers to do whatever they need to do. 
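The aggregate-at-the-edge pattern Ed describes here — keep the raw readings local, forward only summaries and exceptions upstream — can be sketched in a few lines. Everything below is invented for illustration (the function name, fields, and threshold are not part of PI or any Dell product):

```python
# Hypothetical sketch of edge-side aggregation: raw data stays at the
# edge site, and only a compact summary plus anomalous points are
# forwarded to the central cloud.

def aggregate_readings(readings, anomaly_threshold):
    """Summarize a batch of local sensor readings and flag outliers."""
    mean = sum(readings) / len(readings)
    anomalies = [r for r in readings if abs(r - mean) > anomaly_threshold]
    # Only this small dict leaves the edge; the full batch never does.
    return {"count": len(readings), "mean": mean, "anomalies": anomalies}

batch = [21.0, 21.5, 20.8, 35.2, 21.1]   # e.g. inlet temperatures in C
upstream = aggregate_readings(batch, anomaly_threshold=5.0)
```

In this toy run the one hot reading (35.2) is flagged for upstream attention while the routine values are reduced to a count and a mean — the same trade-off between local decisions and central analytics the speakers are discussing.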
>> I'm just curious in terms of, it's obviously important enough to give you guys your own name, Extreme Scale. What is Extreme Scale? 'Cause you said it isn't necessarily because it's dirty data and hardened and kind of environmentally. What makes an Extreme Scale opportunity for you that maybe some of your cohorts will bring you guys into an opportunity? >> Yeah so I think for the Extreme Scale part of it is, it is just doing the right engineering effort, provide the right solution for a customer. As opposed to something that is more of a product base that is bought off of dell.com. >> Jeff: Okay. >> Everything we do is solution based and so it's listening to the customer, what their challenges are and trying to, again, provide that right solution. There are probably different levels of what's the right level of customization based off of how much that customer is buying. And sometimes that is adding things, sometimes it's taking things away, sometimes it's the remote location or sometimes it's a traditional data center. So our Extreme Scale infrastructure encompasses a lot of different verticals-- >> And are most of the solutions that you develop kind of very customer specific or is there, you kind of come up with a solution that's more of an industry specific versus a customer specific? >> Yeah, we do, I would say everything we do is very customer specific. That's what our branch of Dell does. That said, as we start looking at more of the, what we're calling the Edge. I think there are things that have to have a little more of a blend of that kind of product analysis, or that look from a product side. I no longer know that I'm deploying 40 megawatts in a particular location on the map, instead I'm deploying 10,000 locations all over the world and I need a solution that works in all of those. It has to be a little more product based in some of those, but still customized for our customers. >> And Jeff, we talked a little bit about scale. 
It's one thing to have scale in a data center. It's another thing to have scale across the globe. And, this is where PI excels, in that ability to manage that scale. >> Right, and then how exciting is it for you guys? You've been at it awhile, but it's not that long that we've had things like Hadoop and we've had things like Flink and we've had things like Spark, and kind of these new age applications for streaming data. But, you guys were extracting value from these systems and making course corrections 30 years ago. So how are some of these new technologies impacting your guys' ability to deliver value to your customers? >> Well I think the ecosystem itself is very good, because it allows customers to collect data in a way that they want to. Our ability to enable our customers to take data out of PI and put it into Hadoop, or put it into a data lake or SAP HANA really adds significant value in today's ecosystem. >> It's pretty interesting, because I look around the room at all your sponsors, a lot of familiar names, a lot of new names as well, but in our world in the IT space that we cover, it's funny we've never been here before, we cover a lot of big shows like at Dell Technology World, so you guys have been doing your thing, has an ecosystem always been important for OSIsoft? It's very, very important for all the tech companies we cover, has it always been important for you? Or is it a relatively new development? >> I think it's always been important. I think it's more so now. No one company can do it all. We provide the data infrastructure and then allow our partners and clients to build solutions on top of it. And I think that's what sustains us through the years. >> Final thoughts on what's going on here today and over the last couple of days. Any surprises, hall chatter that you can share that you weren't expecting or really validates what's going on in this space. A lot of activity going on, I love all the signs over the building. 
This is the infrastructure that makes the rest of the world go whether it's power, transportation, what do we have behind us? Distribution, I mean it's really pretty phenomenal the industries you guys cover. >> Yeah and you know a lot of the sessions are videotaped so you can see Tyler from last year when he gave a presentation. This year Ebay, PayPal are giving presentations. And it's just a very exciting time in the data center industry. >> And I'll say on our side maybe not as much of a surprise, but also hearing the kind of the customer feedback on things that Dell and OSIsoft have partnered together and we work together on things like a Redfish connector in order to be able to, from an agnostic standpoint, be able to pull data from any server that's out there, regardless of brand, we're full support of that. But, to be able to do that in an automatic way that with their connector so that whenever I go and search for my range of IP addresses, it finds all the devices, brings all that data in, organizes it, and makes it ready for me to be able to use. That's a big thing and that's... They've been doing connectors for a while, but that's a new thing as far as being able to bring that and do that for servers. That, if I have 100,000 servers, I can't manually go get all those and bring them in. >> Right, right. >> So, being able to do that in an automatic way is a great enablement for the Edge. >> Yeah, it's a really refreshing kind of point of view. We usually look at it from the other side, from IT really starting to get together with the OT. Coming at it from the OT side where you have such an established customer base, such an established history and solution set and then again marrying that back to the IT and some of the newer things that are happening and that's exciting times. >> Yeah, absolutely. >> Yeah. >> Well thanks for spending a few minutes with us. And congratulations on the success of the show. >> Thank you. >> Thank you. 
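The Redfish connector Tyler mentions builds on the DMTF Redfish standard, which exposes a REST service root at `/redfish/v1` on any compliant server regardless of brand. As a rough, hypothetical sketch of the discovery step he describes (scan a range of IP addresses, then pull inventory from whatever answers) — the scanning helper below is illustrative, not the actual connector:

```python
# Illustrative, vendor-agnostic discovery in the spirit of the Redfish
# connector described above. Redfish-compliant servers all expose a REST
# service root at /redfish/v1, so one scan covers every brand.
import ipaddress

def redfish_roots(cidr):
    """Build the candidate Redfish service-root URL for every host in a range."""
    return [f"https://{host}/redfish/v1"
            for host in ipaddress.ip_network(cidr).hosts()]

# A real connector would GET each URL, skip non-responders, and walk the
# /redfish/v1/Systems collection to pull inventory and telemetry for each
# server it finds, with no per-vendor code.
urls = redfish_roots("10.0.0.0/30")
```

That single-standard endpoint is what makes the "I can't manually go get 100,000 servers" problem tractable: one discovery loop, any hardware brand.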
>> Alright, he's Tyler, he's Ed, I'm Jeff. You're watching theCUBE from downtown San Francisco at OSIsoft PI WORLD 2018, thanks for watching. (light techno music)
Chad Sakac, Dell EMC | Part II | VMworld 2017
(exciting upbeat music) >> Announcer: Live from Las Vegas. It's theCUBE. Covering VMworld 2017, brought to you by VMware, and its ecosystem partner. >> Okay, we're back in Las Vegas. This is VMworld 2017, and this is theCUBE. This is Dave Vellante with Peter Burris, and this is the second segment with Chad Sakac who's the president of Dell EMC. We're going to dig into what the cloud looks like in the next decade, you know, 2022 time frame. Chad, again, welcome back. Thanks for spending some time. >> It's great to be back. No one's got a crystal ball a decade out, but I think we've got a pretty good idea of what we think the next five years look like. >> Well, you know, we do like at Wikibon to look further out, and say okay, what are your assumptions about how the business is going to evolve, knowing that any kind of ten year forecast is going to be wrong. But it does shape your thinking and your assumptions. >> Yep. >> So what's your vision? What's Dell EMC's vision for how the cloud is going to evolve and shape, and look like in the next five years? >> I think the following things are a near certainty and they're driving our strategy. Basically customers will consume platforms. They will pick the best platform on a temporal basis and on a space basis. So time and space (chuckles) right? So I'll give you an example. Today if you said, "What is the best place and time "to do AI and machine learning "for work that is against data that is not in-house?" The answer would be Google cloud platform in a heartbeat. Their core capabilities as a platform around AI and machine learning are head and shoulders above everything else. Right? >> Yep. >> That's a platform that people consume. Likewise, if you said, "Okay, what's the platform that I use for my "applications that basically need a little more "traditional care and feeding around them?" That's going to be an evolution of the VMware stack that the customers are using today. It powers 80% of what they do today. 
It's the platform that runs the core of their business today, and that platform, as you can see this week, is expanding and expanding and expanding. Now what'll happen over the next decade is that platform will be independent of place. So if you imagine what we're going to do with that capability now, it's not an announcement, it's a platform that customers can buy around VMware Cloud on AWS, you can see that we just broke that boundary. "Is something on or off?" is now not the question. The question is what's the right platform and services to use for a given set of workloads? >> I want to build on that for a second Chad, if I can. So the vision that I think you articulated, the core experience. Look, what you love about the cloud is you love to get in small, grow fast, or grow according to the workload needs, >> Chad: Yep, elasticity. >> Yeah, don't lock in a whole bunch of financial assets. Lower assets, specificity, be able to apply it to a lot of different things. You love that. But the problem is, the physical, legal, and IP realities of your business dictate that you're not going to put it all in there. So the common experience is, get that dependent upon the workload, and have it all run simply in a straightforward manner that serves the business. Right? >> Bingo. So the word platform is independent of space. >> Right? >> Right. The other thing that I think we'll see over the next decade is that any technologies that bind multiple platforms together are incredibly compelling. And you can actually see this driving both the R and D strategy and the M and A strategy of the leaders, right? So let me give you an example of things that bind together platforms, and themselves are platforms. Cloud Foundry is one of the best binders and spanners that exists. Because people use Cloud Foundry on Azure, on AWS, on their own private cloud all day long. 
In fact, it won the award for basically, at Microsoft Ignite, for the most popular used thing on Azure outside Microsoft's own core services. So it's a binder. It gives customers mobility and flexibility across these different platforms. Another example, we're going all in on Kubernetes. We think that Kubernetes as the container abstraction that spans these different clouds is in essence, game over of chaos, and game beginning of standardization and movement forward. I'll give you another example. I think that ten years from now the debates that we're having around SDN today will be so over, and everyone will go, "Of course you're going to have a software-defined "network that abstracts," because networking is something that needs to span platforms. So, core idea number one, people will make platform choices and there will be multiple platforms. Those platforms will be independent of on off prem, independent of Capex, Opex choices. Those platforms will exist in all of those modes. >> But be tied to the characteristics, the benefits that they provide to workloads. >> Bingo. The library of connectors, of things that span and bind these platforms, will grow in value and importance to the customers. I'll give you another example of a binding thing that links together multiple platforms. And you can see its success even today. ServiceNow is the thing that binds and connects at the ITSM layer, all of these different topologies. So it's not just things that are all just in our family (chuckles) right? But you can see these ideas continuing to march forward. The thing that I think you'll also see is the explosion of the edge is going to create this whole world that is the opposite pendulum swing of centralization that you can kind of already see happening. 
The number of edge devices that will exist, the amount of data that they're going to need to process locally, and the amount of data that they're going to need to process that's centralized in one of these platforms is going to be immense. >> So the edge, does the edge create a new cloud? >> Yes. You know, people are already talking about that like it's the fog or whatever. Again, buzz words can sometimes make people underestimate very important things that are actually happening in an industry right now. The last thing I'll say is, and this is a dream and an aspiration, and a vision, but a dream and an aspiration. There are amazing problems to crack in the domain of security. And that itself needs to become a core platform element that transits all of these other platforms. >> Peter: That's a key binder. >> It's a binding element that has to transit all these different platforms that people consume. And I think you can see the edges of the industry, us tackling these problems in new ways, and I'm very hopeful about that actually. >> So the infrastructure requirements of that new cloud, customers have to make bets. We were talking about that earlier. There's new stack choices that are emerging. What's your point of view there and how does it all relate to bring it back to how you get from point A to point B? >> There's a great risk in saying stuff on camera Dave. (men chuckle) But you know-- >> Peter: But take the risk Chad. >> But to hell with it (laughs). See it here on theCUBE first. >> So look, I think that we're entering into an era of stack wars. And that sounds too militaristic. That's not what I mean. >> Peter: Let's call it stack competition. >> I think that what is happening is that the need for customers to choose platforms and make platform level bets in exchange for simplification and speed is basically forcing them, and it's forcing the market and everyone in it, including us, to think, what is our opinionated stack? That doesn't mean closed, right? 
However, even though there's open connections all over the place, increasingly you're seeing people take the Lego components and go (makes building sounds) This works with this, which works with this, which works with this, and they're built all together. And the thing that I'm finding, and I don't know whether you guys see this in your customer conversation. It's weird, people are schizophrenic. They're really worried about what that means for them on premises. Because they're used to hand-assembling everything under the sun, and then are frustrated it doesn't all work together easily, right? And yet, they have no issue at all about saying "I'm putting everything in, you know, "in Office 365." I was talking with a customer, with the procurement person, and you can imagine the procurement person's reaction when I say, "I think that the world is moving "towards vertically integrated stacks." And there is decidedly an open ecosystem, but also an opinionated, pivotal, VMware, Dell, EMC stack. A Dell technology stack. The procurement guy lost his mind. He did not like to hear that from me. >> Of course. >> He started to get angry. >> Well, would you rather have what occurred with Hadoop? >> Yeah, and-- (Dave laughs) >> Well, what he wants is he is being told, "You got to take "five points off of every transaction." >> Yeah, of course. >> And he wants to see all these transactions be distinct, and what you're saying, Chad, is that we're moving where the transactions start to accrete value, accrete strategic importance, >> Yep. >> and accrete risk. And the procurement guy's looking at that saying (makes terrified sound) But it requires hard core, realistic vendor management that's well-defined and treated by the business as an asset. >> I think that we're entering into an era of consolidation. Customers are going to have to make platform bets that are business bets. >> For themselves. >> That's right. 
>> So bring it back to a topic that is more 2017, hyperconverged infrastructure. >> Chad: Yep. >> Is that the model for the future cloud? Or does it need to go beyond that? Beyond the virtual machine parlance that we tend to talk about? >> So, we have years of experience working with customers, trying to build clouds out of traditional infrastructure stacks. >> Dave: Right. >> And we're there as their partner to make it work. It is freaking hard! Frankly, nearly impossible. And again, they talk to vendor after vendor who's like, "Buy our new cloud management platform "and we'll be able to automate all of your crapola." >> Buy our hammer, and we'll fix all your cloud nails. >> And the reality of it is that every layer that you build one of these stacks on, the more variation that you have at this layer, it complicates the life cycle management of this layer. And then the more variation you have at this layer, the more it complicates the life cycle management of this layer. And that's what I mean about the stackification where the stacks are starting to bowl together. Driven not by vendor, but driven by customer need for simplification and speed. >> Peter: And workload. >> They're just not consciously making the connection yet that says it's time for me to make strategic choices. Right? So hyperconverged infrastructure has proven an ability. It's no longer in weirdo VDI only use cases (laughs). It is now proving itself to be a material simplification at the bottom layer of the stack. And it's not rocket science. It is basically the same lesson that hyperscalers and SaaS startups realized, which is that you need to have something which is much more industry standard, much more software defined, much more rigid in a sense about how it's constructed so you actually life cycle it and make the next stack up simpler. >> All right, so we got to wrap. Let me summarize what I heard, and maybe you guys can fill in any gaps. So platforms essentially become products is what I heard. 
Those are my words not yours. >> Totally, yeah. >> And platforms will be place-independent, and a key value creator will be this binding platforms together. >> Chad: Yes. >> Which is going to become very very compelling. You gave the example of Cloud Foundry, Kubo, Kubernetes. >> I'll give you one more, Boomi. >> Boomi, and even SDN which is basically a fait accompli >> Yeah. >> is essentially what you're saying. An explosion at the edge will create a new cloud. The infrastructure requirements are going to evolve to support that cloud. And security is going to be a core platform element, a key binder as you said. Anything I missed? >> And that literally, customers have to be as simple as they can. And what they need to accept, and make choices, I'm not forcing them down the path with us or whomever. They need to accept that simplicity and speed means choosing platforms and platform partners. >> So here's what I'd add. 'Cause I think you're right Chad. I would add just a couple of refinements, that the quality of the platform is going to be a function of how well it binds. >> Chad: Yep. >> And that that security becomes a crucial binder. And the other thing that I'd say is that the edge, it's not so much a new cloud. I hate the term fog. >> Yeah. >> Because if there's anywhere where business is going to need clarity, it's going to be at the edge. >> I totally agree. >> That's a vendor way of looking at things. The customer way is, "I need clarity here you guys. "Don't talk to me about cloud." In fact, we like to say that when Andreessen said, "Software's going to eat the world," the right way of saying it, "Software's going to eat the edge." >> Right. >> That the edge is going to make a lot more of these choices clear. >> And just, again, I know we got to go but, it always sounds like hyperbole. 
The amount of stuff that we're doing around trying to make the edge clear, like basically the EdgeX Foundry, which is basically trying to standardize this mess of proprietary protocols and devices. That stuff is happening like now. The Pulse IoT stuff that we talked about, that's happening now. But those are just in early, early days. If you look out over a few years, that stuff will be a new platform. >> That's absolutely right. >> Yeah. And Dell hasn't fully played its edge card, I suspect. >> We will see more there. >> Yeah. >> All right, Chad, first of all, awesome content. Peter, thank you very much. Virtual Geek is Chad's blog. If you're into this stuff, go subscribe to that. It's a fantastic resource. >> Thanks man. >> So thanks again. Really appreciate it. >> My pleasure guys. >> All right, keep right there everybody. We'll be back with our next guest. This is theCUBE. We're live from VMworld 2017 from Las Vegas. We'll be right back. (electronic music)
Mark Grover & Jennifer Wu | Spark Summit 2017
>> Announcer: Live from San Francisco, it's the Cube covering Spark Summit 2017, brought to you by databricks. >> Hi, we're back here where the Cube is live, and I didn't even know it. Welcome, we're at Spark Summit 2017. Having so much fun talking to our guests I didn't know the camera was on. We are doing a talk with Cloudera, a couple of experts that we have here. First is Mark Grover, who's a software engineer and an author. He wrote the book, "Hadoop Application Architectures." Mark, welcome to the show. >> Mark: Thank you very much. Glad to be here. >> And just to his left we also have Jennifer Wu, and Jennifer's director of product management at Cloudera. Did I get that right? >> That's right. I'm happy to be here, too. >> Alright, great to have you. Why don't we get started talking a little bit more about what Cloudera is maybe introducing new at the show? I saw a booth over here. Mark, do you want to get started? >> Mark: Yeah, there are two exciting things that we've launched at least recently. There's Cloudera Altus, which is for transient workloads and being able to do ETL-like workloads, and Jennifer will be happy to talk more about that. And then there's Cloudera Data Science Workbench, which is this tool that allows folks to use data science at scale. So, get away from doing data science in silos on your personal laptops, and do it in a secure environment on cloud. >> Alright, well, let's jump into Data Science Workbench first. Tell me a little bit more about that, and you mentioned it's for exploratory data science. So give us a little more detail on what it does. >> Yeah, absolutely. So, there was private beta for Cloudera Data Science Workbench earlier in the year and then it was GA a few months ago. And it's like you said, an exploratory data science tool that brings data science to the masses within an enterprise. Previously people used to have, it was this dichotomy, right? As a data scientist, I want to have the latest and greatest tools. 
I want to use the latest version of Python, the latest notebook kernel, and I want to be able to use R and Python to crunch this data and run my machine learning models. However, on the other side of this dichotomy is the IT side of the organization, which wants to make sure that all tools are compliant, that your clusters are secure, and that your data is not going into places that are not secured by state-of-the-art security solutions, like Kerberos for example, right? And of course if the data scientists are putting the data on their laptops and taking the laptops around wherever they go, that's not really a solution. So, that was one problem. And the other one was that if you were to bring them all together in the same solution, data scientists have different requirements. One may want to use Python 2.6. Another one may want to use 3.2, right? And so Cloudera Data Science Workbench is a new product that allows data scientists to visualize and do machine learning through this very nice notebook-like interface, share their work with the rest of their colleagues in the organization, but also allows you to keep your clusters secure. So it allows you to run against a Kerberized cluster, allows single sign-on to your web interface to Data Science Workbench, and provides a really nice developer experience in the sense that my workflow and my tools and my version of Python do not conflict with Jennifer's version of Python. We all have our own Docker- and Kubernetes-based infrastructure that makes sure that we use the packages that we need, and they don't interfere with each other. >> We're going to go to Jennifer on Altus in just a few minutes, but George, first I'll give you a chance to maybe dig in on Data Science Workbench.
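The per-user isolation Mark describes, with each person's Python version and packages pinned independently via Docker, can be pictured with a minimal image definition. This is a generic sketch, not Workbench's actual engine image; the file name and pinned package versions are invented for illustration:

```dockerfile
# Sketch of one user's isolated environment: the Python version and
# packages are pinned per image, so my Python 3 setup can't conflict
# with a colleague's Python 2.6 image running on the same cluster.
FROM python:3.6-slim

# Pin this user's analysis stack; another user's image pins different ones.
RUN pip install --no-cache-dir numpy==1.13.3 pandas==0.20.3

WORKDIR /home/analyst
COPY analysis.py .
CMD ["python", "analysis.py"]
```

Each such image runs as its own container, which is what lets the scheduler pack many mutually incompatible environments onto the same secured hosts.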
>> Two questions on the data science side: some of the really toughest nuts to crack have been sort of a common environment for the collaborators, but also the ability to operationalize the models once you've agreed on them, and manage the lifecycle across teams, you know? Like, challenger/champion, promote something, or even before that doing the A/B testing, and then sort of what's in production is typically in a different language from what, you know, it was designed in, and integrating it with the apps. Where is that on the roadmap? 'Cause no one really has a good answer for that. >> Yeah, that's an excellent question. In general I think it's the problem to crack these days. How do you productionalize something that was written by a data scientist in a notebook-like system onto the production cluster, right? And for the part where the data scientist works in a different language than the language that's in production, the best I can say right now is to actually have someone rewrite that in the language you're going to run in production, right? I don't see that to be the more common part. I think the more widespread problem is, even when the language is the same in production, how do you take the part that the data scientist wrote, the model or whatever that would be, onto a production cluster? And so Data Science Workbench in particular runs on the same cluster that is being managed by Cloudera Manager, right? So this is a tool that you install, but that is available to you as a web interface, and so that allows you to move your machine learning algorithms from development in Data Science Workbench to production much more easily, because it's all running on the same hardware and the same systems. There's no separate Cloudera Manager that you have to use to manage the Workbench compared to your actual cluster. >> Okay.
A tangential question, but one of the difficulties of doing machine learning is finding all the training data and sort of the data science expertise to sit with the domain expert to, you know, figure out the proper model features, things like that. One of the things we've seen so far from the cloud vendors is they take their huge datasets, in terms of voice and images, and they do the natural language understanding, speech-to-text, facial recognition, because they have such huge datasets they can train on. We're hearing noises that they're going to take that down to the more mundane statistical kind of machine learning algorithms, so that it wouldn't be, like, here's an algorithm to do churn, you know, go to town, but that they might have something that's already kind of pre-populated that you would just customize. Is that something that you guys would tackle, too? >> I can't speak for the roadmap in that sense, but I think some of that problem needs to be tackled by projects like Spark, for example. So I think as the stack matures, it's going to raise the level of abstraction as time goes on. And I think whatever benefits the Spark ecosystem gains will come directly to distributions like Cloudera's. >> George: That's interesting. >> Yeah. >> Okay. >> Alright, well let's go to Jennifer now and talk about Altus a little bit. Now you've been on the Cube show before, right? >> I have not. >> Okay, well, I'm familiar with your work. Tell us again, you're the product manager for Altus. What does it do, and what was the motivation to build it? >> Yeah, we're really excited about Cloudera Altus. So, we released Cloudera Altus in its first GA form in April, and we launched Cloudera Altus in a public environment at Strata London about two weeks ago, so we're really excited about this and we are very excited to now open this up to all of the customer base.
And what it is is a platform-as-a-service offering designed to leverage the agility and the scale of cloud, and to make a very easy-to-use experience that exposes Cloudera capacity, in particular for data engineering type workloads. So the end user will be able to very easily, in a very agile manner, get data engineering capacity on Cloudera in the cloud, and they'll be able to do things like ETL, large-scale data processing, and productionized machine learning workflows in the cloud with this new data-engineering-as-a-service experience. And we wanted to abstract away the cloud and cluster operations, and make the end user experience very easy. So, jobs and workloads are first-class objects. You can do things like submit jobs, clone jobs, terminate jobs, troubleshoot jobs. We wanted to make this very, very easy for the data engineering end user. >> It does sound like you've sort of abstracted away a lot of the infrastructure that you would associate with on-prem, and almost made it programmable and invisible. But I guess one of my questions is about what happens when you put it in a cloud environment. When you're on-prem you have a certain set of competitors, which is kind of restrictive, because you are the standalone platform. But when you go to the cloud, someone might say, "I want to use Redshift on Amazon," or Snowflake, you know, as the MPP SQL database at the end of a pipeline. And I'm just using those as examples. There's, you know, dozens, hundreds, thousands of other services to choose from. >> Yes. >> What happens to the integrity of that platform if someone carves off one piece? >> Right. So, interoperability and a unified data pipeline are very important to us, so we want to make sure that we can still service the entire data pipeline all the way from ingest and data processing to analytics.
So our team has 24 different open source components that we deliver in the CDH distribution, and we have committers across the entire stack. We know the application, and we want to make sure that everything's interoperable no matter how you deploy the cluster. So if you deploy data engineering clusters through Cloudera Altus, but you deployed Impala clusters for data marts in the cloud through Cloudera Director or through any other format, we want all these clusters to be interoperable, and we've taken great pains in order to make everything work together well. >> George: Okay. So how do Altus and Data Science Workbench interoperate with Spark? Maybe start with >> You want to go first with Altus? >> Sure. So, in terms of interoperability we focus on things like making sure there are no data silos, so that the data in your entire data lake can be consumed by the different components in our system, the different compute engines and different tools. So if you're processing data, you can also look at that data and visualize it through Data Science Workbench. So after you do data ingestion and data processing, you can use any of the other analytic tools, and this includes Data Science Workbench. >> Right, and Data Science Workbench runs, for example, with the latest version of Spark: you could pick the currently latest released version, Spark 2.1; Spark 2.2 support is on the way, of course, and will be integrated soon after its release. For example, you could use Data Science Workbench with your flavor of Spark 2 and run PySpark or Scala jobs on this notebook-like interface and be able to share your work. And because you're using Spark underneath, under the hood it uses YARN for resource management; the Data Science Workbench itself uses Docker for configuration management, and Kubernetes for resource-managing these Docker containers.
>> What would be, if you had to describe it, the edge conditions and the sweet spot of the application? I mean, you talked about data engineering. One thing we were talking to Matei Zaharia and Reynold Xin about, and Ali Ghodsi as well, was that if you put Spark on a database, or at least a, you know, sophisticated storage manager, like Kudu, all of a sudden there's a whole new class of jobs or applications that open up. Have you guys thought about what that might look like in the future, and what new applications you would tackle? >> I think a lot of that benefit, for example, could be coming from the underlying storage engine. So let's take Spark on Kudu, for example. The inherent characteristics of Kudu today allow you to do updates without having to either deal with the complexity of something like HBase, or the crappy performance of dealing with HDFS compactions, right? So the sweet spot comes from Kudu's capabilities. Of course it doesn't support transactions or anything like that today, but imagine putting something like Spark on it and being able to use the machine learning libraries. We have been limited so far in the machine learning algorithms that we have implemented in Spark by the storage system, sometimes, and new machine learning algorithms, or the existing ones, could be rewritten to make use of the update features in Kudu, for example. >> And so it sounds like the machine learning pipeline might get richer, but I'm not hearing, and maybe this isn't in the near-term roadmap, the idea that you would build sort of operational apps that have these sophisticated analytics built in, you know, where the analytics, you've done the training, but at run time the inferencing influences a transaction, influences a decision. Is that something that you would foresee? >> I think that's totally possible.
Again, at the core of it is the fact that now you have one storage system that can do scans really well, and it can also do random reads and writes in any place, right? So that allows applications which were previously siloed, because one application ran off of HDFS and another application ran off of HBase and you had to correlate them, to just be one single application that can train and then also use the trained model to make decisions on the new transactions that come in. >> So that's very much within the scope of imagination. That's part of sort of the ultimate plan? >> Mark: I think it's definitely conceivable now, yeah. >> Okay. >> We're up against a hard break coming up in just a minute, so you each get a 30-second answer here, and it's the same question. You've been here for a day and a half now. What's the most surprising thing you've learned that you think should be shared more broadly with the Spark community? Let's start with you. >> I think one of the great things happening in Spark today is that people have been complaining about latency for a long time, and if you saw the keynote yesterday, you would see that Spark is making forays into reducing that latency. If you are interested in Spark, or using Spark, it's very exciting news, and you should keep tabs on it. We hope to deliver lower latency as a community soon. >> How long is one millisecond? (Mark laughs) >> Yeah, I'm largely focused on cloud infrastructure, and I found here at the conference that many, many people are very much prepared to start taking on more POCs and more interest in cloud, and the response to all of this and to Altus has been very encouraging. >> Great. Well, Jennifer, Mark, thank you so much for spending some time here on the Cube with us today. We're going to come by your booth and chat a little bit more later. It's some interesting stuff.
And thank you all for watching the Cube today here at Spark Summit 2017, and thanks to Cloudera for bringing us these two experts. And thank you for watching. We'll see you again in just a few minutes with our next interview.
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Jennifer | PERSON | 0.99+ |
Mark Grover | PERSON | 0.99+ |
Jennifer Wu | PERSON | 0.99+ |
Ali Ghodsi | PERSON | 0.99+ |
George | PERSON | 0.99+ |
Mark | PERSON | 0.99+ |
April | DATE | 0.99+ |
Reynold Xin | PERSON | 0.99+ |
San Francisco | LOCATION | 0.99+ |
Matei Zaharia | PERSON | 0.99+ |
30-second | QUANTITY | 0.99+ |
Cloudera | ORGANIZATION | 0.99+ |
Hadoop Application Architectures | TITLE | 0.99+ |
dozens | QUANTITY | 0.99+ |
Python | TITLE | 0.99+ |
yesterday | DATE | 0.99+ |
Two questions | QUANTITY | 0.99+ |
today | DATE | 0.99+ |
Spark | TITLE | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
two experts | QUANTITY | 0.99+ |
a day and a half | QUANTITY | 0.99+ |
First | QUANTITY | 0.99+ |
one problem | QUANTITY | 0.99+ |
Python 2.6 | TITLE | 0.99+ |
Strata London | LOCATION | 0.99+ |
one piece | QUANTITY | 0.99+ |
first | QUANTITY | 0.98+ |
Spark Summit 2017 | EVENT | 0.98+ |
Cloudera Altus | TITLE | 0.98+ |
Scala | TITLE | 0.98+ |
Docker | TITLE | 0.98+ |
One | QUANTITY | 0.97+ |
Kudu | ORGANIZATION | 0.97+ |
one millisecond | QUANTITY | 0.97+ |
PySpark | TITLE | 0.96+ |
R | TITLE | 0.95+ |
one | QUANTITY | 0.95+ |
two weeks ago | DATE | 0.93+ |
Data Science Workbench | TITLE | 0.92+ |
Cloudera | TITLE | 0.91+ |
hundreds | QUANTITY | 0.89+ |
Hbase | TITLE | 0.89+ |
each | QUANTITY | 0.89+ |
24 different open source components | QUANTITY | 0.89+ |
few months ago | DATE | 0.89+ |
single | QUANTITY | 0.88+ |
kernel | TITLE | 0.88+ |
Altus | TITLE | 0.88+ |
Holden Karau, IBM Big Data SV 17 #BigDataSV #theCUBE
>> Announcer: Big Data Silicon Valley 2017. >> Hey, welcome back, everybody, Jeff Frick here with The Cube. We are live at the historic Pagoda Lounge in San Jose for Big Data SV, which is associated with Strata + Hadoop World across the street, as well as Big Data week, so everything big data is happening in San Jose. We're happy to be here, love the new venue; if you're around, stop by, back of the Fairmont, Pagoda Lounge. We're excited to be joined in this next segment by someone who's now become a regular: any time we're at a Big Data event or a Spark event, Holden always stops by. Holden Karau, she's a principal software engineer at IBM. Holden, great to see you. >> Thank you, it's wonderful to be back yet again. >> Absolutely, so the big data meme just keeps rolling. Google Cloud Next was last week, a lot of talk about AI and ML, and of course you're very involved in Spark, so what are you excited about these days? What are you, I'm sure you've got a couple presentations going on across the street. >> Yeah, so my two presentations this week, oh wow, I should remember them. So the one that I'm doing today is with my co-worker Seth Hendrickson, also at IBM, and we're going to be focused on how to use structured streaming for machine learning. And I think that's really interesting, because streaming machine learning is something a lot of people seem to want to do but aren't yet doing in production, so it's always fun to talk to people before they've built their systems. And then tomorrow I'm going to be talking with Joey on how to debug Spark, which is something that, you know, a lot of people ask questions about, but I tend not to talk about, because it tends to scare people away, and so I try to keep the happy going. >> Jeff: Bugs are never fun. >> No, no, never fun.
>> Just picking up on that structured streaming and machine learning: there's this issue of, as we move more and more towards the industrial internet of things, having to process events as they come in and make a decision, and there's a range of latency that's required. Where do structured streaming and ML fit today, and where might that go? >> So structured streaming today, latency-wise, is probably not something I would use for something like that right now. It's in the sub-second range, which is nice, but it's not what you want for, like, live serving of decisions for your car, right? That's just not going to be feasible. But I think it certainly has the potential to get a lot faster. We've seen a lot of renewed interest in ML liblocal, which is really about making it so that we can take the models that we've trained in Spark and really push them out to the edge, serve them at the edge, and apply our models on end devices. So I'm really excited about where that's going. To be fair, part of my excitement is that someone else is doing that work, so I'm very excited that they're doing this work for me. >> Let me clarify on that, just to make sure I understand. So there's a lot of overhead in Spark, because it runs on a cluster, because you have an optimizer, because you have the high availability or the resilience, and so you're saying we can preserve the predict and maybe the serve part and carve out all the other overhead for running in a very small environment. >> Right, yeah. So I think for a lot of these IoT devices and stuff like that it actually makes a lot more sense to do the predictions on the device itself, right. These models generally are megabytes in size, and we don't need a cluster to do predictions on these models, right. We really need the cluster to train them, but I think for a lot of cases, pushing the prediction out to the edge node is actually a pretty reasonable use case.
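Doing the prediction on the device itself, as described here, can be as small as a plain-Python scoring loop with no cluster in the path. A toy sketch: the logistic model below, its weights included, stands in for whatever a tool like ML liblocal would export, and all the numbers are invented:

```python
# Toy sketch of edge-side scoring: a tiny logistic model applied to each
# event as it arrives, one at a time, with no cluster and no batching.
import math

WEIGHTS = [0.8, -1.2]  # illustrative trained coefficients
BIAS = 0.1

def predict(event):
    """Score one event (a feature list) and return a 0/1 decision."""
    z = BIAS + sum(w * x for w, x in zip(WEIGHTS, event))
    p = 1.0 / (1.0 + math.exp(-z))
    return 1 if p >= 0.5 else 0

# Each incoming event is scored immediately as it arrives.
incoming = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.5]]
decisions = [predict(e) for e in incoming]
```

The point of the sketch is the shape of the workload: the expensive part (training) happened on the cluster, and the per-event path is just arithmetic on a few megabytes (here, a few bytes) of model state.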
And so I'm really excited that we've got some work going on there. >> Taking that one step further, we've talked to a bunch of people, like at GE and their Minds and Machines show, and IBM's Genius of Things, where you want to be able to train the models up in the cloud, where you're getting data from all the different devices, and then push the retrained model out to the edge. Can that happen in Spark, or do we have to have something else orchestrating all that? >> So actually pushing the model out isn't something that I would do in Spark itself; I think that's better served by other tools. Spark is not really well suited to large amounts of internet traffic, right. But it's really well suited to the training, and I think with ML liblocal we'll essentially be able to provide both sides of it, and the copy part will be left up to whoever it is that's doing the work, right, because if you're copying over a cell network you need to do something very different than if you're broadcasting over terrestrial XM or something like that, and you need to do something very different again for satellite. >> If you're at the edge on a device, would you be actually running, like you were saying earlier, structured streaming, with the prediction? >> Right, I don't think you would use structured streaming per se on the edge device, but essentially there would be a lot of code shared between structured streaming and the code that you'd be using on the edge device. And it's being factored out now so that we can have this code sharing in Spark machine learning. And you would use structured streaming maybe on the training side, and then on the serving side you would use your custom local code. >> Okay, so tell us a little more about Spark ML today and how we can democratize machine learning, you know, for a bigger audience. >> Right, I think machine learning is great, but right now you really need a strong statistical background to really be able to apply it effectively.
And we probably can't get rid of that for all problems, but I think for a lot of problems, doing things like hyperparameter tuning can actually give really powerful tools to just regular engineering folks who, they're smart, but maybe they don't have a strong machine learning background. And Spark's ML pipelines make it really easy to sort of construct multiple stages, and then just be like, okay, I don't know what these parameters should be, I want you to do a search over what these different parameters could be for me, and it makes it really easy to do this as just a regular engineer with less of an ML background. >> Would that be, just for those of us who don't know what hyperparameter tuning is, that would be the knobs, the variables? >> Yeah, it's going to spin the knobs on, like, our regularization parameter on our regression, and it can also spin some knobs on maybe the n-gram sizes that we're using on the inputs to something else, right. And it can compare how these knobs sort of interact with each other, because often you can tune one knob, but you actually have six different knobs that you want to tune together, and if you just explore each one individually, you're not going to find the best setting for them working together. >> So this would make it easier for, as you're saying, someone who's not a data scientist to set up a pipeline that lets you predict. >> I think so, very much. I think it brings a lot of the benefits from sort of the SciPy world to the big data world. And SciPy is really wonderful about making machine learning really accessible, but it's just not ready for big data, and I think this does a good job of bringing the same concepts, if not the code, to big data. >> The SciPy, if I understand, is it a notebook that would run essentially on one machine? >> SciPy can be put in a notebook environment, and generally it would run on, yeah, a single machine.
>> And so to make that sit on Spark means that you could then run it on a cluster-- >> So this isn't actually taking SciPy and distributing it, this is just stealing the good concepts from SciPy and making them available for big data people. Because SciPy's done a really good job of making a very intuitive machine learning interface. >> So just to put a fine qualifier on one thing: if you're doing the internet of things and you have Spark at the edge and you're running the model there, it's the programming model, so structured streaming is one way of programming Spark, but if you don't have structured streaming at the edge, would you just be using the core batch Spark programming model? >> So at the edge you wouldn't even be using batch, right, because you're trying to predict individual events, so you'd just be calling predict with every new event that you're getting in. And you might have a queue mechanism of some type. But essentially if we had batching there, we would be adding additional latency, and the reason we're moving the models to the edge is to avoid the latency. >> So just to be clear then, the programming model, it wouldn't be structured streaming, and we're taking out all the overhead that forced us to use batch with Spark. The reason I'm trying to clarify is a lot of people have had this question for a long time, which is: are we going to have a different programming model at the edge from what we have at the center? >> Yeah, that's a great question. And I don't think the answer is finished yet, but I think the work is being done to try and make it look the same.
Of course, you know, trying to make it look the same (this is Boosh; it's not like she's actually barking at us right now, and even though she looks like a dog, she is one), there will always be things which are a little bit different from the edge to your cluster, but I think Spark has done a really good job of making things look very similar in single-node cases and multi-node cases, and I think we can probably bring the same thing to ML. >> Okay, so it's almost like we're coming back: Spark took us from single machine to cluster, and now we have to essentially bring it back for an edge device that's really lightweight. >> Yeah, I think at the end of the day, just from a latency point of view, that's what we have to do for serving. For some models, not for everyone. Like if you're building a website with a recommendation system, you don't need to serve that model on the edge node, that's fine, but if you've got a car device we can't depend on cell latency, right, you have to serve that in the car. >> So what are some of the things, some of the other things that IBM is contributing to the ecosystem that you see having a big impact over the next couple of years? >> So there's a lot of really exciting things coming out of IBM. And I'm obviously pretty biased. I spend a lot of time focused on Python support in Spark, and one of the most exciting things is coming from my co-worker Brian, I'm not going to say his last name in case I get it wrong, but Brian is amazing, and he's been working on integrating Arrow with Spark, and this can make it a lot easier to interoperate between JVM languages and Python and R, so I'm really optimistic about the Python and R interfaces improving a lot in Spark and getting a lot faster as well. And in addition to the Arrow work, we've got some work around making it a lot easier for people in R and Python to get started.
The R stuff is mostly actually the Microsoft people, thanks Felix, you're awesome. I don't actually know which camera I should have done that to, but that's okay. >> I think you got it! >> But Felix is amazing, and the other people working on R are too. But I think we've both been pursuing making it so that people who are in the R or Python spaces can just use, like, pip install, conda install, or whatever tool it is they're used to working with, to just bring Spark onto their machine really easily, just like they would any other software package that they're using. Because right now, for someone getting started in Spark, if you're in the Java space it's pretty easy, but if you're in R or Python you have to do sort of a lot of weird setup work, and it's worth it, but if we can get rid of that friction, I think we can get a lot more people in these communities using Spark. >> Let me see, just as a scenario: the R server is getting fairly well integrated into SQL Server, so would you be able to use R as the language with a Spark execution engine, and somehow integrate it into SQL Server as an execution engine for doing the machine learning and predicting? >> You definitely, well, I shouldn't say definitely, you probably could do that. I don't necessarily know if that's a good idea, but that's the kind of stuff this would enable, right? It'll make it so that people who are making tools in R or Python can just use Spark as another library, and it doesn't have to be this really special setup. It can just be a library, and they can point it at the cluster and do whatever work they want to do. That being said, with the SQL Server R integration, if you find yourself using that to do, like, distributed computing, you should probably take a step back and rethink what you're doing.
And you might be better off doing this by, like, connecting your Spark cluster to your SQL Server instance using JDBC or a special driver and doing it that way, but you definitely could do it in another, inverted sort of way.
>> You can like install Spark easily, you can, you know, set up an ML pipeline, you can train your model, you can start doing predictions, you can, people that haven't been able to do machine learning at scale can get started super easily, and build a recommendation system for their small little online shop and be like, hey, you bought this, you might also want to buy Boosh, he's really cute, but you can't have this one. No no no, not this one. >> Such a tease! >> Holden: I'm sorry, I'm sorry. >> Well Holden, that will, we'll say goodbye for now, I'm sure we will see you in June in San Francisco at the Spark Summit, and look forward to the update. >> Holden: I look forward to chatting with you then. >> Absolutely, and break a leg this afternoon at your presentation. >> Holden: Thank you. >> She's Holden Karau, I'm Jeff Frick, he's George Gilbert, you're watching The Cube, we're at Big Data SV, thanks for watching. (upbeat music)
Holden Karau, IBM - #BigDataNYC 2016 - #theCUBE
>> Narrator: Live from New York, it's the CUBE from Big Data New York City 2016. Brought to you by headline sponsors, Cisco, IBM, Nvidia. And our ecosystem sponsors. Now, here are your hosts: Dave Vellante and Peter Burris. >> Welcome back to New York City, everybody. This is the CUBE, the worldwide leader in live tech coverage. Holden Karau is here, principal software engineer with IBM. Welcome to the CUBE. >> Thank you for having me. It's nice to be back. >> So, what's with Boo? >> So, Boo is my stuffed dog that I bring-- >> You've got to hold Boo up. >> Okay, yeah. >> Can't see Boo. >> So, this is Boo. Boo comes with me to all of my conferences in case I get stressed out. And she also hangs out normally on the podium while I'm giving the talk as well, just in case people get bored. You know, they can look at Boo. >> So, Boo is not some new open source project. >> No, no, Boo is not an open source project. But Boo is really cute. So, that counts for something. >> All right, so, what's new in your world of Spark and machine learning? >> So, there's a lot of really exciting things, right. Spark 2.0.0 came out, and that's really exciting because we finally got to get rid of some of the chunkier APIs. And data sets are just becoming sort of the core base of everything going forward in Spark. This is bringing the Spark SQL engine to all sorts of places, right. So, the machine learning APIs are built on top of the data set API now. The streaming APIs are being built on top of the data set APIs. And this is starting to actually make it a lot easier for people to work together, I think. And that's one of the things that I really enjoy is when we can have people from different sort of profiles or roles work together. And so this support of data sets being everywhere in Spark now lets people with more of like a SQL background still write stuff that's going to be used directly in sort of a production pipeline.
And the engineers can build whatever, you know, production ready stuff they need on top of the SQL expressions from the analysts and do some really cool stuff there. >> So, chunky API, what does that mean to a layperson? >> Sure, um, it means like, for example, there's this thing in Spark where one of the things you want to do is shuffle a whole bunch of data around and then look at all of the records associated with a given key, right? But, you know, when the APIs were first made, right, it was made by university students. Very smart university students, but you know, it started out as like a grad school project, right? And like, um, so finally with 2.0, we were able to get rid of things like places where we used traits like iterables rather than iterators. And because of these minor little chunky things, it's like we had to keep supporting this old API, because you can't break people's code in a minor release, but when you do a big release like Spark 2.0, you can actually go, okay, you need to change your stuff now to start using Spark 2.0. But as a result of changing that in this one place, we're actually able to better support spilling to disk. And this is for people who have too much data to fit in memory even on the individual executors. So, being able to spill to disk more effectively is really important from a performance point of view. So, there's a lot of clean up of getting rid of things which were sort of holding us back performance-wise. >> So, the value is there. Enough value to break the-- >> Yeah, enough value to break the APIs. And 1.6 will continue to be updated for people that are not ready to migrate right today. But for the people that are looking at it, it's definitely worth it, right? You get a bunch of real cool optimizations. >> One of the themes of this event of the last couple of years has been complexity.
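The iterables-versus-iterators change described above is worth unpacking: an iterator hands a reducer one record at a time, so a huge key group can be streamed (or spilled to disk) instead of being materialized in memory all at once. A rough plain-Python illustration of that difference, not Spark's internals:

```python
import sys

# Why iterators beat materialized iterables for grouped records:
# a generator yields one record at a time, so a fold over a huge key
# group needs O(1) extra memory, while a list holds everything at once.
# Conceptual sketch of the trade-off, not Spark code.

def records_for_key(n):
    """Pretend these records are read lazily from disk."""
    for i in range(n):
        yield i

# Iterable-style: materialize every record for the key first.
materialized = list(records_for_key(1_000_000))

# Iterator-style: consume records one at a time as they arrive.
streamed_sum = sum(records_for_key(1_000_000))

assert sum(materialized) == streamed_sum
# The materialized list alone costs megabytes of pointer storage.
print(sys.getsizeof(materialized) > 1_000_000)  # -> True
```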
You guys wrote an article recently in SiliconANGLE about some of the broken promises of open source, really the root of it being complexity. So, Spark addresses that to a large degree.
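The aggregation-over-microbatches point later in this exchange is the heart of structured streaming: you declare the aggregation once, and the engine keeps the running state correct across batches so you never think about batch boundaries. The toy Python below shows the bookkeeping that gets hidden from you; it is an illustration of the idea, not Spark's implementation:

```python
# Running per-key average maintained across microbatches: state carries
# over between calls, so each batch only contributes its increment.
# Structured Streaming manages this state for you; this toy exposes it.

state = {}  # key -> (running_sum, running_count)

def update(batch):
    """Fold one microbatch of (key, value) pairs into the state,
    then return the current per-key averages over ALL batches so far."""
    for key, value in batch:
        s, c = state.get(key, (0.0, 0))
        state[key] = (s + value, c + 1)
    return {k: s / c for k, (s, c) in state.items()}

print(update([("a", 2.0), ("a", 4.0)]))   # -> {'a': 3.0}
print(update([("a", 6.0), ("b", 1.0)]))   # averages now span both batches
```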
Boo helps out whenever it gets too stressful. >> First of all, a lot to learn. But there's been some great research done in places like Cornell and Penn and others about how the open source community collaborates and works together. And I'm wondering, is the open source community that's building things like Spark, especially in a domain like Big Data, where the use cases themselves are so complex and so important, starting to take some of the knowledge the contributors are developing on how to collaborate and how to work together, and starting to find a way to build that into the tools, so that the whole thing starts to collaborate better? >> Yeah, I think, actually, if you look at Spark, you can see that there's a lot of sort of tools that are being built on top of Spark, which are also being built in similar models. I mean, the Apache Software Foundation is a really good tool for managing projects of a certain scale. You can see a lot of Spark-related projects that have also decided that becoming part of the Apache Foundation is a good way to manage their governance and collaborate with different people. But then there's people that look at Spark and go like, wow, there's a lot of overhead here. I don't think I'm going to have 500 people working on this project. I'm going to go and model my project after something a bit simpler, right? And I think that both of those are really valid ways of building open source tools on Spark. But it's really interesting seeing, there's a Spark components page, essentially, a Spark packages list, for the community to publish the work that they're doing on top of Spark. And it's really interesting to see all of the collaborations that are happening there. Especially even between vendors sometimes. You'll see people make tools which help everyone's data access go faster. And it's open source, so you'll see it start to get contributed into other people's data access layers as well.
>> So, the pedagogy of how the open source community works is starting to find its way into the tools, so people who aren't in the community, but are focused on the outcomes, are now able to not only gain the experience about how the big data works, but also how people on complex outcomes need to work. >> I think that's definitely happening. And you can see that a lot with, like, the collaboration layers that different people are building on top of Spark, like the different notebook solutions, which are all very focused on enabling collaboration, right? Because if you're an analyst and you're writing some Python code on your local machine, you're not going to, like, probably set up a GitHub repo to share that with everyone, right? But if you have a notebook and you can just send the link to your friends and be like, hey, what's up, can you take a look at this? You can share your results more easily and you can also work together a lot more, more collaboratively. And so Databricks is doing some great things. IBM as well. I'm sure there's other companies building great notebook solutions who I'm forgetting. But the notebooks, I think, are really empowering people to collaborate in ways that we haven't traditionally seen in the big data space before. >> So, collaboration, to stay on that theme. So, we had eight data scientists on a panel the other night, and just talking about it, collaboration came up, and the question is specifically from an application developer standpoint. As data becomes, you know, the new development kit, how much of a data scientist do you have to become, or are you becoming, as a developer? >> Right, so, my role is very different, right? Because I focus just on tools, mostly. So, my data science is mostly to make sure that what I'm doing is actually useful to other people. Because a lot of the people that consume my stuff are data scientists. So, for me, personally, like the answer is not a whole lot.
But for a lot of my friends that are working in more traditional sort of data engineering roles where they're empowering specific use cases, they find themselves either working really closely with data scientists, often to be like, okay, what are your requirements? What data do I need to be able to get to you so you can do your job? And, you know, sometimes if they find themselves blocking on the data scientists, they're like, how hard could it be? And it turns out, you know, statistics is actually pretty complicated. But sometimes, you know, they go ahead and pick up some of the tools on their own. And we get to see really cool things with really, really ugly graphs. 'Cause they do not know how to use graphing libraries. But, you know, it's really exciting. >> Machine learning is another big theme in this conference. Maybe you could share with us your perspectives on ML and what's happening there. >> So, I really think machine learning is very powerful. And I think machine learning in Spark is also super powerful. And especially, just like the traditional thing is you down-sample your data. And you train a bunch of your models. And then, eventually, you're like, okay, I think this is like the model that I want to like build for real. And then you go and you get your engineer to help you train it on your giant data set. But Spark and the notebooks that are built on top of it actually mean that it's entirely reasonable for data scientists to take the tools which are traditionally used by the data engineering roles, and just start directly applying them during their exploration phase. And so we're seeing a lot of really more interesting models come to life, right? Because if you're always working with down-sampled data, it's okay, right? Like you can do reasonable exploration on down-sampled data. But you can find some really cool sort of features that you wouldn't normally find once you're working with your full data set, right?
'Cause you're just not going to have that show up in your down-sampled data. And I think also streaming machine learning is a really interesting thing, right? Because we see there's a lot of IOT devices and stuff like that. And like the traditional machine learning thing is, I'm going to build a model and then I'm going to deploy it. And then like a week later, I'll maybe consider building a new model. And then I'll deploy it. And then so very much it looks like the old software release processes as opposed to the more agile software release processes. And I think that streaming machine learning can look a lot more like sort of the agile software development processes, where it's like, cool, I've got a bunch of labeled data from our contractors. I'm going to integrate that right away. And if I don't see any regression on my cross-validation set, we're just going to go ahead and deploy that today. And I think it's really exciting. I'm obviously a little biased, because some of my work right now is on enabling machine learning with structured streaming in Spark. So, I obviously think my work is useful. Otherwise I would be doing something else. But it's entirely possible. You know, everyone will be like, Holden, your work is terrible. But I hope not. I hope people find it useful. >> Talking about sampling. At our first Hadoop World in 2010, Abhi Mehta, who stopped by again today, of course, made the statement then: Sampling's dead. It's dead. Is sampling dead? >> Sampling didn't quite die. I think we're getting really close to killing sampling. Sampling will only be dead once all of the data scientists in the organization have access to the same tools that the data engineers have been using, right? 'Cause otherwise you'll still be sampling. You'll still be implicitly doing your model selection on down-sampled data. And we'll still probably always find an excuse to sample data, because I'm lazy and sometimes I just want to develop on my laptop.
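The "agile" retraining loop described above, integrate new labels right away, check for regression on a fixed cross-validation set, and deploy only if nothing got worse, can be sketched in a few lines. The threshold "model" and the data here are invented stand-ins, not MLlib code:

```python
# Sketch of the retrain-and-gate loop: train a candidate on new labels,
# score it on a held-out cross-validation set, promote only if it does
# not regress versus the currently deployed model. Toy model only.

def train(data):
    """'Model' = predict 1 for values at or above the mean of positives."""
    positives = [x for x, y in data if y == 1]
    cutoff = sum(positives) / len(positives)
    return lambda x: 1 if x >= cutoff else 0

def accuracy(model, cv_set):
    return sum(model(x) == y for x, y in cv_set) / len(cv_set)

cv_set = [(0.1, 0), (0.2, 0), (0.8, 1), (0.9, 1)]
current = train([(0.7, 1), (0.9, 1)])

def maybe_deploy(current, new_labels):
    candidate = train(new_labels)
    if accuracy(candidate, cv_set) >= accuracy(current, cv_set):
        return candidate      # promote: no regression on the CV set
    return current            # keep serving the old model

current = maybe_deploy(current, [(0.6, 1), (0.8, 1), (0.3, 0)])
print(accuracy(current, cv_set))  # -> 1.0
```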
But, you know, I think we're getting close to killing a lot more of sampling. >> Do you see an opportunity to start utilizing many of these tools to actually improve the process of building models, finding data sources, identifying individuals that need access to the data? Are we going to start turning big data on the problem of big data? >> Oh, that's really exciting. And so, okay, so this is something that I find really enjoyable. So, one of the things is that traditionally, when everyone's doing their development on their laptop, right, you don't get to collect a lot of metrics about what they're doing, right? But once you start moving everyone into a sort of more integrated notebook environment, you can be like, okay, like, these are the data sets that these different people are accessing. Like, these are the things that I know about them. And you can actually train a recommendation algorithm on the data sets to recommend other data sets to people. And there are people that are starting to do this. And I think it's really powerful, right? Because it's like, in small companies, maybe not super important, right? Because I'll just go and ask my coworker, like, hey, what data sets do I want to use? But if you're at a company like Google or IBM scale, or even like a 500 person company, you're not going to know all of the data sets that are available for you to work with. And the machine will actually be able to make some really interesting recommendations there. >> All right, we have to leave it there. We're out of time. Holden, thanks very much. >> Thank you so much for having me and having Boo. >> Pleasure. All right, any time. Keep right there everybody. We'll be back with our next guest. This is the CUBE. We're live from New York City. We'll be right back.
Gary MacFadden - BigDataNYC - theCUBE - #BigDataNYC
>> Live from New York City, it's theCUBE. Here is your host, Jeff Frick. >> Hey, welcome back. I'm Jeff Frick. We're here at theCUBE's fifth birthday party at BigDataNYC in Manhattan, part of Big Data Week: you've got Strata Conference, Hadoop World and, of course, BigDataNYC. So now we're having our party, which is always good to have, and I'm joined by my next guest, Gary MacFadden of Parodi Research. Gary, welcome. >> Well, thank you very much. >> So the last time we saw you was actually at Big Data 2013, so a lot's changed in a year. >> Absolutely, absolutely. I think the whole Hadoop thing has really taken off. And the thing that interests me the most about the show, or the exhibitors at the show, is that you can get a lot of data into Hadoop, but how do you get it out? How do you make it useful? What do you do with it when you get it out? You know, is it unstructured data, is it structured data, is it a combination, is it schemaless? >> All of the above. >> All of the above, right? Exactly. So, and I've actually been, Jeff, to all the shows, right? Since the beginning, when it was just Hadoop World. theCUBE started back in, I think, 2010, and we're celebrating our fifth birthday, so at least since 2010. So since then, you've seen, you know, a progression of vendors coming in to provide services that actually enable Hadoop to do more than it does. It started as kind of a batch-oriented type of solution that now, because of these other value-added solutions, can do real or near-real-time processing, and you can take the data out of it a lot more easily. You can use Hadoop basically as a repository. And a lot of the solutions out there are evolving to the point where you can basically make sense of the information, and I think that's really important, right? Data to information, information to insight, right? That's where we want to go with this thing.
Business decisions made in real time, which we define as in time to do something about it, right? >> Right. >> Yes. So some of the players: I mean, you've got the MapR guys. You've got the Actian folks that just bought Pervasive Software, so they've got the predictive analytics piece sort of covered; obviously, that's Stonebraker's old company, you know, a variant of Ingres, right? You've got, obviously, IBM as a player in this space, with their Bluemix and their cloud capabilities and all of their information management pieces. Every major vendor has got a piece of, is part of, the action, if you will, trying to build something on top of Hadoop to make it more useful and make it more valuable. >> Yeah, the floor was filled with little companies, big companies, and everyone is certainly jumping in. So let me get your perspective, since you've been coming for a lot of years to this thing. Where are we on the journey? You know, I think we're past the POC stage, right? People are getting stuff into production deployments, but it's still early days. You know, the Giants are playing tonight, go Giants: are we first inning, third inning, seventh inning? Where are we? >> I think we're probably in the second or third inning. I think we've got a ways to go. >> And what's the next big hurdle to get us to the next inning? >> I think one of the problems is this storage issue, right? So you've got this issue of being able to scale out, theoretically, exponentially, right? The nice thing about Hadoop is, if you need more space, you just add nodes and storage and whatnot. But what happens when you get too much information? You're into the petabyte, multiple-petabyte range now, and most of that data, you know, you're not going to access. You may access only two percent of it over time; I think there are a lot of figures around that. But actually, a Wikibon article that I read recently is very interesting, one called Flape, or what were they calling it? Flape.
I want to make sure I get it right; he said it to me off camera, right? It's F-L-A-P-E. It's a combination of flash and tape. Basically, there's a great article on the Wikibon site by Wikibon's CTO, David Floyer, and his premise is that at some point relatively soon, as data grows exponentially into the multiple-petabyte range and maybe even beyond, the thing that's going to get squeezed is the traditional HDD, the hard disk drive, spinning disk, right? So tape has become much, much more resilient. Tape now has a mean time to failure of about twenty-six or thirty years, versus disk, which is about five. And obviously flash is much, much faster, right? >> Right, in some cases. We won't get into all the nuances of it, but is Flape going to squeeze out disk in the end? >> I think so. And what that'll offer customers is a much lower TCO for managing those huge petabyte-scale environments, and also accessing the data at a relatively quick speed. So I think that's a piece. The other part that's very interesting to me is the cognitive computing space. So I was at the NoSQL event last month in San Jose, and with that they had a cognitive computing component. I think the idea of trying to get machines to think more like people, building neuromorphic chips to kind of mimic the way synapses and electricity in the brain, you know, work, how neurons fire and so forth, is very interesting. And I think once you can get Hadoop as the repository, you've got the data there. But how do you make use of it? And I think that's the challenge that's going to be paramount the next few years. >> Exciting days ahead. Well, Gary, thanks for taking a few minutes. We're at the fifth birthday party of theCUBE, we're at BigDataNYC. I'm Jeff Frick, we're on the ground. Thanks for watching.
Steve Wooledge - HP Discover Las Vegas 2014 - theCUBE - #HPDiscover
>>Live from Las Vegas, Nevada. It's a queue at HP. Discover 2014 brought to you by HP. >>Welcome back, everyone live here in Las Vegas for HP. Discover 2014. This is the cube we're out. We go where the action is. We're on the ground here at HP. Discover getting all the signals, sharing them with you, extracting the signal from the noise. I'm John furrier, founder of SiliconANGLE. I joined Steve Woolwich VP of product marketing at map art technologies. Great to see you welcome to the cube. Thank you. I know you got a plane to catch up, but I really wanted to squeeze you in because you guys are a leader in the big data space. You guys are in the top three, the three big whales map are Hortonworks, Cloudera. Um, you know, part of the original big data industry, which, you know, when we did the cube, when we first started the industry, you had like 30, 34 employees, total combined with three, one company Cloudera, and then Matt are announced and then Hortonworks, you guys have been part of that. Holy Trinity of, of early pioneers. Give us the update you guys are doing very, very well. Uh, we talked to you guys at the dupe summit last week. So Jack Norris for the party, give us the update what's going on with the momentum and the traction. And then I want to talk about some of the things with the product. >>Yeah. So we've seen a tremendous uptick in sales at map. Are we tripled revenue? We announced that publicly about a month ago. So we went up 300% in sales, over Q3, I'm sorry, Q1 of 2013. And I think it's really, you know, the maturity of the market. As people move more towards production, they appreciate the enterprise features. We built into the map, our distribution for Hadoop. 
So, um, you know, the stats I would share is that 80% of our customers triple the size of their cluster within the first 12 months and 50% of them doubled the size of the cluster because there's the, you know, they had that first production success use case and they find other applications and start rolling out more and more. So it's been great for us. >>You know, I always joke with Jack Norris, who's the VP of marketing over there. And John Frodo is the CEO about Matt bars, humbleness. You don't have the fanfare of all the height, depressed love cloud era. Now see they had done some pretty amazing things. They've had a liquidity event, so essentially kind of an IPO, if you will, that huge ex uh, financing from Intel and they're doing great big Salesforce. Hortonworks has got their open source play. You guys got, you got your heads down as well. So talk about that. How many employees you guys have and what's going on with the product? How many, how many new, what, how many products do you guys actually, >>We have, well, we have one product. So we have the map, our distribution for Hadoop, and it's got all the open source packages directly within it, but where we really innovate is in the course. So that's where we, we spent our time early on was really innovating that data platform to give everything within the Hadoop ecosystem, more reliability, better availability, performance, security scale, >>It's open source contributions to the court. And you guys put stuff on top of that, uh, >>And how it works. Yeah. And even some projects we lead the projects like with Apache Mahal and Apache drill, which is coming into beta shortly other projects, we commit and contribute back. But, um, so we take in the distribution, we're distributing all those projects, but where we really innovate is at that data platform level. So >>HP is a big data leader officer. They bought, uh, autonomy. They have HP Vertica. You guys are here. Hey, what are you doing here? 
Obviously we covered the cube, uh, the announcement with, uh, with, with HP Vertica, you here for that reason, is there other biz dev other activity going on other integration opportunities? >>Yeah, a few things. So, um, obviously the HP Vertica news was big. We went into general availability that solution the first week of may. So, um, what we have is the HP Vertica database integrated directly on top of our data platform. So it's this hybrid solution where you have full SQL database directly within your Hadoop distribution. Um, so it had a couple sessions on that. We had, uh, a nice panel discussion with our friends from Cloudera and Hortonworks. So really good discussion with HP about just the ecosystem and how it's evolving. The other things we're doing with HP now is, you know, we've got reference architectures on their hardware lines. So, um, you know, people can deploy Mapbox on the hardware of HP, but then also we're talking with the, um, the autonomy group about enterprise search and looking at a similar type of integration where you could have the search integrated directly into your Hadoop distro. And we've got some joint accounts we're piloting that she goes, now, >>You guys are integrating with HP pretty significantly that deals is working well. Absolutely. What's the coolest thing that you've seen with an HP that you can share. How so I asked you in the big data landscape, everyone's Bucher, you know, hunkering down, working on their feature, but outside in the real world, big data, it's not on the top of mind of the CIO, 24 7. It's probably an item that they're dressing. What have you seen and what have you been most impressed with at HP here? >>Yeah. Say, you know, this is my first HP event like this. I think the strategy they have is really good. I think in certain areas like the cloud in particular with the helium, I think they made a lot of early investments there and place some bets. And I think that's going to pay off well for them. 
And that marries pretty nicely with our strategy as well. In terms of, you know, we have on-premise deployments, but we're also an OEM, if you will, within Amazon Web Services, so we have a lot of agility in the cloud. And I think as those products and the partnerships with HP evolve, we'll be playing a lot more with them in the cloud as well. >>Let me ask you a question. I want you to share with the folks out there, in your own words, what is it about MapR that they may or may not understand or might not know about? A little humble brag out there; share some insight into MapR for the folks that don't know you guys as a company, and for the folks that may have a misperception of what you guys do. >>Yeah. I mean, for me, I was in this space with Aster Data, in kind of the whole Hadoop and MapReduce area, since 2008, and I'm pretty familiar with everybody in the space. I really looked at MapR as the best technology hands down. You look at the Forrester Wave, and they rank us as having the best technology today, as well as product roadmap. I think the misperception is people think, oh, it's proprietary and closed. It's actually the opposite of that. We have an unbiased open-source approach where we'll ship and support in our distribution the entire Apache Hadoop stack; we're not selective over which projects within it we support. Same with SQL on Hadoop: we support Impala as well as Hive and other SQL-on-Hadoop technologies, including the ability to integrate HP Vertica directly in the system. And it's because of the openness of our platform. I'd say it's actually more open because of the standards we've integrated into the data platform to support a lot of third-party tools directly within it. So there is no lock-in. The storage formats are all the same. The code that runs on top of the distribution from the projects is exactly the same.
So you can build a project in Hive or some other system, and you can port it between any of the distributions. So there isn't a lock-in. >>At the end of the day, what the customers want is ease of integration. They want reliability. That's right. And so what are you guys working on next? What's the big product roadmap that you can share with us? >>Yeah, I think for us, the innovations we did in the data platform allow us to support not only more applications, but more types of operational systems. So integrating things like fraud detection and recommendation engines directly with the analytical systems, to really speed up that accuracy in targeting and detecting risk and things like that. I think over time, you know, Hadoop has sort of been this batch analytic type of platform, but the ability to converge operations and analytics in one system is really going to be enabled by technology like MapR's. >>How many employees do you guys have now? >>I'm not sure what our CFO would let me say there, but you can say we're over 200 at this point. >>And over 500 customers, which says a lot. We covered your relationship with HP during our Big Data SV event; that was exciting. Good to see John Schroeder, a very impressive team. I'm impressed with MapR; I always have been. You guys have steadily kept to your knitting, and you're leading in the big data space. And again, "not proprietary" is a very key phrase, and that's really cool. So thanks for coming on; we really appreciate it, Steve. We'll be right back. This is theCUBE, live in Las Vegas, extracting the signal from the noise, with MapR here at HP Discover 2014. We'll be right back after this short break.
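The convergence point Steve closes on — operational lookups (fraud checks, recommendation serving) and analytical scans running against one copy of the data, rather than two systems kept in sync by ETL — can be sketched in miniature. This is a hypothetical toy illustrating the idea only, not MapR's implementation; every name below is invented:

```python
from collections import defaultdict

class ConvergedStore:
    """Toy converged store: one dataset serving both an operational
    read path and an analytical scan path, with no second system."""

    def __init__(self):
        self.events = []                  # full history, for analytics
        self.totals = defaultdict(float)  # per-user running total, for fast lookups

    def ingest(self, user, amount):
        # A single write updates both views at once, so the
        # operational and analytical answers can never diverge.
        self.events.append((user, amount))
        self.totals[user] += amount

    def lookup(self, user):
        # Operational path: O(1) read, e.g. a fraud check at transaction time.
        return self.totals[user]

    def top_spenders(self, n=1):
        # Analytical path: aggregate over the same data, no ETL hop.
        return sorted(self.totals.items(), key=lambda kv: -kv[1])[:n]

store = ConvergedStore()
for user, amt in [("alice", 120.0), ("bob", 40.0), ("alice", 15.0)]:
    store.ingest(user, amt)
print(store.lookup("alice"))   # 135.0
print(store.top_spenders())    # [('alice', 135.0)]
```

The design choice being illustrated is simply that when both workloads share one store, the "speed up accuracy" claim above comes for free: the fraud check reads the same totals the analytics see, with no replication lag between two systems.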
Brett Rudenstein - Hadoop Summit 2014 - theCUBE - #HadoopSummit
>>theCUBE at Hadoop Summit 2014 is brought to you by anchor sponsor Hortonworks — we do Hadoop — and headline sponsor WANdisco — we make Hadoop invincible. >>Okay, welcome back, everyone. We're here at Hadoop Summit, live. This is theCUBE, our flagship program; we go out to the events and extract the signal from the noise. I'm John Furrier, with Jeff Frick, drilling down on the topics. We're here with WANdisco. Welcome, Brett Rudenstein, senior director. Tell us what's going on for you guys. You have a big presence here; we saw the crew last night, and you guys have a great booth. So what's happening? >>Yeah, I mean, the show is going very well. What's really interesting is we have a lot of very, very technical individuals approaching us. They're asking us, you know, some of the tougher, more technical, in-depth questions about how our consensus algorithm is able to do all this distributed replication, which is really great, because there's a little bit of disbelief; then of course we get to do the demonstration for them and suspend disbelief, if you will. And I think the attendance has been great for us. >>Okay. We always have the geek conversations; you guys are a very technical company. Jeff and I always comment, certainly Dave Vellante and Jeff Kelly too, that WANdisco has its share of geeks, dudes who know what they're talking about, so I'm sure you get that. But now on the business side, when you talk to customers — and outcomes seem to be the show focus this year — what are some of the outcomes your customers are talking about when they get you guys in there? What are their business issues? What are they working on to solve? >>Yeah, I think the first thing is to look at, you know, why they're looking at us, and then the particular business issues that we solve. The first thing, and sort of the trend that we're starting to see, is that the prospects and customers we have are looking at us because of the data that they have — it's data that matters. It's important data, and that's when people start to come to us; that's when they look to us, when they have data that's very important to them. In some cases, if you saw some of the UCI material, you see that the data is doing live monitoring of various patient activity, where it's not just about monitoring a life but potentially about saving the life — and systems that go down not only can't save lives, they can potentially lose them. >>So you have demos. You want to jump into this demo here? What is this all about? >>The demonstration that I'm going to do for you today is of our Non-Stop Hadoop product. I'm going to show you how we can basically stand up a single HDFS, a single Hadoop cluster, across multiple data centers. I think that's one of the tough things that people are really having trouble getting their heads wrapped around, because most people, when they do multi-data-center Hadoop, tend to run two different clusters and then synchronize the data between the two of them. The way they do that is they'll use, you know, Flume, or some form of parallel ingest, or technologies like DistCp to copy data between the data centers. Each one of those carries an administrative burden, and there are flaws in their underlying architecture that don't allow them to do a really detailed job of ensuring that all blocks are replicated properly and that no mistakes are ever made. And again, there's the administrative burden — somebody always has to have eyes on the system. We alleviate all those things. So the first thing I want to start off with: we had somebody come to our booth, and we were talking about this consensus algorithm that we perform and the way we synchronize multiple NameNodes across multiple geographies. And again, in that sort of spirit of disbelief, I said, you know, one of the key tenets of our application is that it doesn't change the behavior of the application when you go from LAN scope to WAN scope. So, for example, if you create a file in one data center, and 3,000 miles or 7,000 miles apart from that you were to hit the same create-file operation, you would expect that the right thing happens: somebody gets the file created and somebody gets "file already exists," even if, at 7,000 miles' distance, they both hit this button at the exact same time. I'm going to do a very quick demonstration of that for you here. I'm going to put a file into HDFS. My top right-hand window is in Northern Virginia, and 3,000 miles distant from that, my bottom right-hand window is in Oregon. I'm going to put the /etc/hosts file into a temp directory in Hadoop at the exact same time, 3,000 miles apart, and you'll see that exact behavior. I've just launched them both, and if you look at the top window the file is created; if you look at the bottom window, it says "file already exists." It's exactly what you'd expect of a LAN-scope application, the way you'd expect it to behave. So that is how we ensure consistency, and that was the question the prospect had. >>At that distance, even the speed of light takes a little time, right? So what are some of the tips and tricks you can share that enable you guys to do this? >>Well, our consensus algorithm is a majority-quorum-based algorithm. It's based off of a well-known consensus algorithm called Paxos. We have a number of significant enhancements and innovations beyond that — dynamic membership, automatic scale, and things of that nature — but in this particular case, every transaction that goes into our system gets a global sequence number, and what we're able to do is ensure that those sequence numbers are executed in the correct order. So you can't, you know, put a delete before a create; everything has to happen in the order that it actually occurred in, regardless of the WAN distance between data centers. >>So what is the biggest aha moment you get from a customer when you show them the demo? Is it the replication? The availability? What is the big feature they jump on? >>Yeah, I think the biggest ones are basically when we start crashing nodes while we're running jobs, or we sever the WAN link — and maybe I'll just do that for you now. Let's kick into the demonstration here. What I have here is a single HDFS cluster spanning two geographic territories: part of it is in Northern Virginia, and the other part is in Oregon. I'm going to drill down into the graphing application here, and inside you see all of the NameNodes: I have three NameNodes running in Virginia and three NameNodes running in Oregon. The demonstration is as follows. I'm going to run TeraGen and TeraSort — in other words, I'm going to create some data in the cluster, then sort it into a total order — and then I'm going to run TeraValidate in the alternate data center and prove that all the blocks replicated from one side to the other. However, along the way I'm going to create some failures: I am going to kill some of the active NameNodes during this replication process, and I am going to shut down the WAN link between the two data centers during the replication process, and then show you how we heal from those kinds of conditions, because our algorithm treats failure as a first-class citizen, so there's really no "down" in the system, if you will. So let's start with the local failure. Let's go ahead and run the TeraGen and the TeraSort; I'm going to put it in a directory called cube1. We're creating about 400 megabytes of data, so a fairly small set that we're going to replicate between the two data centers. Now, the first thing that you see over here on the right-hand side is that all of these NameNodes kind of sprang to life. That is because, in an active-active configuration with multiple NameNodes, clients actually load-balance their requests across all of them. Also, it's a synchronous namespace, so any change that I make to one immediately occurs on all of them. The next thing you might notice in the graphing application is these blue lines, only in the Oregon data center. The blue lines essentially represent what we call a foreign block — a block that has not yet made its way across the wide area network from the site of ingest. We move these blocks asynchronously from the site of ingest, so that I have LAN-speed performance; in fact, you can see I just finished the TeraGen part of the application while, at the same time, pushing data across the wide area network as fast as possible. Now, as we get into the next phase of the application, which is going to run TeraSort, I'm going to start creating some failures in the environment. The first thing I'm going to do is pick two NameNodes: I'm going to fail a local NameNode, and then we're also going to fail a remote NameNode. So let's pick one of these — I'm going to pick hdp2, that's the name of the machine — so I'll do "ssh hdp2", and I'm just going to reboot that machine. As I hit the reboot button, the next time the graphing application updates, what you'll notice here in the monitor is a flat line: it's no longer taking any data in. But if you're watching the application on the right-hand side, there's no interruption of the service; the application is going to continue to run. You'd expect that to happen in maybe a LAN-scope cluster, but remember, this is a single cluster at WAN scope, with 3,000 miles between the two of them. So I've killed one of the six active NameNodes.
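The create/create race described above — one coast gets the file, the other gets "file already exists," and both namespaces end up identical — falls out of total ordering. Here is a minimal sketch of that idea: a toy single-process sequencer stands in for the Paxos-style quorum, and every class and name is invented for illustration; real WANdisco replication is far more involved.

```python
import itertools

class Sequencer:
    """Toy stand-in for the quorum: hands every namespace operation
    a global sequence number and appends it to one agreed log."""
    def __init__(self):
        self._next = itertools.count(1)
        self.log = []                         # totally ordered operation log

    def propose(self, op, path):
        seq = next(self._next)
        self.log.append((seq, op, path))
        return seq

class Replica:
    """A NameNode-like namespace that applies the agreed log in
    sequence order, never in arrival order."""
    def __init__(self, name):
        self.name = name
        self.files = set()
        self.results = []

    def apply(self, log):
        for seq, op, path in sorted(log):     # global order enforced here
            if op == "create":
                outcome = "ok" if path not in self.files else "file already exists"
                self.results.append((seq, outcome))
                self.files.add(path)
            elif op == "delete":
                self.files.discard(path)

seq = Sequencer()
# Two data centers hit "create /tmp/hosts" at the same instant:
seq.propose("create", "/tmp/hosts")   # Virginia's request wins sequence 1
seq.propose("create", "/tmp/hosts")   # Oregon's request gets sequence 2

virginia, oregon = Replica("virginia"), Replica("oregon")
for r in (virginia, oregon):
    r.apply(seq.log)

print(virginia.files == oregon.files)  # True: identical namespaces
print(oregon.results[1][1])            # file already exists
```

Because every replica replays the same numbered log, a delete can never be applied before the create it depends on — which is the ordering guarantee Brett describes, independent of WAN distance.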
The next thing I'm going to do is kill one of the NameNodes over in the Oregon data center. So I'm going to go ahead and ssh into — I don't know, let's pick the bottom one, hdp9 in this case — and then again, another reboot operation. So I've just rebooted two of the six NameNodes while running the job, but again, if you look in the upper right-hand corner, the job running in Oregon and the job running in North Virginia continue without any interruption — see, we just went from 84 to 88 percent MapReduce, and so forth. So again, uninterrupted: what we like to call continuous availability at WAN distances. >>What does continuous availability at WAN distances mean? Because that's really important; drill down on that. >>Yeah, I think if you look at the difference: what people traditionally call high availability means that, generally speaking, the system is there, there is a very short time that the system will be unavailable, and then it will become available again. A continuously available system ensures that, regardless of the failures that happen around it, the system is always up and running; something is able to take the request. And in a leaderless system like ours, where no one single node actually takes a leadership role, we're able to continue replication, and we're also able to continue coordination. >>So there are two distinct things: high availability, which everyone kind of knows and loves — expensive — and then continuous availability, which is kind of the cousin, I guess you're saying. Can you put it in context, cost and implementation? >>You know, from the perspective of a WANdisco deployment, it's a continuously available system, even though people look at us as somewhat traditional disaster recovery, because we are replicating data to another data center. But remember, it's active-active: both data centers are able to write at the same time, so you get to maximize your cluster resources. And again, if we go back to one of the first questions you asked — what do customers and prospects want to do with this — they want to maximize their resource investment. If they have half a million dollars sitting in another data center that is only able to perform in an emergency recovery situation, that means they either have to (a) scale the primary data center, or (b) — what they really want — utilize the existing resource in an active-active configuration, which is why I say continuous availability. They're able to do that in both data centers, maximizing all their resources. >>And the consequences of not having that? >>The consequence of not being able to do that is you have a one-way synchronization. A disaster occurs, you then have to bring that data center online, you have to make sure that all the appropriate resources are there, and you have an administrative burden: a lot of people have to go into action very quickly. >>Versus a WANdisco deployment, what would that look like, in time, effort, cost? Do you have any kind of order of magnitude — like a week, or "call some guy: dude, get in the office, log in"? >>You have to look at individual customers' service level agreements. A number that I hear thrown out very, very often is about 16 hours: "we can be back online within 16 hours." The RTO for a WANdisco deployment is essentially zero, because both sites are active; you're able to essentially continue without any downtime. >>Some would say that continuous availability is high availability, because it's essentially zero. Sixteen hours — I mean, any time down is bad, but 16 hours is huge. >>Yeah, that's the service level agreement, and then everyone says, "but we know we can do it in five hours." The other part of that, of course, is ensuring that once a year somebody runs through the emergency configuration procedure, to know that they truly can be back up online in the service-level-agreement timeframe. So again, there's a tremendous amount of effort that goes into the ongoing administration. >>Some great comments here on our CrowdChat, at crowdchat.net/hadoopsummit — join the conversation. We have one that says "nice," talking about how the system handles latency; the demo is pretty cool, and the map was an excellent visual. Dave Vellante just weighed in and said he did a survey with Jeff Kelly: a large portion, twenty-seven percent of respondents, said lack of enterprise-grade availability was the biggest barrier to adoption. Is this what you're referring to? >>Yeah, this is exactly what we're seeing. People are not able to meet the uptime requirements, and therefore applications stay in proof-of-concept mode; or those that make it out of proof of concept are heavily burdened by administrators and a large team to ensure that same level of uptime, which could instead be handled, without error, through software configuration like WANdisco's. >>Another comment, from Burt — thanks, Burt, for watching: there's availability; how about security? >>Yeah, security is a good one. Of course, we run on standard Hadoop distributions, and as such, if you want to run your cluster with on-wire encryption, that's okay; if you want to run your cluster with Kerberos authentication, that's fine. We fully support those environments. >>Got a new use case from the CrowdChat questions — more are coming in, so send them in; we're watching the CrowdChat at crowdchat.net/hadoopsummit. Great questions, and I think a lot of people have a hard time parsing HA versus continuous availability, because you can get confused between the two. Is it semantics, or is it infrastructure concerns? How do you differentiate between those two definitions? >>I think part of it is semantics, but also, from a WANdisco perspective, we like to differentiate because there really isn't that moment of downtime; there really isn't that switchover moment where something has to fail over and then go somewhere else. That's why I use that term, continuous availability. The system is able to simply continue operating, with clients load-balancing their requests to available nodes. In a similar fashion, when you have multiple data centers, as I do here, I'm able to continue operations simply by running the jobs in the alternate data center. Remember that it's active-active, so any data ingested on one side immediately transfers to the other. So maybe let me do the next part. I showed you one failure scenario; you've seen all the nodes have actually come back online and self-healed. For the next part I want to do a separation. I want to run it again, so let me kick that off and create another directory structure here — only this time I'm going to actually chop the network link between the two data centers, and after I do that, I'm going to show you some of our new products in the works and give you a demonstration of that as well. >>While that's firing up, Brett, what are some of the applications that this enables people to use Hadoop for that they were afraid to before? >>Well, when we look at our customer base and our prospects who are evaluating our technologies, it opens up all the regulated industries — things like pharmaceutical companies, financial services companies, healthcare companies — all these people who have strict regulations and auditing requirements, and who now have a very clear, concise way to not only prove that they're replicating data, but prove that the data has actually made its way across: that it's in both locations, and not just in both locations, but that it's the correct data. Sometimes we see, in the case of something like DistCp copying files between data centers, that a file isn't actually copied because the tool thinks it's the same, but there is a slight difference between the two. When the cluster diverges like that, it's days of administration, or hours depending on the size of the cluster, to figure out what went wrong, what went different; and then of course you have to involve multiple users to figure out which one of the two files you have is the correct one to keep. So let me go ahead and stop the WAN link here. Of course, with WANdisco technology there's nothing to keep track of; you simply allow the system to do HDFS replication, because it is essentially native HDFS. So I've stopped the tunnel between the two data centers while running this job. One of the things you're going to see on the left-hand side is that it looks like all the nodes no longer respond. Of course, that's just because I have no visibility into those nodes; they're no longer replicating any data, because the tunnel between the two has been shut down. But if you look on the right-hand side of the application, in the upper right-hand window, you see that the MapReduce job is still running; it's unaffected. And what's interesting is, once I start the tunnel up again between the two data centers, I'll immediately start replicating data — and this is at the block level. Again, when we look at other copy technologies, they are doing things at the file level. So if you had a large file, say 10 gigabytes in size, and for some reason your transfer crashed when you were seventy percent through, you're starting that whole transfer again. Because we're doing block replication, if you had seventy percent of your blocks that had already gone through — like perhaps what I've done here — when I start the tunnel back up, which I'm going to do now, we just continue from those blocks that simply haven't made their way across the net. So I've started the tunnel back up; the monitor, you'll see, springs back to life. All the NameNodes will have to resync, since they've been out of sync for some period of time: they'll learn any transactions that they missed, they'll heal themselves into the cluster, and we immediately start replicating blocks. And then, to show you the bidirectional nature of this, I'm going to run TeraValidate in the opposite data center, over in Oregon, and I'll just do it on that first directory that we created. What you'll see is that we now wind up with foreign blocks on both sides: I'm running applications at the same time across data centers — a fully active-active configuration in a single Hadoop cluster. >>Okay, so the question on that one is: what is the net-net? Summarize that demo real quick — bottom line, in two sentences. >>The bottom line is: if NameNodes fail, or if the WAN fails, you are still continuously operational. >>Okay, so we have questions from the commentary here, from the CrowdChat: does this eliminate the need for backup? And what is actually transferring — certainly not petabytes of data? >>I mean, you somewhat have to transfer what's important. I suppose if it was important for you to transfer a petabyte of data, then you would need the bandwidth to support the transfer of a petabyte of data. But we talk to a lot of Hollywood studios — >>We were at OpenStack Summit; that was a big concern. A lot of people are moving to the cloud for workflow and for optimization. The Star Wars guys were telling us, off the record, that the new film is in remote locations; they set up data centers basically in the desert, and they actually provisioned infrastructure. So, huge issues. >>Yeah, absolutely. What we're replicating, of course, is HDFS. In this particular case, I'm replicating all the data in this fairly small cluster between the two sites — this demo is only between two sites, but I could add a third site, and then a failure between any two would actually still allow complete availability of all the other sites that still participate in the algorithm. >>Brett, great to have you on. I want to get the perspective from you, in the trenches, out with customers. What's going on at WANdisco? Tell us about the culture there, what's going on at the company. What's it like to work there? What are the guys like? I mean, we know some of the dudes there, 'cause we always drink some vodka with them — you know, they like to tip one back a little bit once in a while. Great guys, great geeks. But what's it like at WANdisco? >>I think — you touched on a little piece of it at first — there are a lot of smart people at WANdisco. In fact, when I first came on board, I was like, wow, I'm probably the least smart person at this company. But culturally, this is a great group of guys. They like to work very hard, but equally they like to play very hard, and as you said, I've been out with them several times myself; these are all great guys to be out with. The culture is great; it's a great place to work, and people who are interested should certainly take a look. >>Yeah, great culture, and it fits in here. We were talking last night — a very social crowd here, hanging out with the Hortonworks guys, and IBM's here too. People are really sociable; this event really has a camaraderie feel to it, but yet it's serious business. Back in the day, they were all a bunch of geeks building an industry, and now it's got everyone's attention: Cisco's here, Intel's here, IBM's here. What's your take on the big guys coming in? >>I mean, I think the big guys realize that Hadoop — the elephant — is as large as it appears. The elephant is in the room, it's exciting, and everybody wants a little piece of it, as well they should. >>Brett, thanks for coming on theCUBE; we really appreciate it. WANdisco — you guys are a great company, and we love having your support. Thanks for supporting theCUBE. We'll be right back after this short break with our next guest.
thank you
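Brett's 10-gigabyte, seventy-percent example is worth quantifying. The sketch below is a hypothetical back-of-envelope comparison (the block counts are invented for illustration) of restart-from-zero file copying, as described for DistCp-style tools above, versus block-level resume:

```python
def file_level_transfer(total_blocks, fail_after):
    """Restart-from-zero semantics: a mid-file failure discards all
    progress, so the retry resends the entire file."""
    sent = fail_after          # blocks sent before the link dropped (wasted)
    sent += total_blocks       # retry ships the whole file again
    return sent

def block_level_transfer(total_blocks, fail_after):
    """Resume semantics: only blocks that never arrived are resent."""
    sent = fail_after                   # blocks that made it before the cut
    sent += total_blocks - fail_after   # resume ships only the remainder
    return sent

total = 100   # e.g. a 10 GB file split into fixed-size blocks
cut = 70      # the WAN link drops seventy percent of the way through
print(file_level_transfer(total, cut))    # 170 block-sends
print(block_level_transfer(total, cut))   # 100 block-sends
```

Under these assumed numbers, the file-level copy pays for 170% of the file in transfer cost while the block-level resume pays exactly 100% — which is the practical difference the demo's tunnel-restart is meant to show.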