

Gillian Campbell & Herriot Stobo, HP | Adobe Imagine 2019


 

>> Announcer: Live from Las Vegas, it's theCUBE covering Magento Imagine 2019, brought to you by Adobe. >> Welcome to theCUBE, I'm Lisa Martin at The Wynn in Las Vegas for Magento Imagine 2019. This is a three day event. You can hear a lot of exciting folks networking behind me, talking tech, talking e-commerce innovation, and we're pleased to welcome, fresh off the keynote stage, a couple of guests from HP. We've got Gillian Campbell, the Head of Omni-channel Strategy and Operations. Gillian, thank you for joining us. >> Thank you for asking us. >> Our pleasure, and Herriot Stobo, Director of Omni-channel Innovation and Solutions, also from HP. Welcome. >> Thank you very much. >> So Gillian, fresh off the keynote stage, enjoyed your presentation this morning. >> Gillian: Thank you. >> Everybody I think in the world knows HP. Those of us consumers going, you know what, actually, that reminds me, I need a new printer. >> We can help you. >> Thank you, excellent. Whether I'm shopping online or in a store. So you gave this really interesting keynote this morning talking about what HP is doing, starting in Asia Pacific. You really transformed this shopping experience. Talk to us a little bit about HP, as I think you mentioned it, as a $50 billion startup, and from a digital experience perspective, what you needed to enable. >> Yeah, so as I said, HP have been around for 80 years, and in 2015 we became our own entity, HP Inc., and really started looking at how do we enable digital to be pervasive through everything that we do, our internal processes, our reach to customers, and identified a great opportunity to really take a leading edge in our digital commerce capabilities. We already had some early proof points in Asia Pacific, so we launched a global initiative, and we're now on that journey to enable that best-in-class experience through the digital platforms. >> So Herriot, talk to us about, you're based in Singapore. >> Yes. >> What were some of the market dynamics that really made it obvious that this is where we want to start building out this omni-channel strategy, starting in Asia Pacific? Is it, you know, whether, Gillian, you mentioned it before, retail space in some areas being expensive? Is it more the mobile experience and expectations on consumers' part? >> I think we've got a mix of different starting points across Asia. We've got some mega cities like Hong Kong and Singapore, and Tokyo. And then we've got, you know, emerging markets across South-East Asia. We don't necessarily have any single marketplace that controls the entire market as we might see in other regions, and so we've had a lot of runway to go and experiment and try new things. We also have an ecosystem of branded retail in Asia, not in all markets, predominantly in India but also in some markets in South-East Asia, that allows us to really blend the experience across both offline and online and to give customers choice at the end of the day. Let them decide how they want to shop and interact with our brand. So we had been running Magento 1 since we first launched our online store businesses in Indonesia and Thailand about six years ago, and then we moved into China and re-platformed onto Magento 1, and that was really the foundation of what we decided to go and build upon to become a global program. So we already had some proof points under our belt with Magento.
>> And what were some of those early wins that really started to make it obvious, this omni-channel experience, the ability to give customers choice? Whether they want to start the process online, finish it in store, vice versa, or at least have the opportunity to have a choice? What were some of those early wins and business outcomes that you started to see? >> I think even just from, because we're all, customers are people. Whether you're a corporate customer, a small business, or a consumer, we're all people, and we all know that we shop that way. So essentially the storyline on that back to HP was, we have to enable experiences that we would want to experience as well, and it was quite a shift for a tech company, who were really all about the products, to be thinking about, well, how do we really enable that end-to-end experience? And as Herriot said, the runway was open. We already had some proof points. I was new in the job, so I was really listening to, you know, what the team were telling me. We have a great opportunity here, and we took that forward as a new concept for the company. We got funding approval, and you know, the rest is history and the journey that we're on. So I think it was just taking a different perspective and a different approach, and working with a team who had already built some of that credibility and proof points with the earlier deployments. And I think we kind of took a risk at the time when we started the engagement with Magento. They weren't in that leadership quadrant, and we took a risk to say, let's partner with an emerging company and do something a little bit different, and we're still here working towards it. So I think that for me was the breakthrough, was just having the tenacity to say, we're gonna drive this path forward. It may not be how we would have done things in the past, but we're a different company now, and we had much more air cover to be able to do that. >> Little bit more agility and flexibility. >> Yeah, absolutely. >> So you guys, you talked about, Gillian, all the buyers. We are the consumers, and we have this expectation, growing expectation, that I want to be able to go and transact anything that I want to buy, whether I'm a procuring person for a company and I'm traveling but I need to approve expenses, or I'm a salesperson maybe sitting next to a medium-small business customer. I need to have the option at least to have this storefront. One of the things that you guys launched in Asia Pacific, leveraging the power of Magento Commerce, was click to collect. So tell me a little bit about, from maybe an e-commerce cultural perspective, what is it that makes people want to have the ability to start online and actually complete the transaction in a physical location? >> Essentially, I was on the Advisory Board yesterday, and one of the other customers of Magento said, "Until we can invent a way to touch and feel online, there's always gonna be a need to have outlets where you can go touch and feel." And I think with the click and collect, some of our products are, you know, high-end PCs and gaming devices and printers, and it is hard to get a good appreciation of what they look and feel like online. So if you're gonna be spending, you know, significant money, you may want to go in and be able to see the colors, feel the finish. You know, some of our newer products with the leather portfolios, that's not something you can truly appreciate without touching it.
So I think we have to enable, again, those customers who do want to experience, feel the weight, you know, feel the finish, see the color scheme, 'cause it's usually important, again, not for all customers. Some customers are quite happy to spend thousands of dollars on an online purchase without seeing it, and then making sure they have a good facility to be able to, well, if they wanted to, return it if they're not happy with the product. >> As we look, though, at, like we talked about, this consumerization of everything, where we have this expectation, and the numbers, I think you even mentioned it maybe in your keynote, Gillian, or somebody did this morning, like upwards of half of all transactions are starting on mobile, so we've got to start there. What are some of the things that you guys have seen in region in terms of mobile conversions? >> So there's still a massive gap between desktop and mobile conversions, first of all. I mean, we're not anywhere near parity between the two. But obviously we're seeing a huge volume of traffic coming in as well, and it's shifting that way, so you would expect it to drop as a result. I think with Magento, what we've seen over the, you know, past few deployments that we've been running is over 8% improvement. But the desktop conversions are far higher, I mean, in terms of improvement and actual conversion, so we've still got a long way to go there. And that's an iterative process, that's a journey that probably never ends in terms of ongoing optimization and experimentation. So yeah, a lot happening there. I think just on the click and collect topic as well, that you were asking about, people wanting to start their journey online and then come into bricks and mortar. We're seeing a huge uptake on it just by experimenting, by piloting. Over 26% of our consumer notebooks in India that we've put onto this program were being collected in store, and this is in environments which are inherently chaotic on the streets. You don't want to go out there, but actually, I'm passing that way anyway, so it's just easier for me to pick it up on the way home, and probably quicker 'cause I can collect in two hours. So it's just giving people customer choice, no additional incentive, and it seems to take. So now we're expanding out regionally. >> So you said, this morning, Gillian, in your keynote, eight markets covered, mostly Asia Pacific, but also in Latin America. >> We just started in Latin America. Again, the development process is not just as simple as switching it on. So we've been doing a lot of work for this past six months with Latin America. The team there, they're super excited to get launched. There's some differences there, we've talked about the regional variation around fulfillment models that we have to adapt towards, but the intent is to get Latin America deployed, leveraging some of the learnings from what we've done in Asia Pacific, and then starting to move around into more of the nearby regions, and then ultimately back into the US and Canada. >> So as you look forward, and of course you've mentioned we're on this journey, right, what are some of the key learnings that you're going to apply? You mentioned this morning something that was very intriguing, and that was, respect the integrity of the Magento platform. Talk about that in the context of some of the other learnings that you'd recommend for colleagues in similar or other industries to be able to achieve what you have on a global scale.
>> I think from the outset, there was this kind of, like, baggage of deployments of capabilities, not just in commerce but deployments of capabilities across HP, where we had not respected the integrity of the platform. We had adjusted the code and developed on the code to make it HP-specific, and with the new HP Inc. company, one of the guiding principles was no, when we buy and leverage software applications, respect them for what they are, and adjust business processes and adjust integration rather than adjust the core, so that we can get the advantage of the longer-term opportunity without creating that kind of baggage. So it was really just foundational, you know, let's not go in here with a mindset that we know better than the core. The core is there for a reason, so build around that and ensure the integration. And I think, you know, with Herriot's leadership, we've been able to, you know, just keep that firm, which is why we can be successful, and be successful longer term as well. And one of the things we talked about yesterday also is the excellent capabilities that are coming with Adobe, and the integration that we talked about, the recommendations from Adobe Sensei, and integrating that with the Magento core. If you don't respect the integrity, those upgrades and capabilities become really hard to take benefit of, so we're really excited about, you know, again, sticking with the core and enabling and growing with the core with Magento and Adobe. >> I would just build on it. I mean, I think it's never gonna be easy running a global commerce platform. Single instance, multiple countries, you know, 27 markets to get started with. Who knows where we're gonna end. It's always gonna be a challenge, so we have to keep it as simple as possible. These upgrades are fast and furious, and that's great, and we all get lots of benefit, but if we start going down our own path, we've lost it. We've lost the benefit. >> And that's one of the things, too, that Jason Wolfsteen said this morning, that Magento was gonna be enabling businesses to achieve without getting in their way, and it kind of sounds, Herriot, like you're saying the same thing. That we've gotta be able to respect the technologies that we're building so we don't get in our own way, and we keep it simple as we wanna expand globally. Ultimately, at the end of the day, you're creating these personalized experiences with consumers, and that personalization is so important, because more and more, not only are we transacting, or wanting to, on mobile, but we want our brands, like HP, to know us. We want you to know our brand value, you know, our average order value, so that we can become part of the experience, but also, ideally, get rewarded for being loyal. >> Yeah. >> Yeah, I mean, just coming to mobile again, but you know, 2.3 delivers the native PWA capabilities, which we're super excited to get started with. You know, we've got so many use cases for this straight away, right out of the box, but you know, we've got to do it gradually, do it the right way. I think we're also aware that we're not gonna be able to run with PWA in all markets straight away, 'cause not all markets are ready for it, quite frankly. User behavior... >> Is that a cultural thing? >> It's partly cultural, maybe technical, and just technical ecosystems as well. Places like China in particular, where, you know, customers use app stores, but they use app stores from every single phone manufacturer. That's where the customer is.
We can't just move away from that, so we need to keep some of those legacy approaches for a little while, and then, yeah, test in other regions and take the learnings when we're ready to adopt it. >> Exciting. So here we are at, this is the first Magento Imagine since the Adobe acquisition. Gillian, let's wrap things up with you. What are your, you mentioned you were part of the Customer Advisory Board yesterday, just some of your perspectives on this year's event, now that Magento is powering the Adobe commerce cloud. >> I actually attended the Adobe Summit a few weeks ago, here also in Vegas, and started to see the thread of commerce coming into that conference, and then seeing the Adobe, the experience, coming into Magento, and I just think it's a perfect combination of opportunities, especially for a company like HP, where we're looking to connect, you know, marketing and sales and support across the customer journey, and the capabilities with Adobe in some of the marketing stack, and then the commerce stack, and then the support, bringing that together is a super exciting opportunity for us. You know, the partnership that we have with both Adobe and Magento, now as one, we're really just starting what the next journey is gonna look like. >> We feel that about so many things, we're just starting, but Gillian, Herriot, it's been a pleasure to have you on theCUBE for Magento Imagine 2019. Thank you both for your time. >> Thank you, thank you. >> Our pleasure. I'm Lisa Martin, and you're watching theCUBE live from The Wynn Las Vegas at Magento Imagine 2019. Thanks for watching. (light music)

Published Date: May 14, 2019


SENTIMENT ANALYSIS :

ENTITIES

Entity                    Category       Confidence
Gillian                   PERSON         0.99+
Lisa Martin               PERSON         0.99+
Singapore                 LOCATION       0.99+
Indonesia                 LOCATION       0.99+
Jason Wolfsteen           PERSON         0.99+
Gillian Campbell          PERSON         0.99+
China                     LOCATION       0.99+
Asia                      LOCATION       0.99+
Herriot Stobo             PERSON         0.99+
India                     LOCATION       0.99+
2015                      DATE           0.99+
Hong Kong                 LOCATION       0.99+
Adobe                     ORGANIZATION   0.99+
$50 billion               QUANTITY       0.99+
Vegas                     LOCATION       0.99+
US                        LOCATION       0.99+
Thailand                  LOCATION       0.99+
HP                        ORGANIZATION   0.99+
Canada                    LOCATION       0.99+
yesterday                 DATE           0.99+
Magento                   ORGANIZATION   0.99+
Latin America             LOCATION       0.99+
Herriot                   PERSON         0.99+
two hours                 QUANTITY       0.99+
Tokyo                     LOCATION       0.99+
27 markets                QUANTITY       0.99+
Las Vegas                 LOCATION       0.99+
Apache                    ORGANIZATION   0.99+
two                       QUANTITY       0.99+
both                      QUANTITY       0.99+
three day                 QUANTITY       0.99+
South-East Asia           LOCATION       0.98+
Magento 1                 TITLE          0.98+
first                     QUANTITY       0.98+
PWA                       ORGANIZATION   0.98+
Adobe Summit              EVENT          0.98+
thousands of dollars      QUANTITY       0.98+
over 8%                   QUANTITY       0.98+
Single                    QUANTITY       0.97+
one                       QUANTITY       0.97+
Over 26%                  QUANTITY       0.97+
HP Inc.                   ORGANIZATION   0.97+
Omni-channel Innovation   ORGANIZATION   0.97+
Magento                   TITLE          0.95+
this morning              DATE           0.94+

Andrew Wheeler and Kirk Bresniker, HP Labs - HPE Discover 2017


 

>> Announcer: Live from Las Vegas, it's The Cube, covering HPE Discover 2017, brought to you by Hewlett Packard Enterprise. >> Okay, welcome back everyone. We're here live in Las Vegas for our exclusive three day coverage from The Cube, SiliconANGLE Media's flagship program. We go out to events, talk to the smartest people we can find, CEOs, entrepreneurs, R&D lab managers, and of course we're here at HPE Discover 2017 with our next two guests, Andrew Wheeler, Fellow, VP, Deputy Director, Hewlett Packard Labs, and Kirk Bresniker, Fellow and VP, Chief Architect of HP Labs, who was on yesterday. Welcome back, welcome to The Cube. Hewlett Packard Labs, well known, you guys doing great research. Meg Whitman really staying with a focused message, and one of the comments she mentioned at our press analyst meeting yesterday was focusing on the lab. So I want to ask you, where is that range in the labs? In terms of what you guys do, when does something go outside the lines, if you will? >> Andrew: Yeah, good question. So, if you think about Hewlett Packard Labs and really our charter role within the company, we're really kind of tasked with looking at things that will disrupt our current business, or looking for kind of those new opportunities. So for us, we have something we call an innovation horizon, and you know, it's like any other portfolio that you have, where you've got maybe things that are more kind of near term, maybe, you know, one to three years out, things that are easily kind of transferred or the timing is right. And then we have kind of another bucket that says, well, maybe it's more of a three to five year, kind of in that advanced development category, where it needs a little more incubation, a little more time. And then, you know, we reserve probably a smaller pocket that's for more kind of pure research. Things that are further out, higher risk. It's a bigger bet, but you know, we do want to have kind of a complete portfolio of those, and you know, over time, throughout our history, we've got real success stories in all of those. So it's always finding kind of that right blend. But you know, there's clearly a focus around the advanced development piece now that we've had a lot of things come from that research point, and really one of the... >> John: You're looking for breakthroughs. I mean that's what you're... Some-- >> Andrew: Clearly. >> Internal improvement, simplify IT, all that good stuff, you guys still have your eyes on some breakthroughs. >> That's right. Breakthroughs, how do we differentiate what we're doing, so yeah, clearly, clearly looking for those breakthrough opportunities. >> John: And one of the things that's come up really big in this show is the security and chip thing, which was pretty hot, very hot, and actually Wikibon's public, true public cloud report that they put out sizing up the on-prem cloud market. >> Dave: True private cloud. >> True private cloud, I'm sorry. And that's not including hybrids, a $265 billion TAM, but the notable thing that I want to get your thoughts on is the point they pushed: over 10 years, $150 billion is going to shift out of IT on premise into other differentiated services. >> Andrew: Out of labor. >> Out of labor. So this, and I asked them what that means, and he said that means it's going to shift to vendor R&D, meaning the suppliers have to do more work. So that the customers don't have to do the R&D. Which we see a lot in cloud, where there's a lot of R&D going on. That's your job.
So you guys are HP Labs, what's happening in that R&D area that's going to offload that labor so they can move to some other high-yield tasks? >> Sure. You take first. >> John: Go ahead, take a stab at it. >> When we've been looking at some of the concepts we had in the memory-driven computing research and advanced development programs, the machine program, you know, one of the things that was the kickoff for me back in 2003, we looked at what we had in the Unix market. We had advanced virtualization technologies, we had great management-of-resources technologies, we had memory fabric technologies. But they were all kind of proprietary. And back then we were saying, how does RISC Unix compete with industry standard servers? This new methodology, new wave, exciting, changing cost structures. And for us it was that chance to explore those ideas and understand how they would affect our maintaining the kind of rich set of customer experiences, mission criticality, security, all of these elements. And it's kind of funny that we're sort of just coming back to the future again, and we're saying, okay, we have this move, we want to see these things happen on the cloud, and we're seeing those same technologies, the composable infrastructure we have in Synergy, and looking forward to see the research we've done on the machine advanced development program, and how will that intersect hardware composability, converged infrastructure, so that you can actually have that shift, those technologies coming in, taking on more of that burden, to allow you freedom of choice, so you can make sure that you end up with that right mix. The right part on a public cloud, the right mix on a private cloud, the right mix on that intelligent edge. But still having the ability to have all of those great software development methodologies, that agile methodology, the only thing the kids know how to do out of school is open source and agile now. So you want to make sure that you can embrace that, and make sure, regardless of where the right spot is for a particular application in your entire enterprise portfolio, that you have this common set of experiences and tools. And some of the research and development we're doing will enable us to drive that into that existing, conventional, enterprise market as well as this intelligent edge. Making a continuum, a continuum from the core to the intelligent edge. And something that modern computer science graduates will find completely comfortable. >> Attracting them is going to be the key. I think the edge is kind of intoxicating if you think about all the possibilities that are out there, in terms of, you know, just business model disruption and also technology. I mean, wearables are edge, brain implants in the future will be edge, you know, the singularity is here, as Ray Kurzweil would say... >> Yeah. >> I mean, but this is the truth. This is what's happened. This is real right now. >> Oh absolutely. You know, we think of all that data, and right now we're just scratching the surface. I remember, it was 1994, the first time I fired up a web server inside of my development team, so I could begin sharing out design information on prototype products inside of HP, and it was a novelty. People would say, "What is that thing, you just sent me an email, WWW-whatever?" And suddenly we went from, almost overnight, from a novelty to a business necessity, to then it transformed the way that we created the applications for the...
>> John: A lot of people don't know this, but since you brought it up, this historical trivia: HP Labs, Hewlett Packard Labs, had scientists who actually worked on the web with Tim Berners-Lee. I think an HTML founder was an HP Labs scientist. Pretty notable trivia. A lot of people don't know that, so congratulations. >> And so I look at just what you're saying there, and we see this new edge thing is going to be similarly transformative. Now today it's a little gimmicky, perhaps, it's sort of scratching the surface. There's security, and it can be problematic at times, but that will transform, because there is so much possibility for economic transformation. Right now almost all that data on the edge is thrown away. The first person who understands, okay, I'm going to get 1% more of that data and turn it into real-time intelligence, real-time action... That will unmake industries, and it will remake new industries. >> John: Andrew, this is the applied research vision, you've got to apply R&D to the problem... >> Andrew: Correct. >> That's what he's getting at, but you've got to also think differently. You've got to bring in talent. The young guns. How are you guys bringing in the young guns? What's the, what's the honeypot? >> Well, I think, you know, for us, the sell, obviously, is just the tradition of Hewlett Packard to begin with, right? You know, we have recognition on that level, and it's not just Hewlett Packard Labs, it's, you know, just R&D in general, right? Kind of, you know, the DNA being an engineering company, so... But, you know, I think it is creating kind of these opportunities, whether it's internship programs, you know, just the various things that we're doing, whether it's enterprise related, high performance computing... I think this edge opportunity is a really interesting one as a bridge, because if you think about all the things that we hear about in enterprise in terms of, "Oh, you know, I need this deep analytics capability," or, you know, even a lot of the in-memory things that we're talking about, real-time response, driving information, right? All of that needs to happen at the edge as well for various opportunities, so it's got a lot of the young graduates excited. We host, you know, hundreds of interns every year, and it's real exciting to see kind of the ideas they come in with, and you know, they're all excited to work in this space. >> Dave: So Kirk, you have your machine button there, and of course you've got the logo. And then the machine... >> I got the labs logo, I got the machine logo. >> So when you were on earlier, you talked about the early 1980s. When I first got in the business, I remembered Gene Amdahl: "The best IO is no IO." (laughter) >> Yeah, that's right. >> We're here again with this sort of memory-semantics-centric computing. So in terms of the three that Andrew laid out, the three types of projects you guys pursue... Where does the machine fit? Is it sort of in all three? Or maybe you could talk about that a little bit. >> Kirk: I think it is. So we see those technologies, that over the last three years we have brought so much new, and the critical thing about this, I think, is it's also sort of the prototyping of the overall approach, our leaning-in approach here... >> Andrew: That's right. >> It wasn't just researchers. Right? Those 500 people who made that 160 terabyte monster machine possible weren't just from labs. It was engineering teams from across Hewlett Packard Enterprise. It was our supply chain team.
It was our services team telling us how these things fit together for real. Now, we've had incredible technology experiences, incredible technologist experiences, and what we're seeing is that we have intercepts on conventional platforms, where there's the photonics, the persistent memories. Those will make our existing DCIG and SDCG products better almost immediately. But then we also have now these whole-cloth applications, and as we take all of our learnings, drive them into open source software, drive them into the Gen-Z Consortium, we'll see, you know, probably 18, 24 months from now, some of those first optimized silicon designs pop out of that ecosystem. Then we'll be right there to assemble those again, into conventional systems as well as more expansive exascale computing, intelligent edge with large persistent memories and application-specific processing as that next generation of gateways. I think we can see these intercept points at every category Andrew talked about. >> Andrew: And another good point there that kind of magnifies the model we were talking about: if we were sitting here five years ago, we would be talking about things like photonics and non-volatile memory as being those big R projects. Those higher-risk, longer-term things, right? As those mature, we make more progress, innovation happens, right? It gets pulled into that shorter timeframe, it becomes advanced development. >> Dave: And Meg has talked about that... >> Yeah. >> Wanting to get more productivity out of the labs. And she's also pointed out you guys have spent more on R&D in the last several years. But even as we talked about the other day, you want to see a little more D and keep the R going. So my question is, when you get to that point of being able to support DCIG... Where do you, is it a handoff? Are you guys intimately involved? When you're making decisions about, okay, so memristor for example, okay, this is great, that's still in the R phase, then you bring it in. But now you've got to commercialize this, and you've got 3D NAND coming out, and okay, let's use that, that fits into our framework. So how much do you guys get involved in that handoff? You know, the commercialization of this stuff? >> We get very involved. So it's at the point where, when we think we have something that, hey, we think, you know, maybe this could get into a product, or let's see if there's a good intercept here, we work jointly at that point. It's lab engineers, it's the product managers out of the group, engineers out of the business group, they essentially work collectively then on getting it to that next step. So it's kind of just one big R&D effort at that point. >> Dave: And so specifically as it relates to the machine, where do you see, in the near term, let's call near term the next three years, or five years even, what do you see that looking like? Is it this combination of memory with capacitors or flash extensions? What does that look like in terms of commercial terms that we can expect? >> Kirk: So I really think the palette is pretty broad here. I can see these going into existing rack and tower products, to allow them to have memory that's composable down to the individual module level. To be able to take that facility to have just the right resources applied at just the right time, with that API that we have in OneView. Extend down to composing the hardware itself.
I think we look at those edge line systems and want to have just the right kind of analytic capability, large persistent memories at that edge, so we can handle those zettabytes and zettabytes of data in full fidelity, analyzed at the edge, sending back that intelligence to the core, but also taking action at the edge in a timeframe that matters. I also see it coming out and being the basis of our exascale high performance computing. You know, when you want to have an exascale system that has all of the combined capacity of the top 500 systems today, but 1/20th of their power, that is going to take rather novel technologies, and everything we've been working on is exactly what's feeding that research, and soon to be advanced development, and then soon to be production in supply chain. >> Dave: Great. >> John: So the question I have is, obviously we saw some really awesome Gen 10 stuff here at this show. You guys are seeing that, obviously you're on stage talking about a lot of the cool R&D, but really the reality is that's multiple years in the works, some of this root of trust silicon technology that's getting the show buzzed up, everyone's psyched about it. Dreamworks Animation's talking about how inorganic opportunities are helping their business, and they've got the security with the root of trust, NIST certified and compliant. Pretty impressive. What's next? What else are you working on? Because this is where the R&D is on your shoulders for that next level of innovation. What do you guys see there? Because security is a huge deal. That's that great example of how you guys innovated. 'Cause that'll stop the attack vector in the surface area of IoT, if you can get the servers to lock down and you have firmware that's secure, makes a lot of sense. That's probably the tip of the iceberg. What else is happening with security? >> Kirk: So when we think about security and our efforts on advanced development research around the machine, what you're seeing here with the ProLiants is making the machines more secure. The inherent platform more secure. But the other thing I would point you to is the application we're running on the prototype. Large-scale graph inference. And this is security, because you have a platform like the machine, able to digest hundreds and hundreds of terabytes worth of log data to look for that fingerprint, that subtle clue that you have a system that has been compromised. And these are not blatant, let's-just-blast-everything-out-to-some-xxx-subdomain attacks. This is an advanced persistent threat by a very capable adversary, who is very subtle in their reach out from a system that has been compromised to that command and control server. The signs are there if you can look at the data holistically. If you can look at that DNS log, a graph of billions of entries every day, constantly changing, if you can look at that as a graph in totality, in a timeframe that matters, then that's an empowering thing for a cyber defense team, and I think that's one of the interesting things that we're adding to this discussion. Not only protect, detect and recover, but giving offensive weapons to our cyber defense team, so they can hunt, they can hunt for those system threats. >> John: One of the things, Andrew, I'll get your thoughts and reaction to this, because I'll make an observation, and you guys can comment and tell me I'm all wet, fell off the deep end, or whatnot. Last year HP had great marketing around the machine. I love that Star Trek ad. It was beautiful, and it was just...
The machine is, you know, a great marketing technique. I mean, use of the machine... So a lot of people set expectations on the machine. You saw articles being written, maybe these people didn't understand it. A little bit pulled back, almost dampened down a little bit in terms of the marketing of the machine, other than the branding. Is that because you don't yet know what it's going to look like? Or there's so many broader possibilities where you're trying to set expectations? 'Cause the machine certainly has a lot of range, and it's almost as if, if I could read your minds, you don't want to position it too early on what it could do. And that's my observation. Why the pullback? I mean, certainly as a marketer I'd be all over that. >> Andrew: Yeah, I think part of it has been intentional, just on how the ecosystem, we need the ecosystem developed kind of around this at the same time. Meaning, there are a lot of kind of moving parts to it, whether it's around the open source community and kind of getting their heads wrapped around what this new architecture looks like. We've got things like, you know, the Gen-Z Consortium, where we're pouring a lot of our understanding and knowledge into that. And so we need a lot of partners; we know we're in a day and an age where, look, there's no single company that's going to do every piece and part themselves. So part of it is kind of enough to get out there, to get the buzz, get the excitement, to get other people then on board, and now we have been heads down, especially this last six months, of... >> John: Jamming hard on it. >> Getting it all together. You know, you think about what we showed: we essentially first booted the thing in November, and now, you know, we've got it running at this scale. That's really been the focus. But we needed a lot of that early engagement, interaction, to get a lot of the other members of the ecosystem kind of on board and starting to contribute. And really that's where we're at today. >> John: It's almost, you want to let it take its own course organically, because you mentioned, just on the cyber surveillance opportunity around the crunching, you kind of don't know yet what the killer app is, right? >> And that's the great thing of where we're at today. Now that we have kind of the prototype running at scale like this, it allows us to move beyond, look, we've had the simulators to work with, we've had kind of emulation vehicles, now you've got the real thing to run actual workloads on. You know, we had the announcement around DZNE as kind of an early example, but it really now will allow us to do some refinement that allows us to get to those product concepts. >> Dave: I want to just ask the closing question. So I've had this screen here, it's like the theater, and I've been seeing these great things coming up, and one was "Moore's Law is dead." >> Oh, that was my session this morning. >> Another one was blockchain. And unfortunately I couldn't hear it, but I could see the tease. So when you guys come to work in the morning, what's kind of the driving set of assumptions for you? Is it just, the technology is limitless and we're going to go figure it out, or are there things that sort of frame your raison d'etre? That drive your activities and thinking? And what are the fundamental assumptions that you guys use to drive your actions? >> Kirk: So what's been driving me for the last couple of years is this exponential growth of information that we create as a species. That seems to have no upper bounding function that tamps it down.
At the same time, the timeframe we want to get from information, from raw information to insight that we can take action on, seems to be shrinking from days, weeks, minutes... Now it's down to microseconds. If I want to have an intelligent power grid, intelligent 3G communication, I have to have microseconds. So if you look at those two things, and at the same time, we just happen to be the lucky few who are sitting in these seats right when Moore's Law is slowing down and will eventually flatten out. And so all the skills that we've had over the last 28 years of my career, you look at those technologies and you say, "Those aren't the ones that are going to take us forward." This is an opportunity for us to really look and examine every piece of this, because if it was something where we could just, "can't we just... do one thing?", we would do it, right? We can't just do one thing. We have to be more holistic if we're going to create the next 20, 30, 40 years of innovation. And that's really what I'm looking at. How do we get back exponential scaling on supply to meet this unending exponential demand? >> Dave: So technically, I would imagine, that's a very hard thing to balance, because the former says that we're going to have more data than we've ever seen. The latter says we've got to act on it fast, which is a great trend for memory, but the economics are going to be such a challenge to meet, to balance that. >> Kirk: We have to be able to afford the energy, and we have to be able to afford the material cost, and we have to be able to afford the business processes that do all these things. So yeah, you need breakthroughs. And that's really what we've been doing. And I think that's why we're so fortunate at Hewlett Packard Enterprise to have the labs team, but also that world-class engineering and that world-class supply chain, and a services team that can get us introduced to every interesting customer around the world who has those challenging problems, and can give us that partnership and that insight to get those kinds of breakthroughs. >> Dave: And I wonder if there will be a tipping point, if the tipping point will be, and I'm sure you've thought about this, a change in the application development model that drives so much value and so much productivity that it offsets some of the potential cost issues of changing the development paradigm. >> And I think you're seeing hints of that. Now, we saw this when we went from systems of record, OLTP systems, to systems of engagement, mobile systems, and suddenly new ways to develop it. I think now the interesting thing is we move over to systems of action, and we're moving from programmatic to training. And this is this interesting thing: if you have those zettabytes of data, you can't have a pair of human eyeballs in front of that, you have to have a machine learning algorithm. That's the only thing that's voracious enough to consume this data in a timely enough fashion to get us answers, but you can't program it. We saw those old approaches, in old-school A.I., old-school autonomous vehicle programs: they'd go about 10 feet, boom, and they'd flip over, right? Now, you know, they're on our streets and they are functioning. They're a little bit raw right now, but that improvement cycle is fantastic, because they're training, they're not programming. >> Great opportunity, to your point about Moore's Law, but also all this new functionality that has yet to be defined, is right on the doorstep. Andrew, Kirk, thank you so much for sharing.
>> Andrew: Thank you. >> Great insight, love Hewlett Packard Labs, love the R&D conversation. Gives us a chance to go play in the wild and dream about the future you guys are out creating. Congratulations, and thanks for spending the time on The Cube, appreciate it. >> Thanks. >> The Cube coverage will continue here live at Las Vegas for HPE Discover 2017, Hewlett Packard Enterprise's annual event. We'll be right back with more, stay with us. (bright music)

Published Date: Jun 8, 2017


SENTIMENT ANALYSIS :

ENTITIES

Entity                       Category       Confidence
Kirk                         PERSON         0.99+
Andrew                       PERSON         0.99+
Dave                         PERSON         0.99+
Andrew Wheeler               PERSON         0.99+
Tim Berners-Lee              PERSON         0.99+
John                         PERSON         0.99+
Meg Whitman                  PERSON         0.99+
Ray Kersewile                PERSON         0.99+
Hewlett Packard Labs         ORGANIZATION   0.99+
Meg                          PERSON         0.99+
2003                         DATE           0.99+
HP                           ORGANIZATION   0.99+
Hewlett Packard              ORGANIZATION   0.99+
1994                         DATE           0.99+
Las Vegas                    LOCATION       0.99+
Gene Emdall                  PERSON         0.99+
$265 billion                 QUANTITY       0.99+
Kirk Bresniker               PERSON         0.99+
Jin Zee Consortium           ORGANIZATION   0.99+
November                     DATE           0.99+
three                        QUANTITY       0.99+
Dreamworks Animation         ORGANIZATION   0.99+
Last year                    DATE           0.99+
Star Trek                    TITLE          0.99+
Hewlett Packard Enterprise   ORGANIZATION   0.99+
160 terabyte                 QUANTITY       0.99+
three day                    QUANTITY       0.99+
500 people                   QUANTITY       0.99+
HP Labs                      ORGANIZATION   0.99+
two things                   QUANTITY       0.99+
five year                    QUANTITY       0.99+
One                          QUANTITY       0.99+
yesterday                    DATE           0.99+
three years                  QUANTITY       0.99+
Hewlett Packard Labs         ORGANIZATION   0.99+
one                          QUANTITY       0.98+
1%                           QUANTITY       0.98+
Moore's Law is dead          TITLE          0.98+
early 1980s                  DATE           0.98+
five years                   QUANTITY       0.98+
five years ago               DATE           0.98+
first                        QUANTITY       0.98+
today                        DATE           0.98+
1/20th                       QUANTITY       0.98+
three types                  QUANTITY       0.97+
DCIG                         ORGANIZATION   0.97+
500 systems                  QUANTITY       0.97+

John Cavanaugh, HP - #SparkSummit - #theCUBE


 

>> Announcer: Live from San Francisco, it's theCube, covering Spark Summit 2017, brought to you by Databricks. >> Welcome back to theCube at Spark Summit 2017. I don't know about you, George, I'm having a great time learning from all of our attendees. >> We've been absorbing now for almost two days. >> Yeah, well, and we're about to absorb a little bit more here, too, because the next guest I was looking forward to, I saw his name on the schedule, all right, that's the guy who talks about herding cats. It's John Cavanaugh, Master Architect from HP. John, welcome to the show. >> Great, thanks for being here. >> Well, I did see, I don't know if it's about cats on the Internet, but either cats or self-driving cars, one of the two, in analogies. But talk to us about your session. Why did you call it Herding Cats, and is that related to maybe the organization at HP? >> Yeah, there's a lot of organizational dynamics as part of our migration to Spark. HP is a very distributed organization, and it has had a lot of distributed autonomy, so, you know, trying to get centralized activity is often a little challenging. You guys have often heard, you know, "I am from the government, I'm here to help." That's often the kind of shields-up response you will get from folks, so we've got a lot of dynamics in terms of trying to bring these distributed organizations on board to a new common platform, and allay many of the fears that they had with making any kind of a change. >> So, are you centered at a specific division? >> So, yes, I'm in the print platforms and future technology group. You know, there's two large business segments within HP. There's our personal systems group, that produces everything from phones to business PCs to high-end gaming. But I'm in the printing group, and while many people are very familiar with your standard desktop printer, you know, the printers we sell really vary from a very small product we call Sprocket, it fits in your hand, battery-operated, to literally a web press that's bigger than your house and prints at hundreds of feet per minute. So, it's a very wide product line, and it has a lot of data collection. >> David: Do you have 3D printing as well? >> We do have 3D printing as well. That's an emergent area for us. I'm not super familiar with that. I'm mostly on the 2D side, but that's a very exciting space as well. >> So tell me about what kind of projects you're working on that do require that kind of cross-team or cross-departmental cooperation. >> So, you know, in my talk, I talked about the Wild West Era of Big Data, and that was prior to 2015, and we had a lot of groups that were standing up all kinds of different big data infrastructures. And part of this stems from the fact that we were part of HP at the time, and we could buy servers and racks of servers at cost. Storage was cheap, all these things, so they sprouted up everywhere. And, around 2015, everybody started realizing, oh my God, this is completely fragmented. How do we pull things back together? And that's when a lot of groups started trying to develop platformish types of activities, and that's where we knew we needed to go, but there was even some disagreement from different groups on how we move forward. So, there's been a lot of good work within HP in terms of creating a virtual community, and Spark really kind of caught on pretty quickly. Many people were really tired of kind of Hadoop. There were a lot of very opinionated models in Hadoop, where Spark opens up a lot more into the data science community.
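A concrete illustration of the "opinionated models" point: an aggregation that classic Hadoop MapReduce spreads across a mapper class, a reducer class, and a job driver collapses into a few lines against Spark's DataFrame API. The sketch below is illustrative only, assuming PySpark and an invented log path; it is not HP's actual workload.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("hadoop-vs-spark-sketch").getOrCreate()

# In classic MapReduce this word count is a Mapper, a Reducer, and a driver;
# in Spark it is a short chain of DataFrame operations.
logs = spark.read.text("hdfs:///logs/example/*.log")  # hypothetical path
(logs.withColumn("word", F.explode(F.split(F.col("value"), r"\s+")))
     .groupBy("word")
     .count()
     .orderBy(F.desc("count"))
     .show(10))
```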
So, that went really well, and we made a big push into AWS for much of our cloud activities, and we really ended up then pretty quickly with Databricks as an enterprise partner for us. >> And so, George, you've done a lot of research. I'm sure you talked to enterprise companies along the way. Is this a common issue with big enterprises? >> Well, for most big data projects they've started, the ones we hear a lot about, there's a mandate from the CIO, we need a big data strategy, and so some of those, in the past, stand up a five or 10-node Hadoop cluster, run some sort of pilot, and say, this is our strategy. But it sounds like you herded a lot of cats... >> We had dozens of those small Hadoop clusters all around the company. (laughter) >> So, how did you go about converting that energy, that excess energy, towards something more harmonized around Databricks? >> Well, a lot of people started recognizing we had a problem, and this really wasn't going to scale, and we really needed to come up with a broader way to share things across the organization. So, the timing was really right, and a lot of people were beginning to understand that. And, you know, for us there were probably about five kind of key decisions we ended up making. And part of the whole strategy was to empower the businesses. As I have mentioned, we are a very distributed organization, so you can't really dictate to the businesses. The businesses really need to own their success. And one of the decisions that was made, it might be kind of controversial for many CIOs, is that we've made a big push on cloud-hosted and business-owned, not IT-owned. And one of the real big reasons for that is we were no longer viewing data and big data as kind of a business-intelligence activity or a standardized reporting activity. We really knew that, to be successful moving forward, it needed to be built into our products and services, and those products and services are managed by the businesses. So, it can't be something that would be tossed off to an IT organization. >> So the IT organization, then, evolved into being more of an innovative entity versus a reactive or supportive entity for all those different distributed groups. >> Well, in our regard, we've ended up with AWS as part of our activity, and, really, much of our big data activities are driven by the businesses. The connections we have with IT are more related to CRM and product master data and selling in channels and all that information. >> But if you take a bunch of business-led projects and then try and centralize some aspect of them, wouldn't IT typically become the sort of shared infrastructure architecture advisor for that, and then the businesses now have a harmonized platform on which they can build shared data sets? >> Actually, in our case, that's what we did. We had a lot of our businesses that already had significant services hosted in AWS. And those were very much the high-data generators. So, it became a very natural evolution to continue with some of our AWS relationships and continue on to Databricks. So, as an organization today, we have three kind of main buckets for our Databricks, but, you know, any business, they can get their accounts. We try and encourage everything to get into a data lake, and that's S3, in Parquet format, one of the decisions that was adopted. And then, from there, people can begin to move. You know, you can get notebooks, you can share notebooks, you can look at those things.
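The pattern Cavanaugh describes, each business landing data in an S3-backed lake as Parquet and picking it up from Databricks notebooks, might look roughly like the following sketch. The bucket names, paths, and columns are invented for illustration; HP's actual layout isn't specified in this interview.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("data-lake-sketch").getOrCreate()

# One business unit lands its raw telemetry into the shared lake as Parquet,
# partitioned so other teams can read just the slices they need.
raw = spark.read.json("s3a://example-ingest/printer-telemetry/*.json")
(raw.withColumn("ingest_date", F.current_date())
    .write.mode("append")
    .partitionBy("ingest_date")
    .parquet("s3a://example-data-lake/print/device_events"))

# Any other team can pull the same Parquet data into its own notebook
# without coordination -- the autonomy described above.
events = spark.read.parquet("s3a://example-data-lake/print/device_events")
events.groupBy("model").count().show()
```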
You know, the beauty of Databricks and AWS is instant on. If I want to play around with something with half a dozen nodes, it's great. If I need a thousand for a workload, boom, I've got it! And you know, with this cost and the value returned, there's really no need for permissions or coordination with other entities, and that's kind of what we wanted, the businesses to have that autonomy to drive their business success. >> But, does there not need to be some central value added in the way of, say, data curation through a catalog or something like that? >> Yes, so, this is not necessarily a model where all the businesses are doing all kinds of crazy things. One of the things, shepherded by one of our CTOs and the other functions, is we ended up creating a virtual community within HP. This kind of started off with a lot of "tribal elders" or "tribal leaders." With this virtual community, today we get together every two weeks, and we have presentations and discussions on all things from data science to machine learning, and that's where a lot of this activity around how we get better at sharing happens. And this has fostered, kind of splintered off, additional activity. So we have one on data telemetry within our organization. We're trying to standardize more data formats and schemas for those so we can have broader sharing. So, these things have been occurring more organically, as part of developer enablement kind of moving up, rather than more kind of dictates moving down. >> That's interesting. Potentially really important. When you say you're trying to standardize some of the telemetry, what are you instrumenting? Is it just all the infrastructure, or is it some of the products that HP makes? >> It's definitely the products and the software. You know, like I said, we manage a huge spectrum of print products, and my apologies if I'm focusing on it, but that is what I know the best. You know, we've actually been doing telemetry and analysis since the late 90s. You know, we wanted to understand use of supplies and usage so we could do our own forecasting, and that's really, really grown over the years. You know, now we have parts of our services organization, managed services, where they're offering big data analytics as part of the package, and we provide information about predictive failure of parts. And that's going to be really valuable for some of our business partners. We have all kinds of fancy algorithms that we work on. The customers have specific routes that they go for servicing, and we may be able to tell them, hey, in a certain time period, we think these devices in your field will need service, so you can coordinate your route to hit those efficiently, rather than having to make a single truck roll for one repair, and do that before a customer experiences a problem. So, it's been kind of a great example of different ways that big data can impact the business. >> You know, I think Ali mentioned in the keynote this morning the example of a customer getting a notification that their ink's going to run out, and the chance that you get to touch that customer and get them to respond and buy, you could make millions of dollars of difference, right? Let's talk about some of the business outcomes and the impact of some of the work you've done, and what it means, really, to the business. >> Right now, we're trying to migrate a lot of legacy stuff, and you know, that's kind of boring.
(laughs) It's just a lot of work, but there are things that need to happen. But really, the power of the big data platform has been great with Databricks. I know John Landry, one of our CTOs, he's in the personal systems group, had a great example of some problems they had with batteries in laptops, and, you know, they have a whole bunch of analytics. They've been monitoring batteries, and they found a collection of batteries that experienced very early failure rates. They happened to be able to narrow it down to specific lots from a specific supplier, and they were able to reach out to customers to get those batteries replaced before they died. >> So, a mini-recall instead of a massive PR failure. (laughs) >> You know, it was really focused on, you know, customers didn't even know they were going to have a problem with these batteries, that they were going to die early. You know, you got to them ahead of time, told them we knew this was going to be a problem, and tried to help them. I mean, what a great experience for a customer. (laughs) That's just great. >> So, once you had this telemetry, and it sounds like a bunch of shared repositories, not one intergalactic one, what were some of the other use cases, you know, like the battery predictive failure type scenarios? >> So, you know, we have some very large ranges across different categories. We have clearly consumer products. You know, you sell millions and millions of those, and we have a little bit of telemetry with those. I think we want to understand failures and ink levels and some of these other things. But our commercial web presses, these very large devices, these are very sensitive. If these things are down, they have a big problem. So, these things are generating all kinds of data. We have systems on premises with customers that are alerting them to potential failures, and there's more and more activity going on there to understand predictive failure and predictive kind of tolerance slippages. I'm not super familiar with that business, but I know some of the guys there have started introducing more sensors into products, specifically so they can get more data, to understand things. You know, slight variations in tensioning and paper, in these things that are running hundreds of feet per minute, can have a large impact. So, I think that's really where we see more and more of the value coming from: being able to return that value back to the customer, not just helping us make better decisions, but getting that back to the customer. You know, we're talking about expanding more customer-facing analytics in these cases, or we'll expose to customers some of the raw data, and they can build their own dashboards. Some of these industries have traditionally been very analog, so this move to digital web presses and this mountain of data is a little new for them, but HP can bring a lot to the table in terms of our experience in computing and big data to help them with their businesses. >> All right, great stuff. And we've just got a minute to go before we're done. I have two questions for you; the first is an easy yes/no question. >> John: Okay. >> Is Purdue going to repeat as Big Ten champ in basketball? >> Oh, you know, I don't know. (laughs) I hope so! >> We both went to Purdue. >> I'm more focused on the Warriors winning. (laughter) >> All right, go Warriors! And, the real question is, what surprised you the most? This is your first Spark Summit. What surprised you the most about the event?
>> So, you know, you see a lot of Internet-born companies, and it's amazing how many people have just gone fully native with Spark all over the place, and it's a beautiful thing to see. You know, in larger enterprises, that transition doesn't happen like that. I'm kind of jealous. (laughter) We have a lot more things to slog through, but the excitement here and all the things that people are working on, you know, you can only see so many tracks. I'm going to have to spend two days when I get back just watching the videos on all of the tracks I couldn't attend. >> All right, Internet-born companies versus the big enterprise. Good luck herding those cats, and thank you for sharing your story with us today and talking a little bit about the culture there at HP. >> John: Thank you very much. >> And thank you all for watching this segment of theCUBE. Stay with us, we're still covering Spark Summit 2017. This is Day Two, and we're not done yet. We'll see you in a few minutes. (theCUBE jingle)
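The battery story from this conversation is a natural candidate for a concrete sketch. The PySpark below is hypothetical: the columns, paths, and the five-times-baseline flagging threshold are invented for illustration, not HP's actual analysis.

```python
# Hypothetical sketch of a battery early-failure analysis: group failure
# events by supplier and manufacturing lot, then flag outlier lots.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("battery-lots").getOrCreate()
batteries = spark.read.parquet("s3://example-telemetry-lake/battery_events/")

lot_stats = (batteries
    .groupBy("supplier", "lot_id")
    .agg(F.count("*").alias("units"),
         F.sum(F.col("failed_early").cast("int")).alias("early_failures"))
    .withColumn("failure_rate", F.col("early_failures") / F.col("units")))

# Flag lots whose early-failure rate is far above the fleet-wide average.
baseline = lot_stats.agg(F.avg("failure_rate")).first()[0]
(lot_stats
    .filter(F.col("failure_rate") > 5 * baseline)
    .orderBy(F.desc("failure_rate"))
    .show())
```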

Published Date : Jun 7 2017


Natalia Vassilieva & Kirk Bresniker, HP Labs - HPE Discover 2017


 

>> Announcer: Live from Las Vegas, it's the CUBE! Covering HPE Discover 2017. Brought to you by Hewlett Packard Enterprise. >> Hey, welcome back, everyone. We are live here in Las Vegas for SiliconANGLE Media's CUBE exclusive coverage of HPE Discover 2017. I'm John Furrier, my co-host, Dave Vellante. Our next guest is Kirk Bresniker, fellow and VP chief architect of Hewlett Packard Labs, and Natalia Vassilieva, senior research manager, Hewlett Packard Labs. Did I get that right? >> Yes! >> John: Okay, welcome to theCUBE, good to see you. >> Thank you. >> Thanks for coming on, really appreciate you guys coming on. One of the things I'm most excited about here at HPE Discover is, I always like to geek out on the Hewlett Packard Labs booth, which is right behind us. If you go to the wide shot, you can see the awesome display. But there are two things in there that I love. The Machine is in there, which I love the new branding, by the way, love that pyramid coming out of the, the phoenix rising out of the ashes. And also Memristor, really game-changing. This is underlying technology, but what's powering the business trends out there that you guys are kind of doing the R&D on is AI, and machine learning, and software's changing. What are your thoughts as you look at the labs, you look out on the landscape, and you do the R&D? What's the vision? >> One of the things that is so fascinating is the transitional period we're in. We look at the kind of technologies that we've had to date, and that I've certainly spent a whole part of my career on, and yet all these technologies that we've had so far, they're all kind of getting about as good as they're going to get. You know, the Moore's Law semiconductor process steps, general-purpose operating systems, general-purpose microprocessors, they've had fantastic productivity growth, but they all have a natural life cycle, and they're all maturing. And part of The Machine research program has been, what do we think is coming next? And really, what's informing us as to what we have to set as the goals are the kinds of applications that we expect. And those are data-intensive applications, not just petabytes, exabytes, but zettabytes. Tens of zettabytes, hundreds of zettabytes of data out there in all those sensors out there in the world. And when you want to analyze that data, you can't just push it back to the individual human, you need to employ machine learning algorithms to go through that data to call out and find those needles in those increasingly enormous haystacks, so that you can get that key correlation. And when you don't have to reduce and redact and summarize data, when you can operate on the data at that intelligent edge, you're going to find those correlations, and that machine learning algorithm is going to be that unbiased and unblinking eye that's going to find that key relationship that'll really have a transformational effect. >> I think that's interesting. I'd like to ask you just one follow-up question on that, because, you know, it reminds me back when I was in my youth, around packets, and you'd get the buffer, and the speeds and feeds. At some point there was a wire-speed capability. Hey, packets are moving, and you can do all this analysis at wire speed. What you're getting at is data processing at the speed of, as fast as the data's coming in and out. Is that, if I get that right, is that kind of where you're going with this?
Because if you have more data coming, potentially an infinite amount of data coming in, the data speed is going to be so high-velocity, how do you know what a needle looks like? >> I think that's the key, and that's why the research Natalia's been doing is so fundamental: we need to be able to process that incredible amount of information and be able to afford to do it. And the way that you will not be able to have it scale is if you have to take that data, compress it, reduce it, select it down because of some pre-determined decision you've made, transmit it to a centralized location, do the analysis there, then send back the action commands. Now, we need that cycle of intelligence, measurement, analysis, and action to be microseconds. And that means it needs to happen at the intelligent edge. I think that's where the understanding of machine learning algorithms, which you don't program but train, so that they can work off of this enormous amount of data, comes in; they voraciously consume the data and produce insights. That's where machine learning will be the key. >> Natalia, tell us about your research in this area. Curious. Your thoughts. >> We started to look at existing machine learning algorithms, and whether there are limiting factors today in the infrastructure which don't allow machine learning algorithms to progress fast enough. So, one of the recent advances in AI is the appearance, or revival, of artificial neural networks. Deep learning. There's a very large hype around those types of algorithms. Every speech assistant which you get, Siri in your phone, Cortana, or whatever, Alexa by Amazon, all of them use deep learning to train speech recognition systems. If you go to Facebook and suddenly it starts to propose that you tag the faces of your friends, that face detection, face recognition, that also was deep learning. So that's a revival of the old artificial neural networks. Today we are capable of training good enough models for those types of tasks, but we want to move forward. We want to be able to process larger volumes of data, to find more complicated patterns, and to do that, we need more compute power. Again, today, the only way you can add more compute power to that is to scale out. So there is no single compute device on Earth today which is capable of doing all the computation. You need to have many of them interconnected together, and they all crunch numbers for the same problem. But at some point, the communication between those nodes becomes a bottleneck. You need to let the neighboring nodes know what you achieved, and you can't scale out anymore. Adding yet another node to the cluster won't lead to a reduction of the training time. With The Machine, where we have a memory-driven computing architecture, where all data sits in the same shared pool of memory, and all computing nodes have an ability to talk to that memory, we don't have that limitation anymore. So for us, we are looking forward to deploying those algorithms on that type of architecture. We envision significant speedups in the training. And it will allow us to retrain the model on the new data which is coming, to not do training offline anymore. >> So how does this all work? When HP split into two companies, Hewlett Packard Labs went to HPE and HP Labs went to HP Inc. So what went where, and then, first question. Then second question is, how do you decide what to work on?
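Natalia's point about the communication bottleneck can be made concrete with a toy calculation. The sketch below uses invented constants (a fixed per-step compute budget and a per-node synchronization cost); it shows only the shape of the scaling problem, not measurements of any real cluster.

```python
# Toy model of data-parallel training: per-step time is compute split across
# n nodes plus a synchronization cost that grows with n. Constants are invented.
compute_seconds = 100.0    # single-node work per training step
sync_per_node = 0.5        # coordination overhead each extra node adds

def step_time(n: int) -> float:
    return compute_seconds / n + sync_per_node * n

for n in (1, 4, 16, 32, 64, 128):
    print(f"{n:4d} nodes -> {step_time(n):7.2f} s/step")

# Past some node count the sync term dominates and each step gets *slower*,
# which is the communication wall a shared-memory architecture aims to remove.
```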
>> I think in terms of how we organize ourselves, obviously, things that were around printing and personal systems went to HP Inc. Things that were around analytics, enterprise, hardware, and research went to Hewlett Packard Labs. The one thing that we both found equally interesting was security, 'cause obviously, personal systems, enterprise systems, we all need systems that are increasingly secure because of the advanced, persistent threats that are constantly assaulting everything from our personal systems up through enterprise and public infrastructure. So that's how we've organized ourselves. Now in terms of what we get to work on, you know, we're in an interesting position. I came to Labs three years ago. I used to be the chief technologist for the server global business unit. I was in the world of big D, tiny R. Natalia and the research team at Labs, they were out there looking out five, 10, 15, or 20 years. Huge R, and then we would meet together occasionally. I think one of the things that's happened with our Machine advanced development and research program is, I came to Labs not to become a researcher, but to facilitate that communication, to bring in the engineering, the supply chain team, that technical and production prowess, our experience from our services teams, who know how things actually get deployed in the real world. And I get to sit them at the bench with Natalia, with the researchers, and I get to make everyone unhappy. Hopefully in equal amounts. The development teams realize we're going to make some progress. We will end up with fantastic progress and products, both conventional systems as well as new systems, but it will be a while. We need to get through it; that's why we had to build our prototype. To say, "No, we need a constructive proof of these ideas." At the same time, with Natalia and the research teams, they were always looking for that next horizon, that next question. Maybe we pulled them a little bit closer, got a few answers out of them rather than the next question. So I think that's part of what we've been doing at the Labs: understanding, how do we organize ourselves? How do we work with the Hewlett Packard Enterprise Pathfinder program to find those little startups who need that extra piece of something that we can offer as that partnering community? It's really a novel approach for us to understand how do we fill that gap, how do we still have great conventional products, how do we enable breakthrough new-category products, and have it in a timeframe that matters? >> So, much tighter connection between the R and the D. And then, okay, so when Natalia wants to initiate a project, or somebody wants Natalia to initiate a project around AI, how does that work? Do you say, "Okay, submit an idea," and then it goes through some kind of peer review? And then, how does it get funded? Take us through that. >> I think I'll give my perspective; I would love to hear what you have from your side. For me, it's always been organic. The ideas that we had on The Machine, for me, my little thread, one of thousands that's been brought in to get us to this point, started about 2003, when we were getting ready for, we were midway through, BladeSystem c-Class. A category-defining product. An absolute home run in defining what a blade system was going to be. And we're partway through that, and you realize you've got a success on your hands. You think, "Wow, nothing gets better than this!" Then you start to worry, what if nothing gets better than this?
And you start thinking about that next set of things. Now, I had some insights of my own, but when you're a technologist and you have an insight, that's a great feeling for a little while, and then it's a little bit of a lonely feeling. No one else understands this but me, and is it always going to be that way? And then you have to find that business opportunity. So that's where talking with our field teams, talking with our customers, coming to events like Discover, where you see business opportunities, and you realize, my ingenuity and this business opportunity are a match. Now, the third piece of that is someone, a business leader, who can say, "You know what? Your ingenuity and that opportunity can meet in a finite time with finite resources. Let's do it." And really, that's what Meg and the leadership team did with us on The Machine. >> Kirk, I want to shift gears and talk about the Memristor, because I think that's a showcase that everyone's talking about. Actually, The Machine has been talked about for many years now, but Memristor changes the game. It kind of goes back to old-school analog, right? We're talking about, you know, log n, n log n kind of performance that we've never seen before. So it's a completely different take on memory, and this kind of brings up your vision and the team's vision of memory-driven computing. Which, some are saying, can scale machine learning. 'Cause now you have data response times in microseconds, as you said, and provisioning containers in microseconds is actually really amazing. So, the question is, what is memory-driven computing? What does that mean? And what are the challenges in deep learning today? >> I'll do the memory-driven computing-- >> I will do deep learning. >> You'll do the machine learning. So, when I think of memory-driven computing, it's the realization that we need a new set of technologies, and it's not just one thing. If we could just do, dot-dot-dot, we would've done that one thing. This is more taking a holistic approach, looking at all the technologies that we need to pull together. Now, memories are fascinating, and our Memristor is one example of a new class of memory. But they also-- >> John: It's doing it differently, too, it's not like-- >> It's changing the physics. You want to change the economics of information technology? You change the physics you're using. So here, we're changing physics. And whether it's our work on the Memristor with Western Digital and the resistive RAM program, whether it's the phase-change memories, whether it's the spin-torque memories, they're all applying new physics. What they all share, though, is the characteristic that they can continue to scale. They can scale in the layers inside of a die. The die is inside of a package. The package is inside of a module, and then when we add photonics, a transformational information communications technology, now we're scaling from the package, to the enclosure, to the rack, across the aisle, and then across the data center. All that memory accessible as memory. So that's the first piece: large, persistent memories. The second piece is the fabric, the way we interconnect them so that we can have great computational, great memory, great communication devices available on industry open standards; that's the Gen-Z Consortium. The last piece is software. New software, as well as adapting existing productive programming techniques, and enabling people to be very productive immediately.
>> Before Natalia gets into her piece, I just want to ask a question, because this is interesting to me. Sorry to get geeky here, but this is really cool because you're going analog with signaling. So, going back to the old concepts of signaling theory. You mentioned neural networks. It's almost a hand-in-glove situation with neural networks. Here, you have the next question, which is, connect the dots to machine learning and neural networks. This seems to be an interesting technology game-changer. Is that right? I mean, am I getting this right? What does this mean? >> I'll just add one piece, and then we'll hear from Natalia, who's the expert on the machine learning. For me, it's bringing the right ensemble of components together. Memory technologies, communication technologies, and, as you say, novel computational technologies. 'Cause transistors are not going to get smaller for very much longer. We have to think of something more clever to do than just stamp out another copy of a standard architecture. >> Yes, you asked about the challenges of deep learning. We look at the landscape of deep learning today, and the set of tasks which are solved today by those algorithms. We see that although there is a variety of tasks solved, most of them are from the same area. So we can analyze images very efficiently, we can analyze video, so it's all visual data, and we can also do speech processing. There are a few examples in other domains, with other data types, but they're much fewer, and there is much less knowledge about which models to train for those applications. So one of the challenges for deep learning is to expand the variety of applications where it can be used. It's known that artificial neural networks are very well applicable to data where there are many hidden patterns underneath, and to multi-dimensional data, like data from sensors. But we still need to learn what's the right topology of a neural network to do that, and what's the right algorithm to train it. So we need to broaden the scope of applications which can take advantage of deep learning. Another aspect, which I mentioned before, is the computational power of today's devices. If you think about the well-known analogy between an artificial neural network and our brain, the size of the models which we train today, the artificial neural networks, they are much, much, much smaller than the analogous thing in our brain. Many orders of magnitude. It was shown that if you increase the size of the model, you can get better accuracy for some tasks. You can process a larger variety of data. But in order to train those large models, you need more data and you need more compute power. Today, we don't have enough compute power. I actually did some computation: in order to train a model which is comparable in size with our human brain in a reasonable time, you will need a compute device which is capable of performing 10 to the power of 26 floating-point operations per second. We are far, far-- >> John: Can you repeat that again? >> 10 to the power of 26. We are far, far below that point now. >> All right, so here's the question for you guys. There's all this deep learning source code out there. It's open bar for open source right now. All this goodness is pouring in. Google's donating code, you guys are donating code. It used to be like, you had to build your code from scratch, borrow here and there, and share in open source. Now it's a tsunami of greatness, so I'm just going to build my own deep learning.
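The 10 to the power of 26 figure is easier to appreciate with some quick division. The sketch below compares it against rough, round 2017-era device throughputs; those throughput numbers are illustrative assumptions, not vendor specifications.

```python
# Rough scale comparison for the 10^26 FLOP/s figure quoted above.
# Throughput numbers are round illustrative values, not vendor specs.
required_flops = 1e26          # cited target for brain-scale training
gpu_2017 = 1e13                # ~10 TFLOP/s, a high-end 2017 accelerator
top_system_2017 = 1e17         # ~100 PFLOP/s, roughly the era's fastest machine

print(f"GPUs needed:         {required_flops / gpu_2017:.0e}")          # ~1e13
print(f"Top systems needed:  {required_flops / top_system_2017:.0e}")  # ~1e9
```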
How do customers do that? It's too hard. >> You are right on point; that's the next challenge of deep learning, which I believe is out there. Because we have so many efforts to speed up the infrastructure, we have so many open source libraries. So now the question is, okay, I have my application at hand. What should I choose? What is the right compute node for deep learning? Everybody uses GPUs, but is that true for all models? How many GPUs do I need? What is the optimal number of nodes in the cluster? And we have a research effort towards answering those questions as well. >> And a breathalyzer for all the drunk coders out there, open bar. I mean, a lot of young kids are coming in. This is a great opportunity for everyone. And in all seriousness, we need algorithms for the algorithms. >> And I think that's where it's so fascinating. We think of some classes of things, like recognizing written handwriting, recognizing voice. But when we want to apply machine learning and algorithms to the volume of sensor data, then every manufactured item, and not only every item we manufacture, but every factory, can be fully instrumented, with machine learning understanding how it can be optimized. And then, what of the business processes that are feeding that factory? And then, what are the overall economic factors that are feeding that business? Instrumenting, and having this learning, this unblinking, unbiased eye, examining to find those hidden correlations, those hidden connections, could yield a very much more efficient system at every level of human enterprise. >> And the data's more diverse now than ever. I'm sorry to interrupt, but on voice, you mentioned Siri, you see Alexa; you see voice as one dataset. Data diversity's massive, so more needles, and more types of needles, than ever before. >> In that example that you gave, you need a domain expert. And there's plenty of those, but you also need a big brain to build the model, and train the model, and iterate. And there aren't that many of those. Is the state of machine learning and AI going to get to the point where that problem will solve itself, or do we just need to train more big brains? >> Actually, one of the advantages of deep learning is that you don't need that much effort from the domain experts anymore for the step which is called feature engineering: what do you do with your data before you throw a machine learning algorithm at it? That's the pretty cool thing about deep learning, about artificial neural networks: you can throw almost raw data at them. And there are examples out there where people without any knowledge of medicine won a drug recognition competition by applying deep neural networks, without knowing all the details about the connections between proteins and the like. Not domain experts, but they still were able to win that competition, just because the algorithm is that good. >> Kirk, I want to ask you a final question before we break in the segment because, having spent nine years of my career at HP in the '80s and '90s, it's well known that there's been great research at HP. The R&D has been spectacular. Maybe too much R, not enough D, not enough applied; you mention you're bringing that to market faster. So, the question is, what should customers know about Hewlett Packard Labs today? Your mission. Obviously the memory-centric work is the key thing. You've got The Machine, you've got the Memristor, you've got a novel way of looking at things.
What's the story that you'd like to share? Take a minute, close out the segment and share Hewlett Packard Labs' mission, and what to expect to see from you guys in terms of your research, your development, your applications. What are you guys bringing out of the kitchen? What's cooking in the oven? >> I think for us, it is, we've been given an opportunity, an opportunity to take all of those ideas that we have been ruminating on for five, 10, maybe even 15 years. All those things that you thought, this is really something. And we've been given the opportunity to build a practical working example. We just turned on the prototype with more memory, more computation addressable simultaneously than anyone's ever assembled before. And so I think that's a real vote of confidence from our leadership team, that they said, "Now, the ideas you guys have, this is going to change the way that the world works, and we want to see you given every opportunity to make that real, and to make it effective." And I think everything that Hewlett Packard Enterprise has done to focus the company on being that fantastic infrastructure provider and partner is just enabling us to get this innovation out and making it meaningful. I've been designing printed circuit boards for 28 years now, and I must admit, it is intellectually stimulating on one level, but then when you actually meet someone who's changing the face of Alzheimer's research, or changing the way that we produce energy as a society, and has an opportunity to really create a more sustainable world, then you say, "That's really worth it." That's why I get up and come to Labs every day, work with fantastic researchers like Natalia, work with great customers, great partners, and our whole supply chain, the whole team coming together. It's just spectacular. >> Well, congratulations, thanks for sharing the insight on theCUBE. Natalia, thank you very much for coming on. Great stuff going on; looking forward to keeping up with the progress and checking in with you guys. Always good to see what's going on in the Lab. That's the headroom, that's the future. That's the bridge to the future. Thanks for coming on theCUBE. Of course, more CUBE coverage here at HPE Discover, with the keynotes coming up. Meg Whitman on stage with Antonio Neri. Back with more live coverage after this short break. Stay with us. (energetic techno music)

Published Date : Jun 6 2017


John Landry, HP - Spark Summit East 2017 - #SparkSummit - #theCUBE


 

>> Live from Boston, Massachusetts, this is the CUBE, covering Spark Summit East 2017, brought to you by Databricks. Now, here are your hosts, Dave Vellante and George Gilbert. >> Welcome back to Boston, everyone. It's snowing like crazy outside; it's a cold mid-winter day here in Boston, but we're here with the CUBE, the worldwide leader in tech coverage. We are live covering Spark Summit. This is wall-to-wall coverage; this is our second day here. John Landry is with us; he's the distinguished technologist for HP's personal systems data science group within Hewlett Packard. John, welcome. >> Thank you very much for having me here. >> So I was saying, I was joking, we do a lot of shows with HPE, it's nice to have HP back on the CUBE, it's been a while. But I want to start there. The company split up just over a year ago and it's seemingly been successful for both sides, but you were describing to us that you've gone through an IT transformation of sorts within HP. Can you describe that? >> In the past, we basically took a data warehousing type of approach, with reporting and what have you coming out of data warehouses, using Vertica. But recently, we made an investment into more of a programming platform for analytics, and our transformation to the cloud is about that. Instead of investing in our own data centers, because really, with the split, our data centers went with Hewlett Packard Enterprise, we're building our software platform in the cloud, and that software platform includes analytics, and in this case, we're building big data on top of Spark. So that transformation is huge for us, but it's also enabled us to move a lot faster, to better match the velocity of our business. Like I said, it's mainly around the software development really more than anything else. >> Describe your role in a little bit more detail inside of HP. >> My role is I'm the leader in our big data investments, and so I've been leading teams internally and also collaborating across HP with our print group, and what we've done is we've managed to put together a strategy around our cloud-based solution to that. One of the things that was important was that we had a common platform, because when you put a programming platform in place, if it's not common, then we can't collaborate. Our investment could be fractured; we could have a lot of little side efforts going on and what have you. So my role is to provide the leadership and the direction for that, and also, one of the reasons I'm here today is to get involved in the Spark community, because our investment is in Spark. So that's another part of my role: to get involved with the industry and to be able to connect with the experts in the industry so we can leverage off of that, because we don't have that expertise internally. >> What are the strategic and tactical objectives of your analytics initiatives? Is it to get better predictive maintenance on your devices? Is it to create new services for customers? Can you describe that? >> It's two-fold, internal and external. So internally, we've got millions of dollars of opportunity to better our products with cost, and also to optimize our business models, and the way we can do that is by using the data that comes back from our products, our services, our customers, combining that together and creating models around it that are then automated and can be turned into apps that can be used internally by our organizations.
The second part is to take the same approach, same data, but apply that back towards our customers, and so with the split, our enterprise services group also went with Hewlett Packard Enterprise, and so now we have a dedicated effort towards creating managed services for the commercial environment. And that's both on the print side and on the personal systems side, so to basically fuel that, analytics is a big part of the story. So we've had different things that you'll see out there, like Touchpoint Manager, which is one of our services we're delivering in personal systems. >> Dave: What is that? >> Touchpoint Manager is aimed at providing management services for SMB and for commercial environments. So for instance, in Touchpoint Manager, we can provide predictive types of capabilities for support. A number of different services that companies are looking for when they buy our products. Another thing we're going after too is device as a service. So there's another thing that we've announced recently that we're basically invested into there, and obviously, if you're delivering devices as a service, you want to do that as optimally as possible. Well, being able to understand the devices, what's happening with them, being able to do predictive support on them, being able to optimize the usage of those devices, that's all important. >> Dave: A lot of data. >> The data really helps us out, right? So the data that we can collect back from our devices, and being able to take that and turn it around into applications that are delivering information inside or outside, is huge for us, a huge opportunity. >> It's interesting where you talk about internal initiatives and managed services, which sound like they're mostly external, but on the internal ones, you were talking about taking customer data and internal data and turning those into live models. Can you elaborate on that? >> Sure, I can give you a great example: on our mobile products, they all have batteries. All of our batteries are instrumented as smart batteries; that's an industry standard, but HP actually goes a step further with the information that we put into our batteries. So by monitoring those batteries and the usage in the field, we can tell how optimally they're performing, but also how they're being used and how we can better design batteries going forward. So in addition, we can actually provide information back into our supply chain. For instance, there's a cell supplier for the battery, there's a pack supplier, there's our unit manufacturer for the product, and one of the things we've been able to uncover is that we can go and improve process. And so improving process alone helps to improve the quality of what we deliver and the quality of the experience to our customers. So that's one example of just using the data, turning that around into a model. >> Is there an advantage to having such high volume, such market share, in getting not just more data, but sort of more of the bell curve, so you get the edge conditions? >> Absolutely. It's really interesting, because when we started out on this, everybody was used to doing reporting, which is absolute numbers, and how much did you ship and all that kind of stuff. But we're doing big data, right? So in big data, you just need a good sample population. Turn the data scientists loose on that and they've got their statistical algorithms against it.
They give you the confidence factor based upon the data that you have, so it's absolutely a good factor for us, because we don't have to see all the platforms out there. Then, the other thing is, when you look at populations, we see variances in different customers. So, like, one of our populations that's very valuable to us is our own: we take the 60 thousand units that we have internally at HP, and that's one of our sample populations. What better way to get information on your own products? But you take that and you take it to one of our other customers, and their population's going to look slightly different. Why? Because they use the products differently. So one of the things is just usage of the products, the environment they're used in, how they use them. Our sample populations are great in that respect. Of course, the other thing is, very important to point out, we only collect data under the rules and regulations that are out there, so we absolutely follow that, and we absolutely keep our data secure, and that's important. Sometimes people get a little bit spooked around that, but the case is that our services are provided based on customers signing up for them. >> I'm guessing you don't collect more data than Google. >> No, we're nowhere near Google. >> So, if you're not spooked at Google - >> That's what I tell people. I say if you've got a smartphone, you're giving up a lot more data than we're collecting. >> Buy something from Amazon. Spark, where does Spark fit into all of this? >> Spark is great because we needed a programming platform that could scale, and in our previous approaches, we didn't have a programming platform. We started with Hadoop, but Hadoop was very complex. It really gets down to the hardware, and you're programming and trying to distribute that load and managing clusters, and then you pick up Spark and you immediately get abstraction. The other thing is it allows me to hire people that can actually program on top of it. I don't have to get someone that knows Map Reduce. I can sit there and it's like, what do you know? You know R, Scala, you know Python, it doesn't matter. I can run all of that on top of it. So that's huge for us. The other thing is flat-out the speed, because as you start getting going with this, we get this pull all of a sudden. It's like, well, I only need the data like once a month; then it's, I need it once a week, I need it once a day, I need the output of this by the hour now. So, the scale and the speed of that is huge, and then when you put that on a cloud platform, you know, Spark on a cloud platform like Amazon, now I've got access to all the compute instances. I can scale that, I can optimize it, because I don't always need all the power. The flexibility of Spark and being able to deliver that is huge for our success. >> So, I've got to ask some Columbo questions, and George, maybe you can help me sort of frame it. So you mentioned you were using Hadoop. Like a lot of Hadoop practitioners, you found it very complex. Now, Hewlett Packard has resources; many companies don't. But you mentioned people out doing Python and R and Scala and Map Reduce. Are you basically saying, okay, we're going to unify portions of our Hadoop complexity with Spark, and that's going to simplify our efforts? >> No, what we actually did was we started on the Hadoop side of it.
The first thing we did was try to move from a data warehouse to more of a data lake approach, or repository, and that was internal, right? >> Dave: And that was a cost reduction? >> That was a cost reduction, but also data accessibility. >> Dave: Yeah, okay. >> The other thing we did was ingesting the data. When you're starting to bring data in from millions of devices, we had a problem with the coming-through-the-firewall type of approach, and you've got to have something in front of that, like a Kafka or something, that can handle it. So when we moved to the cloud, we didn't even try to put up our own; we just used Kinesis, and we didn't have to spend any resources to go solve that problem. Well, the next thing was, when we got the data, you need to ingest it, and as our data's coming in, we want to split it out, and we needed to clean it. We actually started out running Java, and then we ran Java on top of Hadoop, but then we came across Spark and we said, that's it. For us to go to the next step of actually really getting into Hadoop, we were going to have to get some more skills, and to find the skills to actually program in Hadoop was going to be complex. And to train them organically was going to be complex. We've got a lot of smart people, but- >> Dave: You got a lot of stuff to do, too. >> That's the thing, we wanted to spend more time getting information out of the data as opposed to the framework of getting it to run and everything. >> Dave: Okay, so there's a lot of questions coming out. You mentioned Kinesis, so you've replaced that? >> Yeah, when we went to the cloud, we used as many Amazon services as we could, as opposed to growing something for ourselves, so when we got onto Amazon, you know, getting data into an S3 bucket through Kinesis was a no-brainer. When we transferred over to the cloud, it took us less than 30 days to point our devices at Kinesis, and we had all our data flowing into S3. So that was like, wow, let's go do something else. >> So I got to ask you something else. Again, I love when practitioners come on. So, one of the complaints that I hear sometimes from AWS users, and I wonder if you see this, is the data pipeline is getting more and more complex. I've got an API for Kinesis, one for S3, one for DynamoDB, one for Elastic Plus. There must be 15 proprietary APIs that are primitive, and again, it gets complicated, and sometimes it's hard to even figure out what's the right cost model to use. Is that increasingly becoming more complex, or is it just so much simpler than what you had before and you're in nirvana right now? >> When you mentioned costs, just the cost of moving to the cloud was a major cost reduction for us. >> Reduction? >> So now it's - >> You had that HP corporate tax on you before - >> Yeah, now we're going from data centers and software licenses. >> So that was a big win for you? >> Yeah, huge, and that freed us up to go spend dollars on resources to focus on the data science aspect. So when we start looking at it, we continually optimize, don't get me wrong. But the point is, if we can bring it up real quickly, that's going to save us a lot of money, even apart from not having to maintain it. So we want to focus on creating the code inside of Spark that's actually doing the real work, as opposed to the infrastructure. So that cost savings was huge. Now, when you look at it over time, we could've over-analyzed that and everything else, but what we did was we used a rapid prototyping approach, and then from there, we continued to optimize.
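The device-to-Kinesis leg of the pipeline John describes can be surprisingly small. A hedged sketch of a device-side producer follows; the stream name and payload fields are invented, and the Kinesis-to-S3 landing step he mentions is assumed rather than shown.

```python
# Illustrative device-side producer for a Kinesis ingestion path like the one
# described above. Stream name and payload fields are invented placeholders.
import json
import boto3

kinesis = boto3.client("kinesis", region_name="us-west-2")

record = {
    "device_id": "example-device-123",
    "event_type": "battery_reading",
    "charge_cycles": 412,
    "capacity_pct": 87.5,
}

kinesis.put_record(
    StreamName="example-device-telemetry",
    Data=json.dumps(record).encode("utf-8"),
    PartitionKey=record["device_id"],  # keeps one device's events ordered
)
```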
So what's really good about the cloud is you can predict the cost, and with internal data centers and software licenses and everything else, you can't predict the cost, because everybody's trying to figure out who's paying for what. But in the case of the cloud, it's all pretty much you get your bill and you understand what you're paying. So anyway - >> And then you can adjust accordingly? >> We continue to optimize, so we use the services, but if for some reason something's going to deliver us an advantage, we'll go develop it. But right now, our advantage is we've got umpteen opportunities to create AI-type code and applications to basically automate these services; we don't even have enough resources to do it all right now. But the common programming platform's going to help us. >> Can you drill into those umpteen examples? Just some of them, because - >> I mentioned the battery one, for instance. So take that across the whole system: now you've got your storage devices, you've got your software that's running on there, and we've got security monitoring built into our system at the firmware level. Just basically connecting into that and adding AI around it is huge, because now we can see attacks that may be happening upon your fleet, and we can create services out of that. Anything that you can automate around that is money in our pocket or money in our customers' pocket, so if we can save them money with these new services, they're going to be more willing to come to HP for products. >> It's actually more than just automation, because it's the stuff you couldn't do with 1,000 monkeys trying to write Shakespeare. You have data that you could not get before. >> You're right. What we're doing, the automation, is helping us uncover things that we would've never seen. And you're right, the whole gorilla walking through the room: I could sit there and show you tons of examples of where we were missing the boat. Even when we brought up our first data sets, we started looking at them, and some of the stuff we looked at, we thought, this is just bad data, and actually it wasn't, it was bad product. >> People talk about dark data - >> We had no data models, we had no data model to say, is it good or bad? And now we have data models, and we're continuing to create those data models; you create the data model and then you can continue to teach it, and that's where we create the apps around it. Our primitives are the data models that we're creating from the device data that we have. >> Are there some of these apps where some of the intelligence lives on the device, and it can, like in a security attack, it's a big surface area, you want to lock it down right away? >> We do. The good example on the security side is we built something into our products called Sure Start. Essentially, we have the ability to monitor the firmware layer, and there's a local process that's running, independent of everything else, that's monitoring what's happening at that firmware level. Well, if there's an attack, it's going to immediately prevent the attack or recover from the attack. Well, that's built into the product. >> But it has to have a model of what this anomalous behavior is. >> Well, in our case, we're monitoring what the firmware should look like, and if we see that the firmware, you know, you take checksums from the firmware or the pattern - >> So the firmware does not change? >> Well, basically we can take the characteristics of the firmware and monitor it.
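HP's actual Sure Start logic is proprietary and runs below the operating system, so the following is only an illustration of the checksum idea John raises: comparing a cryptographic digest of a firmware image against a known-good value. The path and golden digest are placeholders.

```python
# Toy illustration of firmware integrity checking by digest comparison.
# This is NOT HP's Sure Start implementation; path and digest are placeholders.
import hashlib

def firmware_digest(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

KNOWN_GOOD = "0123abcd..."  # golden digest recorded at build time (placeholder)

if firmware_digest("/var/firmware/example_image.bin") != KNOWN_GOOD:
    print("Firmware characteristics changed: possible corruption or attack")
    # a real system would recover from a protected copy and log the event
```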
If we see that changing, then we know something's wrong. Now, it can get corrupted through hardware failure, maybe, because glitches can happen. I mean, solar flares can cause problems sometimes. So, the point is, we found that customers sometimes had problems where basically their firmware would get corrupted and they couldn't start their system. So we're like, are we getting attacked? Is this a hardware issue? Could it be bad flash devices? There are always all kinds of things that could cause that. Well, now we monitor it and we know what's going on. Now, the other cool thing is we create logs from that, so when those events occur, we can collect those logs, and we're monitoring those events, so now we can have something monitor the logs that are monitoring all the units. So, if you've got millions of units out there, how are you going to do that manually? You can't, and that's where the automation comes in. >> So the logs give you the ability, up in the cloud or at HP, to look at the ecosystem of devices, but there is intelligence down on the - >> There's intelligence to protect the device and auto-recover, which is really cool. So in the past, you just had to get it repaired. Imagine if someone attacked your fleet of notebooks. Say you've got 10 thousand of them, and basically it brought every single one of them down one day. What would you do? >> Dave: Freak. >> And everything you've got to replace. It was just an attack, and it could happen, so we basically protect against that with our products, and at the same time, we can see that maybe occurring, and then from the footprints of it, we can do analysis on it and determine, was that malicious? Is this happening because of a hardware issue? Is this happening because maybe we tried to update the firmware and something happened there? What caused that to happen? And so that's where collecting the data from the population helps us do that, and then we mix that with other things, like service events. Are we seeing service events being driven by this? Thermal: we can look at the thermal data. Maybe there's some kind of heat issue that's causing this to happen. So we start mixing that. >> Did Samsung come calling to buy this? >> Well, actually, what's funny is Samsung is actually a supplier of ours, a battery supplier of ours. So, by monitoring the batteries, what's interesting is we're helping them out, because we go back to them. One of the things I'm working on is we want to create apps that can go back to them, so they can see the performance of the product that they're delivering to us. So instead of us having to call a meeting and saying, hey guys, let's talk about this, we've got some problems here; imagine how much time that takes. But if they can self-monitor, then they're going to want to keep supplying to us, and they're going to better their product. >> That's huge. What a productivity boost, because you're like, hey, we got a problem, let's meet and talk about it, and then you take an action to go and figure out what it is. Now if you need a meeting, it's like, let's look at the data. >> Yeah, you don't have enough people. >> But there's also potentially a shift in pricing power. I would imagine it shifts a little more in your favor if you have all the data that indicates the quality of their product. >> That's an interesting thing. I don't know that we've reached that point. I think that in the future, it would be something that could be included in the contracts.
The fact is, the world is the way it is today, and data is a big part of that, to where going forward, absolutely, the fact that you have that data helps you to have a better relationship with your suppliers. >> And your customers. I mean, it used to be that the brand used to have all the information. The internet obviously changed all that, but this whole digital transformation, and IoT, and all that log data, that sort of levels the playing field back to the brand. >> John: It actually changes it. >> You can now add value for the consumer that you couldn't before. >> And that's what HP's trying to do. We're invested to do exactly that, to really improve or increase the value of our brand. We have a strong brand today, but - >> What do you guys do with - we got to wrap - but what do you do with Databricks? What's the relationship there? >> Databricks, again, we decided that we didn't want to be the experts on managing the whole Spark thing. The other part was that we were going to be involved with Spark and help them drive the direction as far as our use cases and what have you. Databricks and Spark go hand in hand. They've got the experts there, and it's been huge, our relationship, being able to work with these guys. But I recognize the fact that, going back to software development and everything else, we don't want to spend resources on that. We've got too many other things to do, and the less that I have to worry about my Spark code running and scaling, and the cost of it, and being able to put code in production, the better. And so, having that layer there is saving us a ton of money and resources and a ton of time. Just imagine time to market, it's just huge. >> Alright, John, sorry we got to wrap. Awesome having you on, thanks for sharing your story. >> It's great to talk to you guys. >> Alright, keep it right there, everybody. We'll be back with our next guest. This is the CUBE, live from Spark Summit East; we'll be right back.

Published Date : Feb 9 2017

SUMMARY :

brought to you by databricks. the world-wide leader in tech coverage. we do a lot of shows with HPE, In the past, we were basically a data warehousing bit more detail inside of HP. One of the things that was important was we had a common the way we can do that is by using the data we can provide predictive type of capabilities for support. So the data that we can collect back from our devices It's interesting where you talk about internal and the quality of the experience to our customers. Then, the other thing is, when you look at populations, I say if you got a smartphone, you're giving up Spark, where does Spark fit into all of this? and then when you put that on the cloud platform, and that's going to simplify our efforts? and that was internal, right? and to find the skills to actually program That's the thing, we wanted to spend more time Dave: Okay, so there's a lot of questions coming out. so when we get onto Amazon, you know, getting data into So I got to ask you something else. of moving to the cloud was a major cost reduction for us. Yeah, now we're going from But, the point is, if we can bring it up real quickly, We continue to optimize so we use the services So take that across the whole system because it's the stuff you couldn't do with that we would've never seen and you're right, And now we have data models and we're continuing intelligence lives on the device and it can, The good example on the security is we built But it has to have a model of what Now, the other cool thing is we create logs from that So in the past, you had to get your repair. and at the same time, we can see that may be a current of their product that they're delivering to us. and then you take an action to go if you have all the data that indicates and data is a big part of that to where the playing field back to the brand. that you couldn't before. is to really improve or increase the value of our brand. and the less that I have to worry about Alright, John, sorry we got to wrap. This is the CUBE live from Spark Summit East,

SENTIMENT ANALYSIS:

ENTITIES

Entity | Category | Confidence
Dave | PERSON | 0.99+
Dave Valante | PERSON | 0.99+
George Gilbert | PERSON | 0.99+
John | PERSON | 0.99+
George | PERSON | 0.99+
HP | ORGANIZATION | 0.99+
Boston | LOCATION | 0.99+
John Landry | PERSON | 0.99+
Hewlett Packard | ORGANIZATION | 0.99+
Amazon | ORGANIZATION | 0.99+
10 thousand | QUANTITY | 0.99+
Java | TITLE | 0.99+
Google | ORGANIZATION | 0.99+
Samsung | ORGANIZATION | 0.99+
Spark | ORGANIZATION | 0.99+
second day | QUANTITY | 0.99+
AWS | ORGANIZATION | 0.99+
second part | QUANTITY | 0.99+
60 thousand units | QUANTITY | 0.99+
Python | TITLE | 0.99+
Hadoop | TITLE | 0.99+
less than 30 days | QUANTITY | 0.99+
millions of dollars | QUANTITY | 0.99+
today | DATE | 0.99+
Hewlett Packard Enterprise | ORGANIZATION | 0.99+
once a month | QUANTITY | 0.99+
HPE | ORGANIZATION | 0.99+
both sides | QUANTITY | 0.99+
Spark | TITLE | 0.99+
1,000 monkeys | QUANTITY | 0.99+
one | QUANTITY | 0.99+
Boston, Massachusetts | LOCATION | 0.99+
once a week | QUANTITY | 0.98+
once a day | QUANTITY | 0.98+
15 proprietary APIs | QUANTITY | 0.98+
One | QUANTITY | 0.98+
both | QUANTITY | 0.98+
one day | QUANTITY | 0.98+
Map Reduce | TITLE | 0.97+
Spark Summit East 2017 | EVENT | 0.97+
first data sets | QUANTITY | 0.97+
two-fold | QUANTITY | 0.97+
Spark Summit | EVENT | 0.96+
R | TITLE | 0.96+
a ton | QUANTITY | 0.95+
millions of units | QUANTITY | 0.95+
Scale | TITLE | 0.95+
Kafka | TITLE | 0.94+
Shakespeare | PERSON | 0.94+
S3 | TITLE | 0.94+

Vish Mulchand, HP Storage | VMworld 2015


 

>> Announcer: VMworld 2015, brought to you by VMware and its ecosystem sponsors. And now your host, Dave Vellante. >> Dave: Welcome back to San Francisco, everybody. This is theCUBE, SiliconANGLE Wikibon's continuous coverage of VMworld 2015. This is our sixth year at VMworld. We go out to the events and extract the signal from the noise, and our friend Vish Mulchand is here with HP Storage. Vish, it's always good to see you, in your hometown, practically in your backyard. >> Vish: It's great to be in Moscone. Thanks for having me back on theCUBE, Dave. Great to be here, as always. >> Dave: So I go back to 2010, when 3PAR was a separate company, and then we watched the acquisition occur and how badly needed that acquisition was. We were at VMworld when the bidding war was occurring between Dell and HP, and we predicted HP was going to win that war, and of course that changed the course of storage at HP permanently. It's been an amazing run: 3PAR has become the crown jewel of the portfolio, but the most amazing thing is how you've evolved that platform to play in the all-flash world with a very competitive product. We've been documenting that traction, but give us the update. Where have you come from, and where are we today? >> Vish: Sure, Dave. If we look back to June 2013, when we first announced the all-flash array, we've had a whole series of announcements since. In December we announced something called adaptive sparing, which was very unique flash innovation: treating the flash differently and giving customers twenty percent more capacity. In June 2014 we brought two dollars per gig with deduplication. In December 2014 we brought what we call the converged flash array, a flash-focused design you can still add spinning disk to if you want, and several of our customers are doing that because they have a need for it. Then in June of 2015 we doubled down: we announced the 20000 series and brought the affordability even lower, to a dollar fifty a gig.

>> Dave: The other amazing thing is the cadence of announcements. For years, HP announcements were slow to come, maybe one a year, maybe a name change, but now it's bang, bang, bang. I presume it's the architecture that allows you to do that, but there were a lot of skeptics when you came out with the all-flash array: it's going to be a bolt-on. You said no, and now you're proving it. Help the people who don't understand the nuances: how were you able to do that, and what are the proof points that it's not just a bolt-on? >> Vish: I think it all comes down to the architecture. You have to have an architecture that's modular and extensible, and as we looked at the 3PAR architecture, all the attributes we put in place early on were very applicable to flash. Flash did have some differences, and we accounted for them, but the architecture proved extensible, and the fundamentals, scalable controllers for performance, the ASIC to offload, the fine-grained virtualized operating system with a very small page allocation size, were perfectly suited for flash. You could almost say they were... >> Dave: ...too much for spinning disk, right? Was that just luck? A lot of what the original designers of 3PAR did was trying to offset the deficiencies of spinning disk. Did they just have amazing vision? >> Vish: I give the founders a lot of credit for their foresight. In fact, the founders, and I've spoken to them, had a server background; they were server guys. They said to me, "Vish, when we did a server benchmark it would take us six months, and four and a half of those six months was getting the storage right," and they really didn't understand why it had to be so hard. They brought a very different approach to storage from how the industry was handling it; they turned it on its head, and they architected some very interesting capabilities. I'm very confident that as we go to flash 2.0, as we talk about newer non-volatile memory technologies, if something other than NAND comes about, the architecture will be able to isolate the media from the customer. >> Dave: Martin Fink would say that's memristor, but we'll see. >> Vish: We'll see. Memristor has a big element there, but we'll go with what the industry wants to look at, of course. >> Dave: So let's talk about VMworld 2015 and what you're doing here, the announcements you're making. >> Vish: Sure. We had several announcements; I'll focus on the flash announcements. The approach we've taken with flash has three vectors: affordability, performance, and data services. Some companies have done one or two, but I think it's rare to see all three vectors being attacked at the same time. That's been our approach from the start, and in the announcements we made this week it's the same approach. So let me go down those three vectors, Dave, if you'll allow me. >> Dave: Yeah, please. >> Vish: Let's start with affordability. We announced a new 8000 series, a refresh of the very successful 7000 line, of which the 7450 all-flash array is one. The starting point for the all-flash 8200 is now down to nineteen thousand four hundred ninety-seven dollars: two controllers, six drives, six terabytes usable capacity. >> Dave: 19,997. You're under 20 grand by a lot; you made sure you got that 497 in. >> Vish: Right. We also announced a lower entry price point for the 20000 series we announced earlier in June. Those were eight-controller-capable systems; we added the 20450, a four-controller-capable system, again on the theme of driving the price down. So you have a dollar fifty per gig if you want to buy that way, a low entry price point at 19K if you want to buy that way, or a scalable system that you can grow to the extreme, at an affordable price point as well. In my mind, the adoption and the success we've had in the marketplace has been a function of a couple of things, and affordability is a key one. It's economics that drives adoption.

>> Dave: Okay, now performance. Everybody says flash is high performance, and it's somewhat true, because relative to spinning disk it's going to be better, but it's nuanced. So talk about your performance. >> Vish: Performance is very important, and we announced a couple of interesting performance firsts. We talked about some improvements in bandwidth, and let's look at why that matters, Dave. If you were doing a million IOs that were small 4K blocks, do the math: that's four gigabytes per second. Now, if you're doing large-block IOs, say a SQL database analytical query, those are typically large-block IOs, and you do a million of those at around a meg in size, then bandwidth becomes the choke point for the array. So with the 8000 series we've announced twenty-four gigs a second of bandwidth, which is two and a half times more than before. >> Dave: Sorry to interrupt, but this is why a lot of the existing arrays that bolted on flash failed. >> Vish: One of the reasons they fail is that their controllers are not able to handle the IO load, and even if they can, can they handle the bandwidth requirements? And then the other thing that matters is latency. The other thing we announced this week was a forty-four percent improvement in latency: a million IOPS at 387 microseconds. >> Dave: That's just low latency. So you're setting up this low-latency storage versus capacity storage, and you're playing in both, but we're obviously talking about the latency piece here. >> Vish: Correct. So that's the performance piece. >> Dave: And then there are actually two more. There's the availability, and 3PAR is known for high availability, it's the new tier one, and there are data services associated with that. >> Vish: Yes, resiliency is a big factor there. There's single-system resiliency: pull out a drive, pull out a controller, fail a cache board, how do you react? In fact, the reason we succeed in the marketplace, our customers tell us, is that reliability factor. They run these tests where they pull things in and out and watch how the arrays operate, and we consistently come back operating well in single-system resiliency. Then there's multi-system resiliency: what do I do with replication, what do I do with snapshots, can I move my snapshots to a deduplicating backup device, how quickly can I move, how much do I move? All of these elements of resiliency are important. And there's another piece of resiliency emerging, Dave, around protecting access to your data: security. Do you encrypt the data? If you encrypt the data and you have a snapshot, and you move that snapshot to a deduplicating device, what happens to that snapshot and the key? Do you have multiple keys? What if your keys get compromised? That resiliency topic is a big one, with lots of different areas to go after, and whether it's replication, snapshots, backup devices, encryption, or key managers, we have all those elements. >> Dave: How about, and we always talk about this, one of the big advantages of an architecture that's been around for a decade: the stack is hardened, the storage services are there. That's a big differentiator from what you see in a lot of the startups

and the bolt-ons; everybody thought you'd be a bolt-on, but you architected the whole thing. What about quality of service, the ability to pin application performance and actually change it programmatically? >> Vish: Quality of service is a very big attribute of ours. In fact, the product for HP 3PAR is called Priority Optimization, and in this week's announcement we announced further enhancements. First of all, we have latency goals in our QoS product, which I think is unique; nobody else offers latency goals, and this week we announced latency goals going down to half a millisecond. If the array is operating at three to four hundred microseconds, you want to be able to control your priorities with that granularity, and QoS granularity is exactly what we brought. And Dave, if you remember, when we did the last CUBE we talked to a cloud company that had taken a hardware-based gold, silver, bronze tier, put it on a flash array, and used Priority Optimization to implement the gold, silver, bronze in software. >> Dave: Right, and that was interesting to see them do that with flash. What do you think is going to happen there? We're hearing increasingly, at shows like this and others, that you're starting to see more tiering in flash; you're hearing it now in the Hadoop world and the big data world. A lot of people initially thought, and maybe still think, you're going to have flash in the latency tier and the capacity tier as the bit bucket. What's your thinking now on how that shakes out, and how practitioners should be thinking about their storage architectures going forward? >> Vish: I think you're going to see variety. One very possible use case says: I have applications that are critical and service-level optimized, and those have to be on flash, and then I may have, either as a backing store or as another set of applications, things that are not service-level optimized but cost optimized. And maybe that changes over time, maybe it changes by quarter; what is cost optimized today sees a spike and comes back. So this notion of data mobility is very key, and it's the fourth data-service pillar I want to talk about, because we announced four-way federation, the ability to take four arrays and operate them as a single logical whole, and you can federate data among those arrays. Now extend the idea: can you federate to a backup device, can you federate to a generic cloud, can you federate to an archive tier? These are the kinds of things our customers are asking for. First of all, federate to another array, to load-balance for example, or for asset refresh, but all the other use cases, federating to a cloud, federating to an archive tier, those are all coming up. >> Dave: So I suspect we're going to see more of those. >> Vish: Stay tuned. As we look to raise the bar once again, these are some of the things we're thinking about. >> Dave: So I know you can't give details, but give us the high-level roadmap. What should we be watching from HP generally, 3PAR, and all-flash specifically? >> Vish: I think we'll continue to drive affordability. 3D NAND is available as well now, and there are other flash technologies, and we want to isolate our customers from whether it's cMLC, eMLC, or 3D NAND. I'm going to say to them: what's your price point, what's your capacity point, what's your availability point? We'll meet that; let us worry about the technology problem and how we get there. So that continues: work on the media, faster controllers to drive up the performance, and the host-side connects. There's a lot of talk around the role of 25-gig Ethernet, 32-gig Fibre Channel, and the RDMA technologies, iSER, iWARP, RoCE, plus NVMe to the backplane, NVMe to the host, flash JBODs. So we're shifting the bottleneck, and we're going to look at the bottlenecks across all areas, end to end, and make sure we're looking at this holistically as we drive forward. >> Dave: It doesn't get less complicated, at least for the folks who are building this stuff; hopefully for those of us using it, it does. Vish, thanks very much. Always a pleasure. >> Vish: A great pleasure, as always. >> Dave: Keep it right there, everybody. We'll be back with our next guest. This is theCUBE, live from VMworld 2015 in Moscone. We'll be right back.
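Vish's bandwidth arithmetic above is worth making concrete. The quick Python check below uses only the figures quoted in the interview (a million IOs per second at 4K blocks versus roughly megabyte-sized analytical blocks, and the 8000 series' stated 24 GB/s); everything else is illustration.

```python
# Why large-block IO makes bandwidth the bottleneck: the arithmetic
# behind the figures quoted in the interview.

IOPS = 1_000_000           # one million IOs per second, as quoted

KB = 1024
small_block = 4 * KB       # 4K OLTP-style IO
large_block = 1024 * KB    # ~1 MB analytical-query IO

def required_bandwidth_gb_s(iops: int, block_bytes: int) -> float:
    """Bandwidth needed to sustain `iops` IOs of `block_bytes` each, in GiB/s."""
    return iops * block_bytes / (1024 ** 3)

print(required_bandwidth_gb_s(IOPS, small_block))  # ~3.8 GB/s ("four gigabytes per second")
print(required_bandwidth_gb_s(IOPS, large_block))  # ~976 GB/s: far beyond any single array

# The 8000 series figure quoted is 24 GB/s of bandwidth. At ~1 MB blocks
# that sustains only about 24,000 IOs per second, so bandwidth, not IOPS,
# caps the large-block workload:
print(24 * (1024 ** 3) / large_block)              # ~24,576 large-block IOs/s
```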

Published Date : Sep 2 2015

**Summary and Sentiment Analysis are not shown because of an improper transcript**

ENTITIES

Entity | Category | Confidence
december 2014 | DATE | 0.99+
nineteen thousand | QUANTITY | 0.99+
San Francisco | LOCATION | 0.99+
june 2013 | DATE | 0.99+
forty-four percent | QUANTITY | 0.99+
twenty percent | QUANTITY | 0.99+
2010 | DATE | 0.99+
Dave | PERSON | 0.99+
June | DATE | 0.99+
VMware | ORGANIZATION | 0.99+
Dave Lee | PERSON | 0.99+
jun 2014 | DATE | 0.99+
jun 2013 | DATE | 0.99+
19 k | QUANTITY | 0.99+
one | QUANTITY | 0.99+
december | DATE | 0.99+
sixth year | QUANTITY | 0.99+
Dell | ORGANIZATION | 0.99+
two | QUANTITY | 0.99+
HP | ORGANIZATION | 0.99+
six months | QUANTITY | 0.99+
june of 2015 | DATE | 0.99+
twenty four gigs | QUANTITY | 0.99+
iOS | TITLE | 0.99+
three | QUANTITY | 0.99+
half a millisecond | QUANTITY | 0.99+
three vectors | QUANTITY | 0.99+
387 microseconds | QUANTITY | 0.99+
dave vellante | PERSON | 0.99+
NASA | ORGANIZATION | 0.99+
Vish Mulchand | PERSON | 0.99+
a million | QUANTITY | 0.99+
25 gig | QUANTITY | 0.98+
32 gig | QUANTITY | 0.98+
this week | DATE | 0.98+
four hundred microseconds | QUANTITY | 0.98+
two and a half times | QUANTITY | 0.98+
four and a half | QUANTITY | 0.98+
both | QUANTITY | 0.98+
vmworld | ORGANIZATION | 0.98+
this week | DATE | 0.97+
six terabyte | QUANTITY | 0.97+
vish mulchen | PERSON | 0.97+
today | DATE | 0.97+
fifty a gig | QUANTITY | 0.97+
Adam | PERSON | 0.96+
six drives | QUANTITY | 0.96+
two controllers | QUANTITY | 0.95+
under 20 grand | QUANTITY | 0.95+
a million | QUANTITY | 0.95+
three-part | QUANTITY | 0.95+
one a year | QUANTITY | 0.93+
first | QUANTITY | 0.92+
dollar fifty per gig | QUANTITY | 0.92+
four hundred ninety seven dollars | QUANTITY | 0.92+
single | QUANTITY | 0.92+
three vectors | QUANTITY | 0.91+
2015 | DATE | 0.91+
call aight | ORGANIZATION | 0.91+
two more | QUANTITY | 0.9+
19,000 999 | QUANTITY | 0.89+
AFA | ORGANIZATION | 0.89+
a lot of people | QUANTITY | 0.88+
moscone | LOCATION | 0.88+
tier one | QUANTITY | 0.87+
Sarah megan | PERSON | 0.87+
Nance | ORGANIZATION | 0.86+
three parts | QUANTITY | 0.85+
3par | ORGANIZATION | 0.84+
two dollars per gig | QUANTITY | 0.84+
four gigabytes per second | QUANTITY | 0.84+
vmworld | EVENT | 0.83+
HP Storage | ORGANIZATION | 0.83+
fourth data service | QUANTITY | 0.81+
8000 series | COMMERCIAL_ITEM | 0.8+

Stephanie McReynolds - HP Big Data 2015 - theCUBE


 

>> Announcer: Live from Boston, Massachusetts, extracting the signal from the noise, it's theCUBE, covering HP Big Data Conference 2015, brought to you by HP Software. Now your hosts, John Furrier and Dave Vellante. >> John: Okay, welcome back everyone. We are here live in Boston, Massachusetts for HP's Big Data Conference. This is a special presentation of theCUBE, our flagship program, where we go out to the events and extract the signal from the noise. I'm John Furrier with Dave Vellante, here with Wikibon research. Our next guest is Stephanie McReynolds, VP of Marketing at Alation, a hot new startup that's been coming out of stealth. There's a lot of great stuff out there in big data. Stephanie, welcome to theCUBE. >> Stephanie: Great to be here. >> John: Tell us about the startup, because there's good buzz going on. It's kind of stealth buzz, but it's really with the thought leaders, the people in the industry who know what they're talking about. So introduce the company, tell us what you guys are doing and your relationship with Vertica. >> Stephanie: Absolutely. Alation is an exciting company. We just started to come out of stealth in March of this year, and we came out with some great production customers. eBay is a customer; they have hundreds of analysts using our system. We also have Square as a customer, a smaller analytics team, but the value their analytics teams are getting out of the product is really being able to access their data in human context. We do some machine learning to look at how individuals are using data in an organization, we gather some of the human insights about how that data is being used by experts, and we surface all of that in line with the work. >> John: So what kind of data? Stonebraker was talking yesterday about the three Vs, which we all know, but the one that's really coming mainstream as a problem space is variety: you have different varieties of schema sources, and then you have a lot of unstructured exhaust data flying around. Can you be specific about what you guys do? >> Stephanie: Yeah, it's interesting, because there are several definitions of data and big data going around. We connect to a lot of database systems and we also connect to a lot of Hadoop implementations, so we deal with structured data as well as what I consider unstructured data. The third part of what we do is bring in context from human-created data, or human information, which Robert was talking about a little bit yesterday. What happens in a lot of analytic organizations is that there's a very manual process of documenting the data being used in these projects, and that's done on wiki pages or spreadsheets floating around the organization. >> John: Slack, Basecamp, all these collaboration platforms. >> Stephanie: All these collaboration platforms. And what you realize, when you really get into the work of using that information to write your queries, is that trying to reference a wiki page, then write your SQL, flipping back and forth between maybe ten different documents, is not very productive for the analyst. What our customers are seeing is that by consolidating all of that data and information in one place, where the tables are actually referenced side by side with the annotations, their analysts can get twenty to fifty percent savings in productivity, and new analysts, maybe more importantly, can get up to speed quite a bit quicker.

At Square the other day I was talking to one of the data scientists, and he described his process for finding data in the organization. Prior to using Alation it would take about 30 minutes, going to two, three, four people, to find the data he needed for his analysis. With Alation, in five seconds he can run a search for the data he wants, get it back with all that expert annotation already around the base data, and he's ready to roll. >> John: So is it a platform? >> Stephanie: It is a platform, and it's tightly integrated with the databases in this use case. We see databases as a source of information; we don't create copies of the data on our platform. We go out and point to the data where it lies and surface that data to the end user. Now, in our relationship with Vertica, we've also integrated Vertica into our stack to support what we call data forensics. That's built not for the analyst using the system day to day, but for an IT individual to understand the behaviors around this data and the types of analysis being done, and Vertica is a great high-performance platform for the dashboarding and business intelligence on the back end of that, providing quick access to aggregates. >> John: So where does Vertica sit? You just use the engine? >> Stephanie: We use the Vertica engine underneath our forensics product, and the rest of our platform is built on other technologies. So Vertica is part of our solution; it's one application that we deliver. >> Dave: We've been talking all week, and Colin Mahoney in his talk yesterday gave a pretty little history of ERP: how it was initially highly customized and became packaged apps, and he pointed to a similar track with analytics, although he said it's not going to be the same, it's going to be more composable applications. Historically the analytics and the database have been closely aligned, maybe not integrated. Do you see that model continuing? Will it be more packaged apps, or what Colin calls composable apps? What's the relationship between your platform and the application? >> Stephanie: Our platform is really more tooling for the individuals building those applications. We're helping data scientists and analysts find which algorithms they want to use as a foundation for those applications, so we're a bit more on the discovery side, where folks are doing a lot of experimentation. They may be having to prepare data in different ways to figure out what might work for those applications, and that's where we fit in as a vendor. >> Dave: And what's your license model? >> Stephanie: We're on a subscription model. We have customers with data teams in the hundreds, at a place like eBay, and smaller implementations that might be teams of five or ten analysts, so fairly small as well. It's a seat-based subscription, we can run in the cloud or on premise, and we do some interesting things around securing the data, where you can control visibility down to the column and data-set level for financial services organizations and other customers that

have security concerns, and most of those are on-premise implementations. >> John: So talk about the inspiration for the company. It's been three years since the founding, and now you're out of stealth. What are the founders like, what's the DNA of the company, what do you guys do differently? >> Stephanie: What's really interesting about the founding of the company is that the technical founders come from both Google and Apple, and both individuals had made an interesting observation independently. >> John: One hardcore algorithmic guy, and one about relevance. >> Stephanie: Both of them observed how Google and Apple, two of the most data-driven companies on the planet, were struggling: their analytics teams were struggling to share queries and share data sets, and there was a lot of replication of work happening. Both of these folks, from different angles, came together at Alation and said, look, there are a lot of machine learning algorithms that could help with this process, and there are also good ways, with natural language processing, to let people interact with their data more naturally. The founder from Apple, Aaron, was on the Siri team, so he had a lot of experience designing products for navigability, ease of use, and natural language interaction. Those two perspectives coming together created the technology fundamentals in our product. >> John: And some scar tissue from large-scale implementations of data. >> Stephanie: Very large-scale implementations of data, and also a really deep awareness of what the human equation brings to the table. Machine learning algorithms aren't enough in and of themselves. Ken Rudin had some interesting comments this morning where he pushed it one step further and said it's not just about finding insight; data science is about having impact, and you can't have impact unless you create human context and have communication and collaboration around the data. So we give analysts a query tool in which we surface the machine learning context we have about the data being used in the organization and the queries that have been run on that data, but we surface it in a way where the human can get recommendations on how to improve their SQL and drive towards impact, and then share that understanding with other analysts, so an innovation community gets started. >> John: So who do you target? Let's step back to the go-to-market. You've launched, you've got some funding. Can you share the amount, or is it private? How much did you raise, who are you targeting, what's the value proposition? Give us the data. >> Stephanie: The initial value proposition is really about analyst productivity; that's where we're targeted. Everyone knows it's hard to hire these days, so you're not going to grow those teams overnight. How do you make the analysts, the data scientists, the PhDs you have on staff much more productive? How do you take the eighty to ninety percent of their time that goes into just finding data in the organization and preparing it, and let them really innovate and use that to drive value back

to the organization? So we're often selling to individual analysts and analytics teams. The go-to-market starts there, and the value proposition extends much further into the organization. You find teams and organizations that have been trying to document their data through traditional data governance means or ETL tools for a very long time, and a lot of those projects have stalled out. The way we crawl systems and use machine learning automation to automate some of that documentation really gives those projects new life. >> John: Enterprise data has always been elusive. Go back decades: structured data, all these pre-built databases. It's been hard, so if you can crack that nut, it's going to be a very lucrative opportunity. You've got the Hadoop clusters now storing everything. Some of the key customers of HP or IBM we talk to here, big companies, are storing everything just because they don't know if they'll need it again. >> Stephanie: The past has been hard in part because in some cases we over-managed the modeling of the data. What's exciting now about storing all your data in Hadoop, storing first and asking questions later, is that you're able to take a more discovery-oriented, hypothesis-testing, iterative approach, and if you think about how true innovation works, you build insights on top of one another to get to the big breakthrough concepts. So I think we're at an interesting point in the market for a solution like this that can help with the increasing complexity of the data environment. >> Dave: So you just raised your Series A of nine million, and maybe did some seed round before that, so it's pretty early days for you guys. You mentioned natural language processing before, via one of your founders. Are you using NLP in your solution in any way? >> Stephanie: We have a search interface that allows you to look for that technical data, the metadata and data objects, by entering simple natural language search terms. So yes, we are using that as part of our interface and solution. >> Dave: And can you talk about early customer successes, any examples? >> Stephanie: There are some great examples. Jointly with Vertica, Square is a customer, and their analytics team uses us on a day-to-day basis, not only to find data sets in the organization but to document those data sets. And eBay has hundreds of analysts using Alation today in a day-to-day manner, and they've seen quite a bit of productivity from the new analysts coming onto the systems. It used to take analysts about 18 months to really get their feet under them in the eBay environment, because of the complexity of all the different systems and understanding where to go for, say, that customer table they needed to use; now analysts are up and running in about six months. Their data governance team has also found that Alation has automated and prioritized the process around documentation for them, so it's a great foundation for their data curators and data stewards to go in, enrich the data, and collaborate more with the analysts, the actual data users, to get to a point of curated, catalogued data. >> John: So what's next? You're going to be on the road in New York. Strata + Hadoop World and Big Data NYC are coming up, a big event in New York where theCUBE will be. >> Stephanie: We're getting the word out about Alation, and we have customers that are

starting to speak about their use cases and the value they're seeing. They'll be in New York, and I believe customers will be speaking on our behalf there to share their stories. Then we're going to a couple of other conferences after that; the fall is an exciting time. >> John: Which ones are the big ones for you? >> Stephanie: I'll be at Strata in New York in September, early October, and then in mid-October we're going to be at both Teradata Partners and Tableau's conference. We connect not only to databases of all different sorts but also to the tools the users work with. >> John: Awesome. Anything else you'd like to add about the company? We've heard some great things checking around; a lot of insiders like the company. And you didn't raise too much cash. It's not the million-zillion-dollar round. Why did you take nine million? >> Stephanie: I think we're building this company in a traditional, value-oriented way. >> John: Great word. >> Stephanie: Staying lean, bringing in revenue, and balancing that with the venture capital investment. It's not that we won't take money, but we want to build this company in a very durable way. >> John: So the vision is to build a durable company. >> Stephanie: Absolutely, and that may be different from some of our competitors out there these days. >> John: SiliconANGLE has not taken any financing at all, so we believe in that. You might pass up some things, but you keep control, and you have some good partners, so congratulations. Final word: what's this conference like? You go to a lot of events. What's your take on this one? >> Stephanie: I do end up going to a lot of events; that's part of the marketing role. I think what's interesting about this conference is that there are a lot of great conversations happening, not just from a technology perspective but also between business people, with deep thinking about how to innovate. And Vertica's customers, I think, are some of the most loyal customers I've seen in the market, so it's great. They're advanced, too; they're talking about some pretty big problems they're solving. It's not little point solutions; it's re-architecting. >> John: There's a DevOps vibe here. I got trashed on Twitter in private messages all last night for calling this a DevOps show. It's not really a DevOps cloud show, but there's a DevOps vibe: the people working on the solutions are solving real problems, talking about them, and sharing their opinions, and that's similar to what you see in DevOps. The DevOps folks are on the front line, real engineers, engineering because they have to. No pretenders here, that's for sure. >> Stephanie: It's not a big sales conference; it's a lot of customer content, engineering solutions. Nobody wants bullshit; they want the real thing. I've got a lot on the table, I'm doing serious work, and I want serious conversations, and that's refreshing for us. >> John: Alright, Stephanie, thanks so much for coming on theCUBE and sharing your insight. Congratulations, and good luck with the new startup. Hot startups here in Boston at the Vertica HP Software show. We'll be right back with more on theCUBE after

this short break.
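The core mechanism Stephanie describes, mining query logs to learn which data sets experts actually use and surfacing those first, can be sketched with a toy example. To be clear, this is not Alation's implementation: a real catalog would use a full SQL parser and much richer signals; the regex and sample queries below are invented for illustration.

```python
# Toy sketch of query-log mining for a data catalog: count which tables
# analysts actually query so the popular, expert-used data sets can be
# surfaced first in search. A real catalog would use a proper SQL parser;
# the regex here is a deliberate simplification.
import re
from collections import Counter

query_log = [  # hypothetical example queries
    "SELECT * FROM sales.orders o JOIN sales.customers c ON o.cust_id = c.id",
    "SELECT cust_id, SUM(total) FROM sales.orders GROUP BY cust_id",
    "SELECT * FROM staging.orders_tmp",
]

# Capture the identifier that follows FROM or JOIN.
TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([\w.]+)", re.IGNORECASE)

usage = Counter()
for sql in query_log:
    usage.update(t.lower() for t in TABLE_REF.findall(sql))

# Rank tables by how often analysts reference them; the most-used table
# is likely the "blessed" one a new analyst should start from.
for table, count in usage.most_common():
    print(f"{table}: referenced {count} times")
# sales.orders: referenced 2 times, etc.
```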

Published Date : Aug 12 2015

**Summary and Sentiment Analysis are not shown because of an improper transcript**

ENTITIES

Entity | Category | Confidence
Colin Mahoney | PERSON | 0.99+
Stephanie McReynolds | PERSON | 0.99+
Apple | ORGANIZATION | 0.99+
Google | ORGANIZATION | 0.99+
twenty | QUANTITY | 0.99+
Peter | PERSON | 0.99+
eBay | ORGANIZATION | 0.99+
New York | LOCATION | 0.99+
Boston | LOCATION | 0.99+
three | QUANTITY | 0.99+
John furrier | PERSON | 0.99+
Stephanie | PERSON | 0.99+
five seconds | QUANTITY | 0.99+
Vertica square | ORGANIZATION | 0.99+
IBM | ORGANIZATION | 0.99+
ken rudin | PERSON | 0.99+
three years | QUANTITY | 0.99+
Dave vellante | PERSON | 0.99+
Vertica | ORGANIZATION | 0.99+
nine million | QUANTITY | 0.99+
ninety percent | QUANTITY | 0.99+
Dave allante | PERSON | 0.99+
yesterday | DATE | 0.99+
Cuba | LOCATION | 0.99+
both individuals | QUANTITY | 0.99+
hundreds | QUANTITY | 0.99+
Boston Massachusetts | LOCATION | 0.99+
ten different documents | QUANTITY | 0.99+
two | QUANTITY | 0.99+
fifty percent | QUANTITY | 0.98+
both | QUANTITY | 0.98+
mid-october | DATE | 0.98+
HP | ORGANIZATION | 0.98+
robert | PERSON | 0.98+
one | QUANTITY | 0.98+
nine million | QUANTITY | 0.98+
about six months | QUANTITY | 0.98+
Collins | PERSON | 0.97+
2015 | DATE | 0.97+
Hadoop | TITLE | 0.97+
two perspectives | QUANTITY | 0.97+
Aaron | PERSON | 0.97+
Siri | TITLE | 0.97+
four people | QUANTITY | 0.97+
eighty | QUANTITY | 0.97+
this week | DATE | 0.97+
Neelix | ORGANIZATION | 0.96+
about 30 minutes | QUANTITY | 0.96+
Radek | PERSON | 0.95+
CHP | ORGANIZATION | 0.95+
HP Big Data | ORGANIZATION | 0.95+
one place | QUANTITY | 0.95+
about 18 months | QUANTITY | 0.95+
March of this year | DATE | 0.95+
NYC | LOCATION | 0.95+
five | QUANTITY | 0.94+
hundreds of analysts | QUANTITY | 0.94+
ebay | ORGANIZATION | 0.93+
eight | QUANTITY | 0.93+
third part | QUANTITY | 0.92+
boston massachusetts | LOCATION | 0.91+
one application | QUANTITY | 0.91+
70 | QUANTITY | 0.9+
this morning | DATE | 0.9+
Twitter | ORGANIZATION | 0.89+
today | DATE | 0.89+
SiliconANGLE | ORGANIZATION | 0.88+
big data | EVENT | 0.88+
one step | QUANTITY | 0.88+
September early October | DATE | 0.87+
last night | DATE | 0.84+
lot of events | QUANTITY | 0.84+
NLP | ORGANIZATION | 0.83+
a million | QUANTITY | 0.79+
lot of events | QUANTITY | 0.78+
lot | QUANTITY | 0.78+

Patrick Osborne & Bill Walker - HP Discover 2015 - theCUBE - #HPDiscover


 

live from the sands convention center las vegas nevada extracting a signal from the noise it's the cube covering HP discover 2015 brought to you by HP and now your host dave vellante welcome back to HP discover everybody this is dave vellante check out HP discovered on social for all the social streams the video the content the special access patrick osborne is here from HP cube alum and he's joined by Bill Walker of 20th Century Fox gents welcome to the cube good thank you yeah thanks for having us discovering another discover a little different this year Patrick we got Meg talking about business outcomes and absolutely uber and their yeah all the kinds of some shit models are very different I mean obviously you come out here every year for the past number of years and you know it's all about the technology i'm always wowed by the broad you know portfolio that we have but really at the end of the day I think some of the messaging to the customers is you know we're here to help you solve your problems and parts of that is technology part of its services so this hot sort of new high-level messaging around transformation and helping people achieve these business outcomes I think it's a good fresh start yes so bill your business going through some interesting transformations yes today to talk about the high level the drivers in your business the yet new competitors you got you got huge opportunities to go into this digital transformation you've sort of early on in that so maybe talk about some of the drivers in your business sure absolutely I think for us you know you really hit the nail on the head in the sense that it's really been about the physical to digital transformation that the industry is you know kind of going through and also Fox is and you know on the infrastructure side and the IT side we're trying to support that you know as best we can and you know the name of the game lately has been speed to market right so we partnered very tightly with HP on not only the hardware but the software side in you know building out kind of a brand new digital supply chain environment in Las Vegas actually right here and one of our major data centers where we deliver all of our digital content to all of our providers so EST VOD providers like amazon and itunes as well as you know major broadcasters so you've got a facility out here that is essentially your your cloud yes yes we do that that's our primary place where we deliver everything out of its it's great we're using all HP hardware and software there so we're customers across the board in the sense that we have blades we've got three par we're using store once as well as the HP software stack like cloud system on top of that that part is part of a super nap yes yes so yeah we we're facility we're in super net we love it it's a great facility we moved there like a little over two years ago and it's it's been awesome experience that made it into the any of the movies no it must hey I know well they it's impressive on the outside and the inside right yeah was it the old member the robocop oh yeah they have that storagetek tape library way magnet yeah they were great at four days these impressive data centers look amazing so so talk a little bit more about the you called it the digital supply chain that's a powerful concept what's behind that yeah so you know we we've obviously been in the physical supply chain business for a while on the home entertainment side so thank DVDs you know blu-rays that kind of thing but as we transition from people 
buying physical media to digital media a lot of the workflows and you know the supply chain aspect of it is still there but now we're talking digital and not physical so one of the things we've done at Fox is we've you know we've created what we call our digital supply chain so you've got you know they not only you know things like content delivery in there but you've got you know watermarking you know all the all the hallmarks of what you would need in a in a digital environment to deliver that customer you know quality product from end to end and protect your IP yeah exactly where T is a big one so we'll talk about more about security data maybe there's a general topic and then let's go to dig deeper every good for sure I mean security is obviously one of our big drivers I mean obviously with everything that's been in the news lately we're no different in the sense that we take it very very seriously you know on the data protection front like I said we're big store wants customers we love the product we're using it heavily in our in our data center to protect our content as well as our data so how much time let's unpack that a little bit what's it what's it look like laughs so you said a bunch of different you know HP products can you can you help us to understand how much you know storage kind of servers what kind of apps paint a picture of your your infrastructure for sure so we've got you know a lot actually several racks of gear 3par like I said we're big three part customers so we have several racks a three part that we're using kind of across the board a lot on the database side you know and heiio scenarios storeonce is kind of that underpinning piece that everything funnels back to that provides you know data deduplication backup archival that kind of thing okay so can we talk more about sort of your objectives of protecting data I mean obviously don't lose it but there's you know time to recover there's data loss how are you approaching that yep so we we've got you know our primary facility at switch as well as a dr facility off-site we're using store once we r you know we've got them in both places we're doing replication both ways to ensure you know if we were to have a vent at one facility or we didn't have data available we can quickly recover from the other you know rtly is it's been a great success for us because we've moved from tape-based you know back up and i really didn't mention that but you know where we came from you know two two and a half years ago you know from our LA and chandler data centers we have very very heavy investment in tape infrastructure and one of the things we into decided when we went to this new you know environment in Las Vegas is we wanted it to be completely tapeless you know to be flexible right in that environment and you know we pick store once we went all disk-based and you know RTO wise is fantastic because you know as opposed to tape if you have an event if you happen to not have the tape on site your RT 0 is dictated you know kind of by when you can get the tape back with the exit yes yes fast as you can get here right with the store once though it's just there we can we can you know bring it back in minutes and in fact we actually had a kind of not funny but but interesting incident happened early on where you know we kind of had an hoops incident where somebody deleted a vm and you know with store once we already had it had it there we were able to recover it in minutes and have it working again which is not something we were able to 
do in in previous iteration so it's really RTO is your primary supposed to RP oh yeah and Patrick I'm sure you see it all over the board with with customers right i mean yeah absolutely i mean it is the whole environment is based on this digital content that it's the lifeblood of you know what they're doing as a business and what they're delivering you know to your customers so that what we're seeing in the data protection standpoint is that more apps are mission critical right they're moving from business-critical the mission-critical the RTOS and our feos are definitely more aggressive you know month by month quarter-by-quarter people are moving from days two hours to minutes and we want to have more they won't have access to more data that's near line and online for so you can basically restore that right away so we're seeing people architecting solutions for store ones where they'd want a couple weeks maybe a couple months of data stored on that from a vaca perspective now we're talking having conversations about three to five years seven years 10 years right so definitely a paradigm shift in terms of data protection and the clouds change that a lot absolutely how so talk about that I think you know because the cloud there's not really a concept of tape per se I mean I know you know some providers have a delayed you know a kind of recovery type mechanism but I think in general people are assuming you've got the data on disk or you know available somewhere and you're able to recall it right and you know almost any cloud provider I think today is structured that way and has some kind of object storage where you can back up to but it's an online situation right and I think that's kind of become the new the new standard for the expectation of you know it's dumping it into an object store an able to recover from that yeah i like to say backup is one thing recoveries everything so there's a software component that that's the good that and what about tape you using I mean you must be used tape in your business right we do still have tape but I think where it makes sense we're trying to get rid of it you know we obviously there's a lot of physical nature with tape you know for us it's also manpower you have to have you know it's a lot of manpower involved in just managing tape and whatnot so where we can especially strategically in our data centers we're trying to get out out of using tape and using you know just a long-term archiving long-term retention with your digital assets obviously you would take for that we definitely have scenarios at the studio where it's still used for sure yeah but not obviously not for backup no yeah yeah I think you know with my team we're starting to think of the the notion of backup maybe in the traditional sense it's kind of going away because I think what people think of backup they think tape they think these scenarios and I think it's you know it's changing to more of a you know having having various generations on disk so you have the concept of you know okay being able to go back in time but near real-time recovery a time machine for the enterprise yeah yeah we talk when we talk to customers it's usually around the areas of application data protection or a service data protection and then long term preservation of assets as opposed to backup and archive right so there because they have a very different business processes around them and you can apply different technologies to the two of them so in some some technologies are appropriate for one some are 
appropriate for the others so we're you know we're seeing a lot of customers really focus on day one of how I'm going to protect that data how I'm going to make data protection an automatic part of the infrastructure so I don't have to have separate backup team and separate you know specific processes this whole area of things being sort of automatically protected as part of the infrastructure is it's definitely worth a lot but I think that's a really important point to make data protection has historically been a bolt on right uh we got to protect the data yep and so you're saying that you're finally seeing customers integrate data protection as part of the fundamental solution absolutely the two things so the two things that now I'm seeing it from a fundamental part of the initial solution bill that is data protections built in right so you're seeing the techniques of snapshot and replication being melded with you know backup techniques like policy management indexing and all that kind of stuff right and then the other sort of conversation we're having with people who put infrastructure in place is how am I going to get off this in five years five to seven years right so because the amount the size of the data sets are becoming so big that replicating data data migrations migrating your backup data are there they're difficult the difficult task so people are doing a lot more planning ahead to understand how am I going to protect this data now right from a different set of scenarios and how am I going to start do some hardware lifecycle management from an infrastructure standpoint underneath that data as I go into the future are you a data protector customer what do you use not not currently although we are you know we are looking at it for sure yeah today we're actually net back up yeah yeah okay I mean it's a lot of ways to skin that kappa yeah that's still not in your group is it nope nice meg just make it but they have a saying this for a decade the data protect there should be a part of the storage solution I mean it's anyway we work with them every day fantastic I got a tight relationship yeah yeah I'm still get paid for it do get paid for it that's good okay well that's a start yeah yeah awesome alright let's see what else uh what's going on the show this year with you Oh lots of stuff of the show so obviously you you heard about flash right yeah we've heard a lot of flashes fam yeah it's great mokin fast yeah so there's a in it's funny there's a lot of implications to flash even on data protection right so this is a big area for us obviously is huge in the market the media and the speed in what flash brings to the table allows you to do some different things from Dave protection standpoint as well right so this concept of copy data management you've heard this in terms of now i can take copies of databases copies of data sets serve them up to uat test development environment so you know your speed of development by having access to copies of that you know of that original production data set is being enabled by media like flash no flash you can do lots of random i/o you can with with modern architectures like three par for example it's multi-tenant right you have quality of service on there so now we're in the past you'd have to clone a number of data sets copy them off restore them from backups for the purposes of having a you know a test data set now you can run all that on the same infrastructure so flash is great from a performance standpoint for you know speeding up your 
transactions feeding of your database your workflow but there's a lot of other things that allows us to do to help the overall speed of development which is kind of cool so the copy data management things interesting I mean yeah so active feos obviously popularizing it Dell fixes another one yeah the problem is they want me to rip out or not use my might reap are snapshots and I love my three parts don't want to put in a whole new infrastructure around it so is there I mean the opportunity you got a catalog in in-store wants maybe I could use that somehow that technology so that's what we're doing right so we're taking these techniques that you've had in traditional backup for years and then things you have on primary storage right snapshots and replication but with the with the advantages of flash now you're able to do a lot more with it and bringing those two techniques together we're doing it with software we're doing it with sort of extensible protocols and SD SDKs on the infrastructure itself so we're not introducing any sort of sand virtualization techniques or you know in line fibre channel you know type of virtualization technology we're allowing you do that as a part of the infrastructure itself so you know we're combining things like three par with Recovery Manager central and store ones to provide those type of experience I think the killer app they're Jews potentially is test dev right i mean if you can take copies that are more current give it to the especially with flash give it to the developers but they're not working on you know n minus three copies absolutely yeah and they're way more productive I know what kind of discussions are you having internal how do you service the developer community are they what kind of pressures are they putting on you bill yeah it's that probably the same things you've heard I mean you know agility speed I know for us you know because we're we're big on the cloud journey right now in terms of delivering you know private cloud services for our customers inside Fox one of the areas where we're actively really striving for is to do you know some deeper integration with some of the dev teams where they've got you know kind of closed loop cycles you know DevOps type cycles that they're developing with you know familiar tooling which you know is in the market that out there the Jenkins etc you know my team we're definitely working on trying to integrate a lot of the automation we're doing around cloud with what they're doing on the test dev site to kind of create a nice you know cohesive whole so you know rather than delivering just a server to them we can deliver an entire in a build environment and tear it down you know build it up and tear down dynamic flames so you mentioned a store once customer talked about RTO being really on the primary metric that you're trying to optimize waiting sir patrick comes out to California you know hits the beach makes a quick sales call writes it off wait what do you want to know from him yeah okay Oh with you that the time so what kind of discussions do you have with with Patrick around where you want to where you want to go what you want out of the product when I roadmap to the club yeah I think one of the things you know we're as I said before you know we're three par and store wants customers and I think we're where we see you know things headed in the future we'd love to see even deeper integration with three par and store once and you know we're actually having a discussion my team before this and one of 
>> So, you mentioned you're a StoreOnce customer, and you talked about RTO really being the primary metric you're trying to optimize. When Patrick comes out to California, hits the beach, makes a quick sales call, and writes the trip off, what do you want to hear from him? What kind of discussions do you have with Patrick about where you want to go, what you want out of the product, the roadmap? >> As I said before, we're 3PAR and StoreOnce customers, and as for where we see things headed in the future, we'd love to see even deeper integration between 3PAR and StoreOnce. My team was actually having a discussion before this, and one of the things they threw out there was: hey, why can't we just combine them into one product? I know right now they're separate, but maybe in the near future the notion of having this external device that's separate from 3PAR, that you're moving data to, maybe some of that gets melded together. >> And what does that do for you? It minimizes the need to manage another appliance? >> Absolutely. It simplifies your infrastructure: tighter integration, better reliability. We're like a lot of technology shops in the sense that we're trying to squeeze as much as we can out of the team we have and still deliver a lot of services. So we're always looking, if we can take two things and make them one, for that kind of simplification; that's what we want to do. >> Doing more with less. So let me ask you a question: when you do more with less and you've dropped money to the CFO's bottom line, do they carve some off for you, do you get a lick off that cone, or do they say, hey, nice job, here's twenty percent of that savings back? >> For us it's just the slap on the back, the handshake that we did it. >> And what are we seeing from your product portfolio standpoint? >> We're simplifying. I think we're in a unique position in that we want to be the best storage division inside of HP Enterprise, not the best standalone storage division, and that affords us a lot of experiences we can bring to the customer when you bring in the blades and compute and networking and storage. What you saw up on stage with OneView and all of our element managers doesn't sound sexy, but at the end of the day, having the same look and feel, the same taxonomy across all of our products is a huge simplification for customers: not having to learn new UIs, and so on. We have competitors who are bringing seven, ten, twelve different architectures for primary storage to the table; we're consolidating that and giving customers the ability to go with a cost-optimized, software-defined deployment model, or appliances that are tuned for high performance, with the same look and feel, same CLI, same utilities, same data services. We want choice, but it has to be simple. >> So what do you think about that whole software-defined meme? Is that the future? Is Patrick sort of implying the lower-cost, software-only model? What do you guys say? >> Well, we're big believers in software-defined. Like I said, we're kind of in it on the whole stack: not only where we have software with HP, we're also doing a lot with the team around Helion OpenStack right now, and one of the big bets we're making is that we think OpenStack is going to be big. Internally, when we've talked with a lot of the development teams, the idea of API-defined infrastructure that's more malleable is tremendously exciting. >> So where are you with OpenStack? >> Right now we're actually in that early stage, trying to get a feel for it, because one of the things I always say about OpenStack right now is that
it's kind of a two-way street. There's the infrastructure part of it that my team has to deliver, but the other side of it is really the developers getting their hands around it, getting a feel for it, maybe even doing some platform-as-a-service with Cloud Foundry, that kind of thing, really developing for that platform and getting the most out of it. Because in a lot of cases you're coming from a traditional environment where you had physical servers and you put virtualization on top of them; everybody's used to that, maybe a single-VM kind of scenario. But when you move to something like OpenStack, you've got to rethink how you approach application building. >> All right, gents, we're out of time, so we're going to leave it there. But Patrick, last word for you: why HP? >> Why HP? I think we've got some exciting times ahead of us this year, unlocking some velocity and value for everyone with HP Enterprise. Just to echo what I said before, we're a portfolio company that brings a lot of technology and services to our customers, and at the end of the day, my bet is that standalone companies that focus on one thing, like storage, or one thing like networking, or specifically compute, don't have a path forward over time. Customers are buying solutions and systems and converged infrastructure, and you see this hyper-converged theme. HP is one of the few companies that can bring all of those elements to our customers as part of the equation. For me, that's why I stay here, and why we've got such a great technology path forward. >> Yeah, the '80s and '90s were about the disintegration of IT, creating those silos, and now we're seeing the reintegration. Patrick and Bill, thanks very much for coming on theCUBE. >> Absolutely, thank you. >> Great to have you guys here. All right, keep it right there, everybody; we'll be back with our next guest right after this short break.

Published Date : Jun 4 2015

**Summary and Sentiment Analysis are not shown because of an improper transcript**

ENTITIES

| Entity | Category | Confidence |
| --- | --- | --- |
| Patrick | PERSON | 0.99+ |
| patrick | PERSON | 0.99+ |
| Bill Walker | PERSON | 0.99+ |
| California | LOCATION | 0.99+ |
| two things | QUANTITY | 0.99+ |
| Las Vegas | LOCATION | 0.99+ |
| twenty percent | QUANTITY | 0.99+ |
| Patrick Osborne | PERSON | 0.99+ |
| two | QUANTITY | 0.99+ |
| five years | QUANTITY | 0.99+ |
| patrick osborne | PERSON | 0.99+ |
| two things | QUANTITY | 0.99+ |
| HP | ORGANIZATION | 0.99+ |
| Fox | ORGANIZATION | 0.99+ |
| two hours | QUANTITY | 0.99+ |
| 90s | DATE | 0.99+ |
| five | QUANTITY | 0.99+ |
| two techniques | QUANTITY | 0.99+ |
| today | DATE | 0.99+ |
| seven years | QUANTITY | 0.99+ |
| LA | LOCATION | 0.99+ |
| las vegas | LOCATION | 0.99+ |
| this year | DATE | 0.98+ |
| Meg | PERSON | 0.98+ |
| one | QUANTITY | 0.98+ |
| Dell | ORGANIZATION | 0.98+ |
| both places | QUANTITY | 0.98+ |
| 20th Century Fox | ORGANIZATION | 0.98+ |
| seven years | QUANTITY | 0.97+ |
| both ways | QUANTITY | 0.97+ |
| dave vellante | PERSON | 0.97+ |
| 80s | DATE | 0.96+ |
| day one | QUANTITY | 0.96+ |
| 10 years | QUANTITY | 0.96+ |
| one product | QUANTITY | 0.95+ |
| three parts | QUANTITY | 0.95+ |
| 2015 | DATE | 0.94+ |
| chandler | ORGANIZATION | 0.93+ |
| OpenStack | TITLE | 0.93+ |
| nevada | LOCATION | 0.93+ |
| a decade | QUANTITY | 0.92+ |
| this year | DATE | 0.92+ |
| five years | QUANTITY | 0.92+ |
| CLI | TITLE | 0.92+ |
| Jenkins | TITLE | 0.91+ |
| Cloud Foundry | TITLE | 0.9+ |
| two and a half years ago | DATE | 0.9+ |
| amazon | ORGANIZATION | 0.89+ |
| itunes | TITLE | 0.89+ |
| Jews | OTHER | 0.89+ |
| two | DATE | 0.88+ |
| two-way | QUANTITY | 0.87+ |
| Dave | PERSON | 0.86+ |
| over two years ago | DATE | 0.86+ |
| three | QUANTITY | 0.86+ |
| openstax | TITLE | 0.82+ |
| one facility | QUANTITY | 0.82+ |
| four days | QUANTITY | 0.79+ |
| storagetek | ORGANIZATION | 0.77+ |

Steve Wooledge - HP Discover Las Vegas 2014 - theCUBE - #HPDiscover


 

>> Announcer: Live from Las Vegas, Nevada, it's theCUBE at HP Discover 2014, brought to you by HP. >> Welcome back, everyone. We're live here in Las Vegas for HP Discover 2014. This is theCUBE; we're out, we go where the action is. We're on the ground here at HP Discover getting all the signals, sharing them with you, extracting the signal from the noise. I'm John Furrier, founder of SiliconANGLE, and I'm joined by Steve Wooledge, VP of Product Marketing at MapR Technologies. Great to see you; welcome to theCUBE. >> Thank you. >> I know you've got a plane to catch, but I really wanted to squeeze you in because you guys are a leader in the big data space. You're in the top three, the three big whales: MapR, Hortonworks, Cloudera. You're part of the original big data industry; when we first started covering the industry with theCUBE, you had maybe 30, 34 employees total combined across the three: one company, Cloudera, then MapR announced, and then Hortonworks. You've been part of that holy trinity of early pioneers. Give us the update; you guys are doing very, very well. We talked to you at Hadoop Summit last week, and Jack Norris was on. So give us the update on the momentum and the traction, and then I want to talk about some of the things with the product. >> Yeah, so we've seen a tremendous uptick in sales at MapR. We tripled revenue; we announced that publicly about a month ago. So we went up 300% in sales over Q1 of 2013, and I think it's really the maturity of the market: as people move more towards production, they appreciate the enterprise features we built into the MapR Distribution for Hadoop. The stats I would share are that 80% of our customers triple the size of their cluster within the first 12 months, and 50% of them double the size of the cluster, because once they have that first production success use case, they find other applications and start rolling out more and more. So it's been great for us. >> You know, I always joke with Jack Norris, the VP of Marketing over there, and John Schroeder, the CEO, about MapR's humbleness. You don't have the fanfare of all the hype; the press loves Cloudera. To be fair, they've done some pretty amazing things: they've had a liquidity event, essentially kind of an IPO if you will, with that huge financing from Intel, and they're building a big sales force. Hortonworks has got their open-source play. You guys have your heads down as well. So talk about that. How many employees do you have, and what's going on with the product? How many products do you actually have? >> We have one product: the MapR Distribution for Hadoop. It's got all the open-source packages directly within it, but where we really innovate is in the core. That's where we spent our time early on, really innovating that data platform to give everything within the Hadoop ecosystem more reliability, better availability, performance, security, and scale. >> So it's open-source contributions to the core, and you guys put stuff on top of that? >> And how it works, yeah. And we even lead some projects, like Apache Mahout and Apache Drill, which is coming into beta shortly; on other projects we commit and contribute back. So we distribute all those projects, but where we really innovate is at that data platform level.
>> HP is a big data leader, obviously. They bought Autonomy, they have HP Vertica, and you guys are here. Hey, what are you doing here? We covered the announcement with HP Vertica on theCUBE; are you here for that reason? Is there other biz dev activity going on, other integration opportunities? >> Yeah, a few things. Obviously the HP Vertica news was big; that solution went into general availability the first week of May. What we have is the HP Vertica database integrated directly on top of our data platform, so it's this hybrid solution where you have a full SQL database directly within your Hadoop distribution. We had a couple of sessions on that, and a nice panel discussion with our friends from Cloudera and Hortonworks, so a really good discussion with HP about the ecosystem and how it's evolving. The other things we're doing with HP now: we've got reference architectures on their hardware lines, so people can deploy MapR on HP hardware, and we're also talking with the Autonomy group about enterprise search, looking at a similar type of integration where you could have search integrated directly into your Hadoop distro. We've got some joint accounts piloting that right now. >> So you're integrating with HP pretty significantly, and that deal is working well. Absolutely. What's the coolest thing you've seen at HP that you can share? I ask because in the big data landscape everyone's hunkering down, working on their features, but out in the real world big data is not top of mind for the CIO 24/7; it's probably one item on a list they're addressing. What have you seen, and what have you been most impressed with at HP here? >> You know, this is my first HP event like this. I think the strategy they have is really good. In certain areas, like the cloud in particular with Helion, I think they made a lot of early investments and placed some bets, and I think that's going to pay off well for them. And that marries pretty nicely with our strategy as well: we have on-premise deployments, but we're also an OEM, if you will, within Amazon Web Services, so we have a lot of agility in the cloud. And I think as those products and the partnerships with HP evolve, we'll be playing a lot more with them in the cloud as well. >> Let me ask you a question, and I want you to share with the folks out there in your own words: what is it about MapR that they may or may not understand, or might not know about? A little humble brag out there; share some insight into MapR for folks that don't know you as a company, and for the folks that may have a misperception of what you guys do, share with them what MapR is all about. >> Yeah. For me, I was in this space with Aster Data, and kind of the whole Hadoop and MapReduce area, since 2008, so I'm pretty familiar with everybody in the space. I really looked at MapR as the best technology, hands down. You look at the Forrester Wave and they rank us as having the best technology today, as well as product roadmap. I think the misperception is people think, oh, it's proprietary and closed. It's actually the opposite of that. We have an unbiased open-source approach where we ship and support the entire Apache Spark stack in our distribution.
We're not selective about which projects within Spark we support. Same with SQL-on-Hadoop: we support Impala as well as Hive and other SQL-on-Hadoop technologies, including the ability to integrate HP Vertica directly into the system, and that's because of the openness of our platform. I'd say it's actually more open because of the standards we've integrated into the data platform to support a lot of third-party tools directly within it. So there is no lock-in: the storage formats are all the same, and the code that runs on top of the distribution from the projects is exactly the same. You can build a project in Hive or some other system and port it between any of the distributions. So there isn't a lock-in. >> At the end of the day, what customers want is ease of integration and reliability. So what are you working on next? What's the product marketing roadmap you can share with us? >> I think for us, the innovations we did in the data platform allow us to support not only more applications but more types of operational systems: integrating things like fraud detection and recommendation engines directly with the analytical systems, to really speed up accuracy in targeting and detecting risk, things like that. Hadoop has sort of been this batch analytic platform, but the ability to converge operations and analytics in one system is really going to be enabled by technology like MapR's. >> How many employees do you have now? >> I'm not sure what our CFO would let me say, but we're over 200 at this point. >> And over 500 customers. Congratulations; we covered your relationship with HP during our Big Data SV coverage. That was exciting. Good to see John Schroeder; very impressive team. I'm impressed with MapR, always have been. You've stuck to your knitting, and again, you're leading the big data space, and "not proprietary" is a very key phrase; that's really cool. So thanks for coming on; really appreciate it, Steve. We'll be right back. This is theCUBE, live in Las Vegas, extracting the signal from the noise with MapR here at HP Discover 2014. We'll be right back after this short break.
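As a concrete aside on the portability point Steve makes above (the same Hive code runs on any distribution), here is a small sketch that runs one HiveQL query against two different clusters by changing only the HiveServer2 endpoint. It uses the PyHive library; the host names, username, and table are illustrative assumptions:

```python
# Sketch of cross-distribution portability: identical HiveQL, different
# clusters. Only the connection endpoint changes; hosts/table are made up.
from pyhive import hive

QUERY = """
SELECT campaign, COUNT(*) AS events
FROM   clickstream
GROUP  BY campaign
"""

def run_on(host: str):
    conn = hive.connect(host=host, port=10000, username="analyst")
    cur = conn.cursor()
    cur.execute(QUERY)          # same code on every distribution
    return cur.fetchall()

for cluster in ("mapr-hs2.example.com", "cdh-hs2.example.com"):
    print(cluster, run_on(cluster))
```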

Published Date : Jun 12 2014

SUMMARY :

Discover 2014 brought to you by HP. Uh, we talked to you guys at the dupe summit last week. So, um, you know, the stats You guys got, you got your heads down as well. and it's got all the open source packages directly within it, but where we really innovate is in the course. And you guys put stuff on top of that, But, um, so we take in the distribution, we're distributing all those projects, but where we really innovate is uh, the announcement with, uh, with, with HP Vertica, you here for that reason, is there other biz dev other activity So it's this hybrid solution where you have full SQL How so I asked you in the big data landscape, everyone's Bucher, So we have a lot of agility in the cloud if you will. into map bar for folks that don't know you guys as a company and for the folks that may have a misperception of what you So you can build a project in hive or some What's the big, uh, product marketing roadmap that you can So I think now over time, you know, Hadoop has sort of been this batch analytic Let me say that before. And over five, the customers which got the data, you guys do summit graduations,

SENTIMENT ANALYSIS :

ENTITIES

| Entity | Category | Confidence |
| --- | --- | --- |
| John Schroeder | PERSON | 0.99+ |
| Steve Woolwich | PERSON | 0.99+ |
| Steve | PERSON | 0.99+ |
| Jack Norris | PERSON | 0.99+ |
| HP | ORGANIZATION | 0.99+ |
| John Frodo | PERSON | 0.99+ |
| three | QUANTITY | 0.99+ |
| 80% | QUANTITY | 0.99+ |
| Steve Wooledge | PERSON | 0.99+ |
| 50% | QUANTITY | 0.99+ |
| John furrier | PERSON | 0.99+ |
| Las Vegas | LOCATION | 0.99+ |
| Matt BARR | PERSON | 0.99+ |
| Hortonworks | ORGANIZATION | 0.99+ |
| Amazon | ORGANIZATION | 0.99+ |
| Cloudera | ORGANIZATION | 0.99+ |
| Stephanie | PERSON | 0.99+ |
| 30 | QUANTITY | 0.99+ |
| 300% | QUANTITY | 0.99+ |
| first | QUANTITY | 0.99+ |
| last week | DATE | 0.99+ |
| Aster | ORGANIZATION | 0.99+ |
| 2008 | DATE | 0.98+ |
| Q1 | DATE | 0.98+ |
| Las Vegas, Nevada | LOCATION | 0.98+ |
| one product | QUANTITY | 0.98+ |
| 34 employees | QUANTITY | 0.98+ |
| one system | QUANTITY | 0.98+ |
| evolvable | ORGANIZATION | 0.98+ |
| over five | QUANTITY | 0.97+ |
| SQL | TITLE | 0.97+ |
| three big whales | QUANTITY | 0.97+ |
| MapReduce | ORGANIZATION | 0.96+ |
| SiliconANGLE | ORGANIZATION | 0.96+ |
| first 12 months | QUANTITY | 0.95+ |
| Apache Mahal | ORGANIZATION | 0.95+ |
| map map | ORGANIZATION | 0.95+ |
| over 200 | QUANTITY | 0.95+ |
| 24 | OTHER | 0.94+ |
| today | DATE | 0.94+ |
| Intel | ORGANIZATION | 0.92+ |
| Matt | PERSON | 0.92+ |
| Salesforce | ORGANIZATION | 0.91+ |
| 2014 | DATE | 0.9+ |
| Impala | TITLE | 0.9+ |
| Hadoop | ORGANIZATION | 0.89+ |
| HP Vertica | ORGANIZATION | 0.89+ |
| map bar | ORGANIZATION | 0.89+ |
| Hadoop | TITLE | 0.86+ |
| one company | QUANTITY | 0.85+ |
| dupe summit | EVENT | 0.84+ |
| about a month ago | DATE | 0.83+ |
| Bucher | PERSON | 0.81+ |
| Discover 2014 | EVENT | 0.78+ |
| first week of may | DATE | 0.77+ |
| Apache drill | ORGANIZATION | 0.74+ |
| #HPDiscover | ORGANIZATION | 0.73+ |
| Mapbox | TITLE | 0.73+ |
| 2013 | DATE | 0.72+ |
| SQL on | TITLE | 0.7+ |
| art technologies | ORGANIZATION | 0.63+ |
| Apache | ORGANIZATION | 0.61+ |

Bobby Patrick - HP Discover Las Vegas 2014 - theCUBE - #HPDiscover


 

>> Announcer: Live from Las Vegas, Nevada, it's theCUBE at HP Discover 2014, brought to you by HP. >> The keynotes this afternoon: Meg Whitman was just on a panel with Thomas Friedman, and Intel, and Satya Nadella of Microsoft. Pretty interesting. >> It was interesting. >> I'm here with Jeff Frick, and we noted how passionate Meg is about politics and government. Bobby Patrick is here; we've been drilling down into cloud all day. Bobby is the CMO of the HP Cloud division. A lot of new announcements coming out, a lot of action in HP Cloud. Bobby, welcome to theCUBE. >> Yeah, thanks, it's great to be here. >> Good to see you. Good keynotes. That was a good refresher; a lot of keynotes are just product pushing, and we had some of that earlier, but I thought this was a good, eye-opening, refreshing kind of discussion, so it was very worthwhile. Anyway, you're relatively new to HP. How's it going? >> It's great, it's exciting. I joined at a great time for the company. We were gearing up for the big launch of our new brand, HP Helion, which launched on May 7th, just a little over a month ago, and we hit the market hard, globally. It's a complete pulling-together of all of our products and services around cloud under a single brand. Customers love it, and it really reiterated our commitment to OpenStack. And HP announced a billion-dollar commitment to HP Helion over the next two years, so it's backed by some big funding. It's a great time to come in. >> So help us unpack that billion dollars. It's a big number. Even Warren Buffett underwrote a billion dollars for the perfect March Madness bracket; it's no longer a million, it's a billion. So what is that billion? What does it go to? What does it comprise? >> It goes to R&D. We're the most active corporate sponsor behind OpenStack, which is the fastest-growing open-source project on the planet. We have more contributors, we have more team leads for the different projects, and we're working with the community, hiring OpenStack experts, always looking for the best in the world, all around the world. We're then hardening and curating it and making it commercial with our support, and we believe it's the underpinning of the future of what we call hybrid cloud: the ability to put some of your information and applications within the enterprise, some in the public cloud, some in different countries where that matters for compliance reasons, and to be able to move between those different clouds in a very easy fashion. So this money is going to that R&D, to skills, and to a truly global launch. >> So when you think about the messaging for HP Cloud, what do you want customers to think about in the Helion brand? >> The number one thing is commitment to open standards. You heard Martin Fink today talk about HP Labs and their commitment to open source. We're all in on open source; we believe it's the way to deliver innovation faster, to bring new technologies to market and to customers faster. We're committed to the projects that matter to the next 20 years of IT, and we have to prove it: you can run our software on other hardware.
We think we'll have some optimally integrated solutions for you using our entire stack, but this is about eliminating vendor lock-in, which is one of the biggest challenges IT departments have faced in the last 20 years. So the commitment to open is at the core of our messaging. >> We should mention Martin Fink's talk; I really liked his presentation. I have been saying for I don't know how many years that HP has got to get back to its roots, which are invention, and until today I had not heard something that excited me about invention; we saw it today. Now, invention is not easy. We've talked a lot about how the previous administration cut, cut, cut to the bone, and it takes a long time to turn that around. But we saw it today, and I think Fink was put into that job for a very particular reason. I'd say two things: one, he's a guy who's going to commercialize inventions and answer to the marketplace, and two, there's going to be a heavy systems focus. He basically showed a little leg on The Machine, which eventually is probably going to be powering your clouds. He also announced HP is going to put forth a new open-source operating system optimized for non-volatile memory: not only a blank-sheet-of-paper design that they're going to work on with universities, but also a stripped-down Linux derivative and one for Android. That was exciting. >> I think what's great also is that the cloud business actually falls under Martin: our entire cloud business worldwide, our R&D, our product development, is all under Martin, who runs HP Labs as our CTO. When you look at the problems he's addressing with The Machine, he's going after the massive scale challenges of the internet, the massive scale challenges of the cloud, and the data deluge we're all facing with the Internet of Things. What's great is that by being part of the labs, part of Martin's organization, we're injecting that thinking into our cloud, into our innovation, and you can see a roadmap here, this whole new architecture. You talk about architecture: the von Neumann architecture has been in existence since 1950, all the way to now, a world with copper at the core. The world is in need of a new architecture, and it's great to be part of that. >> That was a cool talk, about electrons, photons, and ions: electrons compute, photons communicate, ions store. That, in essence, is the future direction of where HP is going with The Machine: massive memory, blowing away the volatility hierarchy, ultimately blowing away slow spinning disks, using memristor storage as the platform for future systems. I love it. >> He also mentioned one thing that's close to my heart: the distributed mesh, where different hardware and software combinations sit at different points of the network and work together, compute and data. That's really hybrid cloud: putting compute workloads in certain areas and having data stored and distributed for maximum availability, doing it with self-service, and doing it in a way that organizations can scale effectively. >> As a marketing person, you realize that customers want to know that you're relevant for their future.
As much as I love things like StoreOnce, it's not the future of computing; what comes out of HP Labs potentially is. That's got to have customers really excited, and this is really the first time you've unveiled it at this kind of public scale. >> That's why I joined HP. I saw that coming a few months ago, along with the new style of IT thinking. We're saying we're going to be at the core of helping IT transition from the old, very inward style to a customer-centric style, where you're delivering the consumer experience in the business world. I saw that with HP, it got me excited, and I joined on board. >> The other part Martin mentioned, and I had no idea of the power of HP Labs here, is leveraging open source as well, which probably wasn't a tool in the arsenal not that long ago: bringing the power of a large, engaged community so you can attack specific problems, and making that a core piece of the process. >> Think about it: we've got thousands of the world's best developers, the millennial developers, working around the clock on our core cloud future called OpenStack, contributing to it, including our own experts. Then we're taking that and bringing it to market, providing twenty-four-seven support, testing it and hardening it, doing the things you need to do to help an enterprise feel comfortable with that decision. We could never deliver that kind of innovation on our own; we just couldn't afford it, and we wouldn't be able to deliver on it. These are the best minds in the world contributing, and we're all in. >> So we talked about what the brand stands for, and you said open, no lock-in. Can open-source innovation occur at a pace comparable to somebody who's got full control of a stack? >> It's much faster, actually. Watch the innovation of OpenStack: it's only four years old, we just passed OpenStack's four-year birthday, and already it's an entire cloud computing platform. You've got database-as-a-service projects like Trove, object storage projects like Swift, and block storage like Cinder. All of these things are being worked on by people around the world; you could never deliver that yourself. The pace of innovation with an open-source project like OpenStack is a hockey stick, and if we did this ourselves, we or anyone else, we would never be able to deliver the kind of innovation that's coming to market now. >> Let's talk about some of the announcements. Why don't we go back a month to Helion and then work through today: we've got some HPC announcements, you've got the Helion Network. Start with Helion. >> What's great about Helion is that it really brought together a lot of great cloud products and services that already existed, plus OpenStack, our first foray into the market with an OpenStack distribution. What's important, actually, is that we have a technology called HP CloudSystem, which is the most popular private cloud platform on the planet right now: almost two thousand companies, a third of the Fortune 100, are using that technology. So it's a proven, capable platform used by big banks and others, and we're injecting OpenStack into it
so that you can, over time, scale it out with new applications. The launch was really about pulling all the pieces together, pulling our support and services together, and saying to a customer with confidence: here's our cloud portfolio, and here's how we can take you on a journey, at your pace, and accelerate that journey. That was the launch a month ago. And at Discover, only a month later, we've already done a number of great things. One is we brought out the commercial version of OpenStack: we launched the community edition first, which you can download, with thousands of downloads already, and the commercial version is coming out now, with pricing announced. What we're all about here, and this is really important, is accelerating the adoption of OpenStack throughout the enterprise; we're about breaking down the barriers that have inhibited the proliferation of this great technology. One of those things today was the price point: we announced a $1,400 per year, per server, all-in price point for HP Helion OpenStack. That's critical because this is a scale-out product: you're going to have dozens, hundreds, maybe even thousands of these servers all around the world, so the price point is disruptive, the lowest on the planet. And we said it's going to be simple and easy: we're not doing all of this good-better-best packaging; it's super easy, and that's a big part of today. The other part of today is that we're going to work with partners to deploy this all around the world, and that was the Helion Network announcement, along with AT&T and British Telecom and Intel. That's just huge for today. >> Now, Helion comprises both on-premise and an HP public cloud, correct? >> That's right. >> So talk about how that pricing works. I like what you're saying about simple, because cloud pricing is really complicated. >> So we're probably the largest user of OpenStack in production today, with our public cloud. We use it, and people can consume services from it on an as-you-go basis. But with OpenStack, what's really happening is people are able to deploy their own private clouds, and a service provider can deploy and build its own public cloud. So when I talk about the price point, I'm talking about a customer building their own cloud, in a third-party data center or in one of HP's 82 data centers, and that price point is easy to work with: you can predict it in your business model and feel comfortable about what it's going to cost two, three, four years out. >> So help me understand; let's unpack that a little bit. What am I getting for that fourteen hundred dollars per server? >> This is what's amazing: you get the entire cloud operating system called OpenStack, all of the projects that are part of the OpenStack build. You're getting object storage, a la Amazon S3 but in a box, with Swift and the Swift API, which you can build and run yourself in a way that gives you full control and full flexibility. You get the database-as-a-service product, you get block storage with Cinder, the compute engine, everything. You get all of this in that box.
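To make that concrete, here is a minimal sketch of consuming two of the services Bobby lists, compute and Swift object storage, through the openstacksdk Python library (a present-day client, used here purely as an illustration). The cloud name, image, flavor, and network values are assumptions that would come from your own Helion/OpenStack deployment:

```python
# Sketch: boot a server (compute) and store an object (Swift) on an
# OpenStack cloud. Cloud name, image, flavor, and network are made up.
import openstack

# Reads auth details for the named cloud from clouds.yaml.
conn = openstack.connect(cloud="my-helion-cloud")

# Compute: boot a small server.
server = conn.compute.create_server(
    name="demo-vm",
    image_id=conn.compute.find_image("ubuntu-14.04").id,
    flavor_id=conn.compute.find_flavor("m1.small").id,
    networks=[{"uuid": conn.network.find_network("private").id}],
)
server = conn.compute.wait_for_server(server)
print("server up:", server.name)

# Swift: create a container and upload an object.
conn.object_store.create_container(name="reports")
conn.object_store.upload_object(container="reports",
                                name="hello.txt",
                                data=b"hello from the API")
```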
And you can go deploy this and benefit from the thousands of developers who, every six weeks, are putting out new code and innovating. >> Okay, so all the new innovations will fall under that umbrella, at that price, as they're released? >> You might say, "I'm just building a cloud storage environment," and choose to be heavy on Swift; that's what you're doing. But it is all-inclusive: you can use the entire cloud platform, or you can build just a storage platform or a database-as-a-service platform. >> That's a different model, clearly. What are customers telling you about it? >> They want the control and the flexibility of having their own platform, for security reasons, for compliance; they want to put their data in their own centers. But they're also saying, "I want to use some public cloud too," and they like the idea that if OpenStack is here and OpenStack is there, same code base, they can fairly easily take a workload, an application, and go from here to there and back. That kind of flexibility, call it interoperability, is what's coming down the road with OpenStack underneath; it's something that does not exist today, and everybody wants it. >> Let me make sure I understand. I pay $1,400 per server, per year, for that OpenStack instance on-premise, and then when I want to access public cloud services? >> You pay as you go. You might want to burst: you might have some peak demand and burst out there, and you pay for what you use, ideally with a partner of ours. >> Excellent. Now, you also had some HPC announcements. >> That's right. What's great is that people at HP are now taking Helion OpenStack and putting it in their own products. Our high-performance computing group said, "Hey, we want a self-service mechanism, we want to be able to scale out," which is the kind of architecture people want in HPC. So they put OpenStack inside their solution and launched it today. >> So it's OpenStack embedded in HPC: open, hybrid, simple to consume, and predictable. All right, good. Lisa Marie wrote the book on this, so if you don't believe Bobby, Lisa gave me the book: "OpenStack Technology: Breaking the Enterprise Barrier." >> You've got it. It's one of the best reads on the planet right now. >> Excellent. All right, so what takes it to the next level? Am I just buying capacity? >> If you just want capacity, you might build a storage cloud yourself, or use our public cloud storage, or, with our Helion Network, our partners around the world will be deploying OpenStack and you can buy it from them. >> Awesome. All right, we've got to leave it there. Bobby, thanks so much for coming on theCUBE. >> A pleasure, thank you. >> All right, keep it right there, everybody. John Furrier is in the house; he's back from San Francisco, or San Jose. Good to have him back. Keep right there; we'll be back with John in just a moment.

Published Date : Jun 12 2014

SUMMARY :

of the planet and and you know we said

SENTIMENT ANALYSIS :

ENTITIES

| Entity | Category | Confidence |
| --- | --- | --- |
| San Francisco | LOCATION | 0.99+ |
| San Jose | LOCATION | 0.99+ |
| Martin | PERSON | 0.99+ |
| British Telecom | ORGANIZATION | 0.99+ |
| May seventh | DATE | 0.99+ |
| Jeff Rick | PERSON | 0.99+ |
| billion dollars | QUANTITY | 0.99+ |
| 1000 | QUANTITY | 0.99+ |
| HP | ORGANIZATION | 0.99+ |
| Bobby Patrick | PERSON | 0.99+ |
| 82 data centers | QUANTITY | 0.99+ |
| Dave Lisa Marie | PERSON | 0.99+ |
| today | DATE | 0.99+ |
| thomas friedman | PERSON | 0.99+ |
| martin | PERSON | 0.99+ |
| Warren Buffett | PERSON | 0.99+ |
| four-year | QUANTITY | 0.99+ |
| ATT | ORGANIZATION | 0.99+ |
| Intel | ORGANIZATION | 0.99+ |
| Swift | TITLE | 0.99+ |
| a month later | DATE | 0.99+ |
| Android | TITLE | 0.99+ |
| Martin Fink | PERSON | 0.99+ |
| hpc | ORGANIZATION | 0.99+ |
| thousands | QUANTITY | 0.99+ |
| Linux | TITLE | 0.99+ |
| billion dollar | QUANTITY | 0.99+ |
| Helion | ORGANIZATION | 0.99+ |
| Bobby | PERSON | 0.99+ |
| Bobby Lisa | PERSON | 0.99+ |
| John | PERSON | 0.99+ |
| March Madness | EVENT | 0.98+ |
| one | QUANTITY | 0.98+ |
| two | QUANTITY | 0.98+ |
| Ryan | PERSON | 0.98+ |
| OpenStack | TITLE | 0.98+ |
| Las Vegas Nevada | LOCATION | 0.98+ |
| 1950 | DATE | 0.97+ |
| Nisha | PERSON | 0.97+ |
| two thousand companies | QUANTITY | 0.97+ |
| HP Helion | ORGANIZATION | 0.97+ |
| meg | PERSON | 0.96+ |
| Helia | ORGANIZATION | 0.95+ |
| month ago | DATE | 0.95+ |
| few months ago | DATE | 0.94+ |
| HP Labs | ORGANIZATION | 0.94+ |
| both | QUANTITY | 0.94+ |
| dozens hundreds | QUANTITY | 0.94+ |
| OpenStack | ORGANIZATION | 0.93+ |
| third | QUANTITY | 0.93+ |
| first time | QUANTITY | 0.92+ |
| first foray | QUANTITY | 0.92+ |
| fourteen hundred dollars per | QUANTITY | 0.91+ |
| four years old | QUANTITY | 0.9+ |

How The Trade Desk Reports Against Two 320-node Clusters Packed with Raw Data


 

Hi everybody, thank you for joining us today for the virtual Vertica Big Data Conference 2020. Today's breakout session is entitled "Vertica in Eon Mode at The Trade Desk." My name is Sue LeClair, director of marketing at Vertica, and I'll be your host for this webinar. Joining me is Ron Cormier, senior Vertica database engineer at The Trade Desk. Before we begin, I encourage you to submit questions or comments during the virtual session. You don't have to wait: just type your question or comment in the question box below the slides and click submit. There will be a Q&A session at the end of the presentation, and we'll answer as many questions as we're able to during that time; any questions we don't address, we'll do our best to answer offline. Alternatively, you can visit the Vertica forums to post your questions there after the session; our engineering team is planning to join the forums to keep the conversation going. Also, a quick reminder that you can maximize your screen by clicking the double-arrow button in the lower right corner of the slide. And yes, this virtual session is being recorded and will be available to view on demand this week; we'll send you a notification as soon as it's ready. So let's get started. Over to you, Ron. >> Thanks, Sue. Before I get started on the technology, I'll just mention that my slide template was created before social distancing was a thing, so hopefully some of the images will harken us back to a time when we could actually all be in the same room. With that, I wanted to cover my background real quick, because I think it's relevant to where we're coming from with Vertica Eon at The Trade Desk. I'll start out by pointing out that prior to my time at The Trade Desk, I was a tech consultant at HP, and I traveled the world working with Vertica customers, helping them configure, install, and tune their Vertica databases and get them working properly. So I've seen the biggest and the smallest implementations and everything in between, and now I'm a principal database engineer at The Trade Desk. The reason I mention this is to let you know that I'm a practitioner; I'm working with the product every day, or most days. This isn't marketing material, so hopefully the technical details in this presentation are helpful. I work with Vertica, of course, and that's most relevant to our ETL and reporting stack: we're taking the data in Vertica and running reports for our customers. And we're in ad tech, so I want to briefly describe what that means and how it affects our implementation. I'm not going to cover all the details of this slide, but basically The Trade Desk is a DSP, a demand-side platform. We place ads on behalf of our customers: agencies, ad agencies, and their customers, the brands that are the advertisers themselves. The ads get placed onto websites and mobile applications, anywhere digital advertising happens. Publishers are the likes of espn.com, msn.com, and so on, and every time a user goes to one of these sites, or one of these digital places, an auction takes place, and what people are bidding on is the privilege of showing one or more ads to the user. This is really important, because it helps fund the internet.
Ads can be annoying sometimes, but they're actually incredibly helpful in how we get much of our content. And this is happening in real time, at very high volume: on the open internet there are anywhere from seven to thirteen million auctions happening every second, and of those, The Trade Desk bids on hundreds of thousands per second. Any time we bid, we have an event that ends up in Vertica; that's one of the main drivers of our data volume, and certainly other events make their way into Vertica as well. That should give you a sense of the scale of the data, and how it's driven by real people out in the world. Let's dig a little more into the workload. We have the three V's in spades, like many people listening: massive volume, velocity, and variety. In terms of data sizes, I've got some stats here on the raw data we deal with on a daily basis: we ingest 85 terabytes of raw data per day, and once we get it into Vertica we do some transformations, we do matching, which is basically joins, and we do some aggregation, group-bys, to reduce the data, clean it up, and make it more efficient for our reporting layer to consume. That matching and aggregation produces about ten new terabytes of raw data per day. It all comes from the data that was ingested, but it's new data, so it is reduced quite a bit but still pretty high volume. We then run reports on that aggregated data on behalf of our customers: about 40,000 reports per day, though that's an older number, and it's probably closer to 50 or 55,000 reports per day at this point. It's probably a pretty common use case for Vertica customers, maybe a little different in the sense that most of the reports are batch reports: it's not a user sitting at a keyboard waiting for the result. We have a workflow where we do the ingest, we do the transform, and then once all the data is available for a day, we run the reports on that daily data on behalf of our customers, and we send the reports out via email or drop them in a shared location, and the customers look at them at some later point in time. Up until Eon, we did all this work on enterprise Vertica; at our peak we had four production enterprise clusters, each of which held two petabytes of raw data, and I'll give you some details on how those enterprise clusters were configured in terms of hardware. But before I do that, I want to talk about the reporting workload specifically. The reporting workload is particularly lumpy, and what I mean by that is there's a bunch of work, a bunch of queries that we need to run, that becomes available in a short period of time, right after the day's ingest and aggregation are completed, and then the clusters are relatively quiet for the remaining portion of the day. That's not to say they're not doing anything in terms of read workload, but it's much less activity after that big spike. What I'm showing here is our reporting queue, and the spike is when all those reports become available to be processed; we can't run a report until we've done the full ingest, matching, and aggregation for the day.
So right around 1:00 or 2:00 a.m. UTC every day, we get this spike, which we affectionately call the UTC hump. Basically it's a huge number of queries that need to be processed as soon as possible, and we have service levels that dictate what "as soon as possible" means; I think the spike illustrates our use case pretty accurately, and as we'll see, it's really well suited for Vertica Eon. So, we had the enterprise clusters I mentioned earlier, and just to give you some details on what they looked like: they were independent and mirrored, and what that means is all four clusters held the same data. We did this intentionally, because we wanted to be able to run our reports anywhere. We've got this big queue of reports that need to be run; we started with one cluster, found it couldn't keep up, so we added a second, then the number of reports we needed to run in that short period of time went up, and so on, until we eventually ended up with four enterprise clusters. When I say they were mirrored, they all had the same data; they weren't, however, synchronized, they were independent. We would run the ETL pipeline, so to speak, the ingest, the matching, and the aggregation, on all the clusters in parallel; it wasn't as if each cluster proceeded to the next step in sync with the other clusters, they ran independently, so each cluster would eventually become consistent. This worked pretty well for us, but it created some imbalances, and there were some cost concerns, which we'll dig into. Just to tell you about each of these clusters: they each had 50 nodes, with 72 logical CPU cores, half a terabyte of RAM, a bunch of RAIDed disk drives, and 2 petabytes of raw data, as I stated before. Pretty big, beefy, physical nodes that we leased, sitting in our data center providers' data centers, and these were what we built our business on. But there were a number of challenges we ran into as we continued to build the business, add data, and add workload. The first one, which I'm sure many can relate to, is capacity planning. We had to think about the future and try to predict the amount of work that was going to need to be done, and how much hardware we were going to need to satisfy that work and meet that demand. That's just generally a hard thing to do; it's very difficult to predict the future, as we can probably all attest to given how much the world has changed even in the last month. It's very difficult to look six, twelve, eighteen months into the future and get it right, so what we tended to do was make our plans and estimates very conservative, and we overbought in a lot of cases. Not only that, we had to plan for the peak, for that point in time, those hours in the early morning, when we had all those reports to run. So we ended up buying a lot of hardware, sometimes overbuying, and then as the hardware aged it would come into maturity
and our workload would gradually grow to approach and match the capacity. So that was one of the big challenges. The next challenge is that we were running out of disk. We wanted to add data in two dimensions, the dimensions everybody thinks about: we wanted to add more columns to our big aggregates, and we wanted to keep our big aggregates for longer periods of time. So both horizontally and vertically we wanted to expand the data sets, but we were basically out of disk: there was no more disk, and it's hard to add disk to Vertica in enterprise mode, not impossible, but certainly hard. And one cannot add disk without adding compute, because in enterprise mode the disk is all local to each of the nodes for most people; you can do things with SANs and other external arrays, but there are a number of other challenges with that. So in order to add disk we had to add compute, and that kept us out of balance: we were adding more compute than we needed for the amount of disk. Then there are the physical nodes themselves: getting them ordered, delivered, racked, and cabled; even before we install Vertica, there are lead times there. And it's a long commitment: since, like I mentioned, we lease the hardware, we were committing to these physical servers for two or three years at a time, which can be a hard thing to do, but we wanted to keep our capex down. We wanted to keep our aggregates for a long period of time, and we could have done crazier or more exotic things to help with this if we'd had to in enterprise mode. We could have started daisy-chaining clusters together, but that would have been a non-trivial engineering effort: we would need to figure out how to redistribute the data across all the clusters, how to migrate data from one cluster to another, and how to run queries across clusters. If a data set spanned two clusters, we would have had to aggregate within each cluster, maybe, and then build something on top to aggregate the data from each of those clusters. Not impossible, but certainly not easy. Luckily for us, we started talking to Vertica about separation of compute and storage, and I know other customers were talking to Vertica at the same time; people had these problems, and so Vertica in Eon Mode came to the rescue. What I want to do is talk about Eon Mode really briefly, for those in the audience who aren't familiar. It's basically Vertica's answer to the separation of compute and storage: it allows one to scale compute and/or storage separately, and there are a number of advantages to doing that. Whereas in the old enterprise days when you added compute you added storage, and vice versa, now we can add one or the other, or both, as we see fit. Really briefly, here's how it works; this figure was taken directly from the Vertica documentation. It takes advantage of the cloud, in this case Amazon Web Services and the elasticity of the cloud. You've got EC2 instances, elastic cloud compute servers, that access data in an S3 bucket: three EC2 nodes, and a bucket with the blue objects in this diagram.
There are a couple of big differences. One: the persistent storage of the data, where the data lives, is no longer on each of the nodes; the persistent store of the data is the S3 bucket. That basically solves one of our first big problems, which is that we were running out of disk: S3 has, for all intents and purposes, infinite storage, so we can keep much more data there. So the persistent data lives on S3. Now, what happens when a query runs? It runs on one of the three nodes you see here, and, we'll talk about the depot in a second, but in a brand-new cluster that's just been spun up there will be no data on the nodes, so the nodes will reach out to S3 and run the query against remote storage. The nodes are literally reaching out to the communal storage for the data and processing it entirely without using any data on the nodes themselves. That works pretty well, but it's not as fast as if the data were local to the nodes, so what Vertica did is build a caching layer on each of the nodes, and that's what the depot represents. The depot is some amount of disk that is local to the EC2 node, and when a query runs against the remote S3 data, that data gets queued up for download to the nodes, so the data ends up residing in the depot, and the next query, or subsequent queries, can run on local storage instead of remote storage, which speeds things up quite a bit. So the depot is basically a caching layer, and we'll talk about the details of how we size our depot. The other thing I want to point out is that, since this is the cloud, another problem it helps us solve is the concurrency problem. You can imagine that these three nodes are one cluster; what we can do is spin up another three nodes and have them point to the same S3 communal storage bucket. Now we've got six nodes pointing to the same data, but we've isolated each set of three nodes so that they act as if they were their own cluster. Vertica calls them sub-clusters: we've got two sub-clusters, each of which has three nodes, and what this has essentially done is double the concurrency, double the number of queries that can run at any given time, because we've got this new chunk of compute that can answer queries. So that has given us the ability to add concurrency much faster. And I'll point out that, since it's the cloud and there are on-demand pricing models, we can see significant savings, because when a sub-cluster is not needed we can stop it and pay almost nothing for it. That's really important, really helpful, especially for our workload, which as I pointed out before is so lumpy: during the hours of the day when it's relatively quiet, I can go stop a bunch of sub-clusters, and that yields nice cost savings. So that's Eon in a nutshell; obviously the engineers and the documentation can give you a lot more information, and I'm happy to field questions later on as well, but I want to talk about how we implemented Eon at The Trade Desk. I'll start on the left-hand side, at the top. What we're representing here are sub-clusters: there's sub-cluster 0, our ETL sub-cluster, which is our primary sub-cluster.
When you get into the world of Eon, there are primary sub-clusters and secondary sub-clusters, and it has to do with quorum. Primary sub-clusters are the sub-clusters that we always expect to be up and running; they contribute to quorum, and they decide whether there are enough nodes for the database to start up. This is where we run our ETL workload: the ingest, the matching, and the aggregation parts of the work I talked about earlier. These nodes are always up and running, because our ETL pipeline is always on; we're an internet ad tech company, like I mentioned, so we're constantly running ads, there's always data flowing into the system, and the matching and the aggregation happen 24/7. So we want those nodes to always be up, and we need those processes to be super efficient, and that's reflected in our instance type. Each of our sub-clusters is sixty-four nodes; we'll talk about how we arrived at that number. The instance type for the ETL sub-cluster, the primary sub-cluster, is i3.8xlarge, one of the instance types that has quite a bit of NVMe storage attached: 32 cores and 244 gigs of RAM on each node, and I think it's seven terabytes of NVMe storage. What that allows us to do is basically ensure that everything this sub-cluster does is always in depot, which makes sure it's always fast. Then we get to the secondary sub-clusters. These are, as mentioned, secondary, so they can stop and start and it won't affect the cluster going up or down; they're sort of independent. We've got four of what we call read sub-clusters. They're not read-only by definition: technically, any sub-cluster can ingest and create data within the database, and that all gets pushed to the S3 bucket. But logically, for us, they're read-only; most of the work they happen to do is read-only, which is nice, because if it's read-only it doesn't need to worry about commits. We let the primary sub-cluster, the ETL sub-cluster, worry about committing data, and we don't have to have all the nodes in the database participating in transaction commits. So we've got four read sub-clusters and one ETL sub-cluster, a total of five sub-clusters, each running sixty-four nodes, which gives us a 320-node database, all nodes counted. Not all those nodes are up at the same time; as I mentioned, for big chunks of the day most of the read nodes are down, but they do all spin up during our busy time. For the read sub-clusters we've got i3.4xlarge, again the i3 instance family with NVMe storage; these nodes have, I think, three and a half terabytes of NVMe per node, we just take the two NVMe drives and RAID 0 them together, with 16 cores and 122 gigs of RAM. So these are smaller, you'll notice, but it works out well for us, because the read workload typically deals with much smaller data sets than the ingest or aggregation workload, so we can run those workloads on smaller instances, save a little bit of money, and get more granularity in how many sub-clusters are stopped and started at any given time. The NVMe is ephemeral: the data on it isn't persisted when you stop and start, which is an important detail, but it's okay, because the depot does a pretty good job with its algorithm: it pulls in data that was recently used, and the victim that gets pushed out is the data that was least recently used, so it was used a long time ago and probably isn't going to be needed again soon.
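The depot behavior just described is essentially a least-recently-used cache. Here is a toy sketch of that eviction policy in Python, purely illustrative, not Vertica's actual implementation; the capacity and file sizes are made up:

```python
# Toy LRU cache illustrating the depot eviction policy described above:
# recently fetched files stay local; the least recently used file is the
# victim when space is needed.
from collections import OrderedDict

class DepotCache:
    def __init__(self, capacity_bytes: int):
        self.capacity = capacity_bytes
        self.used = 0
        self.files = OrderedDict()  # file -> size, oldest first

    def access(self, name: str, size: int) -> str:
        if name in self.files:
            self.files.move_to_end(name)   # cache hit: now most recently used
            return "hit"
        # Cache miss: evict least recently used files until the new one fits.
        while self.used + size > self.capacity and self.files:
            victim, vsize = self.files.popitem(last=False)
            self.used -= vsize
        self.files[name] = size
        self.used += size
        return "miss (fetched from communal storage)"

depot = DepotCache(capacity_bytes=10)
for name, size in [("a", 4), ("b", 4), ("a", 4), ("c", 4)]:
    print(name, depot.access(name, size))   # "b" gets evicted, not "a"
```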
One important detail: the NVMe is ephemeral. The data on it isn't persisted when you stop and start an instance. But that's okay, because the depot does a pretty good job with its algorithm: it pulls in data that's recently used, and the victim it evicts is the data that's least recently used, data that was touched a long time ago and probably isn't going to be needed again soon.

So we've got five sub-clusters, and we've actually got two of those setups: a 320-node cluster in U.S. East and a 320-node cluster in U.S. West, so we have high availability and region diversity. They're peers, like I talked about before; they're independent, but they each run 128 shards. Shards are basically similar to segmentation: you take the data set and divide it into chunks, and each sub-cluster can see the data set in its entirety, so each sub-cluster is dealing with 128 shards. We chose 128 because it gives us even distribution of the data on 64-node sub-clusters (128 divides evenly by 64), so there's no data skew. And we chose 128 rather than 64 to future-proof ourselves: in case we wanted to double the size of any of the sub-clusters, we could double the number of nodes and still have no skew; the data would still be distributed evenly.

On the disk side, we've got a couple of RAID arrays. There's an EBS-based array that the catalog uses, the catalog storage location: I think we take four EBS volumes, RAID-0 them together, and come up with a 128-gigabyte drive. We wanted EBS for the catalog because we can stop and start nodes and that data will persist; it comes back when the node comes up, so we don't have to run a bunch of configuration at startup. Basically, the node starts, it automatically joins the cluster, and very shortly thereafter it starts processing work. So the catalog is on EBS. The NVMe is another RAID-0, as I mentioned, and that data is ephemeral: when we stop and start, it goes away. Basically we take 512 gigabytes of the NVMe and give it to the data temp storage location, and then we take whatever is remaining and give it to the depot. Since the ETL and read sub-clusters are different instance types, the depot is sized differently, but otherwise it's the same across sub-clusters.

And it all adds up. We've now stopped purging data for some of our big aggregates, we've added a bunch more columns, and at this point we have 8 petabytes of raw data in each Eon cluster, which is roughly four times what we could hold in our enterprise clusters. We can continue to add to that; maybe we need to add compute, maybe we don't, but the amount of data that can be held there can obviously grow much more. We've also built an auto-scaling tool, a service that basically monitors the queue I showed you earlier, watches for those spikes, and when it sees a spike it goes and starts up instances in one or another of the sub-clusters. That's how we have compute match the demand.

I'll also point out that we actually have one sub-cluster of specialized nodes; it's not strictly a customer-reports sub-cluster. We have this tool called Planner, which basically optimizes ad campaigns for our customers. We built it, it runs on Vertica, it uses Vertica data and runs Vertica queries, and it was wildly successful.
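The Trade Desk's auto-scaling service isn't public, but the shape Ron describes is simple enough to sketch. This is an illustration only: the queue probe, tag name, thresholds, and sub-cluster name are all invented, and a production version would drain and stop Vertica gracefully before touching the instances.

    import time
    import boto3

    ec2 = boto3.client('ec2', region_name='us-east-1')

    def read_queue_depth():
        # Stub: replace with a probe of your own work queue's depth.
        return 0

    def subcluster_instance_ids(name):
        # Assumes each node is tagged with its sub-cluster's name.
        resp = ec2.describe_instances(
            Filters=[{'Name': 'tag:SubCluster', 'Values': [name]}])
        return [inst['InstanceId']
                for res in resp['Reservations']
                for inst in res['Instances']]

    def scale(name='read-subcluster-2', high_water=10000, low_water=100):
        ids = subcluster_instance_ids(name)
        depth = read_queue_depth()
        if depth > high_water:
            ec2.start_instances(InstanceIds=ids)   # spike: bring compute up
        elif depth < low_water:
            ec2.stop_instances(InstanceIds=ids)    # quiet: stop paying

    while True:
        scale()
        time.sleep(60)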
Planner was successful enough that we wanted to give it some dedicated compute, and Eon made it really easy to basically spin up a new sub-cluster and say: here you go, Planner team, do what you want. They can completely max out the resources on those nodes and it won't affect any of the other operations we're doing: the ingest, the matching, the aggregation, or the reports. So it gave us a great deal of flexibility and agility, which is super helpful.

So the question is: has it been worth it? And for us the answer has been a resounding yes. We're doing things that we never could have done at reasonable cost before, we've got lots more data, we've got these specialized nodes, and we're much more agile. Now, how do you quantify that? Well, it's not quite as simple and straightforward as you might hope. We still have enterprise clusters (we had four at peak and we've still got two of those around), and we've got our two Eon clusters, but they're running different workloads and they're comprised of entirely different hardware. The number of nodes is different: 64-node versus 50-node sub-clusters will have different performance. And the workload itself is different.
The aggregation is aggregating more columns on Eon, because that's where we have disk available. The queries themselves are different too: we're running more queries, and more data-intensive queries, on Eon, because that's where the data is available. So in a sense, Eon is doing the heavy lifting for our workload; it has most of the data, including the much older data.

In terms of query performance, this is still a little anecdotal, but when queries run on the Eon cluster with the data in the depot, the performance matches that of the enterprise cluster quite closely. When the data is not in the depot and Vertica has to go out to S3 to get it, performance degrades, as you might expect. It depends on the query, though: things like counts are really fast, but if you need lots of data and have to materialize lots of columns, those queries can run slower. Not orders of magnitude slower, but certainly a multiple of the time. The good news is that after the data downloads, the Eon clusters quickly catch up as the cache populates.

In terms of cost, I'd love to be able to tell you that we're running 2x the number of reports or that things are finishing 8x faster, but it's not that simple. So to quantify it a little, what I tried to do is multiply it out: what if I ran the entire workload on enterprise, and the entire workload on Eon, with all the data we have today, all the queries, everything, to get it apples to apples? For enterprise, we estimate we'd need approximately 18,000 CPU cores all together. That's a big number, and it doesn't even cover all the non-trivial engineering work that would be required, the things I referenced earlier: sharding the data among multiple clusters, migrating data from one cluster to another, the daisy-chain type stuff. So that's one data point. For Eon, to run the entire workload, we estimate we'd need about 20,480 CPU cores. That's more cores than enterprise; however, about half of those, roughly ten thousand CPU cores, would only run for about six hours per day, and with the on-demand elasticity of the cloud, that is a huge advantage. So we are definitely moving to all-Eon as fast as we can. We have time left on our contract for the enterprise clusters, so we're not able to get rid of them quite yet, but Eon is certainly the way of the future for us.

I also want to point out that we've found Vertica Eon to be the most efficient MPP database on the market, by which I mean that for a given dollar of spend, we get the most out of Vertica compared to other cloud MPP database platforms. So our business is really happy with what we've been able to deliver with Eon.

Eon has also given us the ability to begin a new use case, one that's probably pretty familiar to folks on the call: a UI-based workload. We'll have a website that our customers can log into, and on that website they'll be able to run reports and queries that execute directly on a separate, dedicated Eon sub-cluster. That workload is much more latency-sensitive and concurrency-sensitive. The workflow I've described up to this point has been pretty steady throughout the day: we get our spike, and then it goes back to normal for the rest of the day. This new workload will potentially be more variable; we don't know exactly when our engineers are going to deliver some huge feature that makes a lot of people want to log into the website and check how their campaigns are doing.
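Putting Ron's sizing comparison above into back-of-envelope arithmetic (the 50/50 split of the Eon cores is an assumption based on his "about half" figure):

    enterprise_cores = 18_000   # estimated cores to run it all on enterprise
    eon_always_on    = 10_240   # ~half of 20,480 Eon cores, running 24/7
    eon_bursty       = 10_240   # the other half, needed ~6 hours/day

    enterprise_core_hours = enterprise_cores * 24          # 432,000/day
    eon_core_hours = eon_always_on * 24 + eon_bursty * 6   # 307,200/day

    # Despite needing more peak cores, Eon consumes roughly 29% fewer
    # core-hours per day once the bursty half is stopped off-peak.
    print(enterprise_core_hours, eon_core_hours)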
Eon really helps us with this new workload, because we can add capacity so easily: we can add compute and scale it up and down as needed, and that lets us match the concurrency, which is much more variable here. We don't need a big, long lead time, so we're really excited about it.

So, last slide here. I just want to leave you with some things to think about if you're about to embark, or are just getting started, on your journey with Vertica Eon.

One of the things you'll have to think about is the node count and the shard count; they're tightly coupled. We determined the node count by spinning up some instances in a single sub-cluster and benchmarking until we found acceptable performance, considering both current and future workload for the queries we had when we started. We went with 64: we certainly wanted to go above the 50 nodes we had on enterprise, but we didn't want the sub-clusters to be too big, because of course that costs money, and we like to do things in powers of two. So, 64 nodes. Then the shard count (shards, again, being a kind of segmentation of the data): we went with 128. The reason is so that we could have no skew, with every node processing the same amount of data, and we wanted to future-proof it. Doubling the node count to get the shard count is probably a nice general recommendation.

The instance type and how much depot space you need are certainly things to consider. Like I was saying, we went with the i3.4xlarge and i3.8xlarge because they offer good depot storage, which gives us really consistent, good performance when everything is in depot. I think we're going to use the r5 or r4 instance types for our UI cluster: the data there is much smaller, so there's much less dependence on the depot, and we don't need the NVMe storage.

You're also going to want a mix of reserved and on-demand instances if you're a 24/7 shop like we are. Our ETL sub-clusters are reserved instances, because we know we're going to run them 24 hours a day, 365 days a year; there's no advantage to having them be on-demand, since on-demand costs more than reserved. So we get cost savings by figuring out what we're going to keep running. It's the read sub-clusters that are, for the most part, on-demand, although one of our read sub-clusters is actually on 24/7, because we keep it up for ad-hoc and analyst queries: we don't know exactly when those will hit, and the analysts want to be able to keep working whenever they want to.

In terms of the initial data load, the initial ingest: what we had to do, and how it still works today, is you've got to basically load all your data from scratch. There isn't great tooling just yet for populating or moving data from enterprise to Eon. So what we did is export all the data in our enterprise cluster into Parquet files, put those out on S3, and then ingest them into our first Eon cluster. It's kind of a pain; we scripted out a bunch of stuff, obviously, but it worked. And the good news is that once you do that, the second Eon cluster is just a bucket copy, and there is tooling that can help with that.
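The migration path Ron describes can be sketched as two statements: an export run against the enterprise cluster and a load run against the Eon cluster. Both are standard Vertica SQL of that era; the bucket and table names are placeholders.

    # Run on the enterprise cluster: write a table out as Parquet on S3.
    export_sql = """
        EXPORT TO PARQUET (directory = 's3://my-bucket/migration/impressions')
        AS SELECT * FROM impressions;
    """

    # Run on the new Eon cluster: ingest those Parquet files.
    ingest_sql = """
        COPY impressions
        FROM 's3://my-bucket/migration/impressions/*'
        PARQUET;
    """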
You're also going to want to manage your fetches and evictions, by which I mean the data that's in the cache, the depot. Like I talked about, we have our ETL sub-cluster, which holds the most recent data: the data that's just been ingested and the recent data that's been aggregated. So we wouldn't want anybody logging into that ETL sub-cluster and running queries on big aggregates going back one to three years, because that would invalidate the cache: the depot would start pulling in that historical data and evicting the recent data, which would slow down the ETL pipeline. We didn't want that, so we need to make sure that users, whether they're service accounts or human users, are connecting to the right Eon sub-cluster. We steer users with IPs and target groups to point them at the right place; it was definitely something to think about.

Lastly, if you're like us and you're going to want to stop and start nodes, you're going to have to have a service that does that for you. We built a very simple tool that basically monitors the queue and stops and starts sub-clusters accordingly. We're hoping we can work with Vertica to have that be a little more driven by the cloud configuration itself; for us it's all Amazon, and we'd love it if it could scale natively with AWS.

A couple of things to watch out for when you're working with Eon. The first is system-table queries on storage-layer metadata. The thing to be careful of is that the storage-layer metadata is replicated: there's a copy for each of the sub-clusters out there. So for each of our five sub-clusters there is a copy of all the rows in the storage_containers system table and all the rows in the partitions system table. When you want to use these system tables to analyze how much data you have, or for any other analysis, make sure you filter your query by node name. For us, that means node names less than or equal to node 64, because each of our sub-clusters has 64 nodes; we limit it to the 64-node ETL sub-cluster. Otherwise, without that filter, we'd get 5x the values for counts and that sort of thing.

And lastly, there's a problem we're still working on and thinking about: DC (Data Collector) table data for sub-clusters that are stopped. When the instances are stopped, the operating system is literally down and there's no way to access it, so it takes the DC table data with it. After my sub-clusters scale up in the morning and then scale down, I can't run DC-table queries on what performed well and where, because that data is local to those nodes. So we're working on an implementation to pull that data out of all the read-only nodes that stop and start all the time, and bring it into some other kind of repository, perhaps another Vertica cluster, so we can run analysis and monitoring even when those nodes are down. Something to be aware of.
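The system-table gotcha above, as a query. The node-name pattern assumes Vertica's default v_<database>_nodeNNNN naming; adjust for your cluster.

    # Without the node_name filter, a five-sub-cluster database returns
    # roughly 5x the true counts, because storage metadata is replicated
    # per sub-cluster. Restricting to the 64 ETL nodes counts each
    # storage container once.
    sql = """
        SELECT COUNT(*) AS containers
        FROM storage_containers
        WHERE node_name <= 'v_addb_node0064';
    """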
That's it. Thanks for taking the time to listen to my presentation; I really appreciate it.

>> Thank you, Ron, that was a tremendous amount of information; thank you for sharing it with everyone. We have some questions that have come in that I'd like to present to you, if you have a couple of minutes. Let's jump right in. The first one: loading 85 terabytes of data per day is a pretty significant amount. What format does that data come in, and what does that load process look like?

>> Yeah, great question. The format is tab-separated files that are gzip-compressed. The reason for that is basically historical: we don't have many tabs in our data, and this is how the data gets compressed and moved off of our bidders, the things that generate most of this data. So it's TSV, gzip-compressed. As for how we load it, I'd say we have kind of a Cadillac of a loader, in a couple of respects. One is that we've got a homegrown orchestration layer that manages the logs, the data that gets loaded into Vertica. We accumulate data, then we take some files and distribute them across the ETL nodes in the cluster (we're literally pushing the files to the nodes), then we run a COPY statement to ingest the data into the database, and then we remove the files from the nodes themselves. So there's a little bit of extra data movement, which we may think about changing as we move more and more to Eon. But the really nice thing about this, especially for the enterprise clusters, is that the COPY statements are really fast. COPY uses memory like any other query, and its performance is really sensitive to the amount of available memory. Since the data is local to the nodes, literally in the data directory I referenced earlier, COPY can read it from the NVMe storage and run very fast, and then that memory is free to do something else. So we pay a little bit of cost in latency, in downloading the data to the nodes. As we move more to Eon, we might start ingesting directly from S3 instead of copying to the nodes first; we'll see. But that's how we load the data.

>> Interesting, thanks Ron. Another question: what was the biggest challenge you found when migrating from on-prem to AWS?

>> Yeah, a couple of things come to mind. The first was the backfill, the initial data load. It was kind of a pain, like I referenced on that last slide, mostly because we didn't have tools built to do it, so we had to script some stuff out. It wasn't overly complex, but it's just a lot of data to move, even starting with two petabytes, and you have to make sure there's no missing data, no gaps. We exported it to local disk on the enterprise clusters, pushed it to S3, and then ingested it into Eon as Parquet files. So it's a lot of data to move around, and you have to take an outage at some point: stop loading data while you do that final catch-up phase. So that was a one-time challenge. The other thing, which we've worked through rather than something we're still dealing with, is that Eon was still a relatively new product for Vertica. One of the big advantages of Eon is that it lets us stop and start nodes, and recently Vertica has gotten quite good at that, but for a while it took a really long time to bring a node back up, and it could be invasive. We worked with the Vertica engineering team to really reduce that, and now it's not really an issue that we think too much about.
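A minimal sketch of the load statement in the pipeline Ron describes above, assuming a gzip-compressed TSV file already pushed to a node's local data directory (paths and table names are placeholders):

    # GZIP tells COPY the file is compressed; DELIMITER E'\t' marks it
    # as tab-separated; DIRECT writes straight to disk storage, the
    # usual choice for large batch loads.
    copy_sql = """
        COPY impressions
        FROM '/data/incoming/impressions_20200330.tsv.gz' GZIP
        DELIMITER E'\\t'
        DIRECT;
    """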
>> Thanks. Toward the end of the presentation you said you've got 128 shards, but your sub-clusters are usually around 64 nodes, and you talked about a ratio of two to one. Why is that, and if you were to do it again, would you use 128 shards?

>> Ah, good question. The reason is that we wanted to future-proof ourselves. Basically, we wanted to make sure the number of shards was evenly divisible by the number of nodes. I could have done that with 64, with 128, or with any other multiple of 64, but we went with 128 to protect ourselves in the future: if we wanted to double the number of nodes in the ETL sub-cluster specifically, we could go from 64 to 128 nodes and each node would have exactly one shard to deal with, so, again, no skew. As for the second part of the question: if I had to do it over again, I think I would have stuck with 128. We've been running this cluster for more than 18 months now, certainly in U.S. East, and we haven't needed to increase the number of nodes. In that sense there's been a little extra overhead in having more shards, but it gives us the peace of mind that we can easily double and not have to worry about it. I think two-to-one is a nice place to start, and you might even consider three-to-one or four-to-one if you're expecting really rapid growth, if you're just getting started with Eon and your business is small now but you expect it to grow significantly.

>> Great, thank you, Ron. That's all the questions we have for today. If you do have others, please feel free to send them in and we will respond directly via email. And again, our engineers will be available on the Vertica forums, where you can continue the discussion with them. I want to thank Ron for the great presentation, and the audience for your participation and questions. Please note that a replay of today's event and a copy of the slides will be available on demand shortly, and of course we invite you to share this information with your colleagues. Again, thank you. This concludes the webinar; have a great day.

Published Date : Mar 30 2020

Distributed Data with Unifi Software



>> Narrator: From the Silicon Angle Media Office in Boston, Massachusetts, it's theCUBE. Now, here's your host, Stu Miniman. >> Hi, I'm Stu Miniman, and we're here at the east coast studio for Silicon Angle Media. Happy to welcome back to the program a many-time guest, Chris Selland, who is now the Vice President of strategic growth with Unifi Software. Great to see you, Chris. >> Thanks so much Stu, great to see you too. >> Alright, so Chris, we'd had you on in your previous role many times. >> Chris: Yes. >> I think not only is this the first time we've had you on since you made the switch, but also the first time we've had somebody from Unifi Software on. So why don't you give us a little background on Unifi and what brought you to this opportunity. >> Sure, absolutely; happy to open up the relationship with Unifi Software, and I'm sure it's going to be a long and good one. I joined the company about six months ago, earlier this year. I had actually worked with Unifi for a bit as a partner: when I was previously at the Vertica business inside of HP/HPE, as you know, for a number of years prior to that, we did a lot of work together. I also knew the founders of Unifi, who were at Greenplum, which was a direct Vertica competitor; Greenplum was acquired by EMC, Vertica was acquired by HP, and we were sort of friendly, respected competitors. So I've known the founders for a long time. It was partly the people, but it was really the idea, the product. I was actually just reading the piece that Peter Burris did on wikibon.com about distributed data, and it plays so well into our value proposition. We just see it's where things are going; I think it's where things are going right now, and I think the market's bearing that out. >> The piece you reference was actually a Wikibon research meeting; we run those weekly, internally, though we're going to be broadcasting them on video soon, because of course we do a lot of video. We pull the whole team together, and George Gilbert actually led this one for us, talking about what architectures I need to build when I start doing distributed data. With my background really more in the cloud and infrastructure world, we see it's a hybrid, and many times multi-cloud, world. Therefore, one of the things we look at as critical is: wait, if I've got things in multiple places (my SaaS over here, multiple public clouds I'm using, and my data center), how do I get my arms around all the pieces? And of course data is critical to that. >> Right, exactly, and the fact is that more and more people need data to do their jobs these days. Working with data is no longer just the domain of data scientists; organizations are certainly investing in data scientists, but there's a shortage, and at the same time marketing people, finance people, operations people, supply chain folks all need data to do their jobs. And as you said, it's distributed: it's in legacy systems, it's in the data center, it's in warehouses, it's in SaaS applications, it's in the cloud, it's on premise. It's all over the place. >> Chris, I've talked to so many companies, and everybody seems to be nibbling at a piece of this. We go to the Amazon show and there's this just ginormous ecosystem that everybody's picking at. Can you drill in a little on what problems you solve?
I've talked to people dealing with everything from just trying to get licensing in place, to trying to empower the business units, to trying to do governance and compliance, of course. So where's Unifi's point in all this? >> Well, we came out of essentially the data warehousing market, and now, of course, with all the investment in HDFS, Hadoop infrastructure, and open-source infrastructure, there's been this fundamental thinking that if I get all of the data in one place, then I can analyze it. Well, that just doesn't work. >> Right. >> Because it's just not feasible. When you step back, it's one of those "ah-ha, that makes total sense" ideas: what we do is basically catalog the data in place. So you can use your legacy data that's on the mainframe. Let's say I'm a marketing person trying to do an analysis of sales trends, marketing trends, marketing effectiveness. I want to use some order data that's on the mainframe, some clickstream data that's sitting in HDFS, some customer data in the CRM system (or maybe it's in Salesforce, or Marketo), and I need some data out of Workday. I also want to use some external data, say weather data, to do seasonal analysis; I want to do neighborhooding. So how do I do that? I may be sitting there with Qlik or Tableau or Looker or one of these modern BI or visualization products, but at the same time: where's the data? So our value proposition starts with cataloging the data and showing where it is. You've got these data sources, this is what they are, and we describe them. Then there's a whole collaboration element to the platform that lets people, as they're using the data, say: well, yes, that's order data, but it's old data; it's good if you use it up to 2007, but the more current data's over here. Things like that. And then we also help the person use it. It's not just data scientists; it's really about democratizing the use, because business people don't know how to do inner and outer joins, or what a schema is. They just know: I'm trying to do a better job of analyzing sales trends, and I've got all these different data sources. But once I've found them, once I've decided what I want to use, how do I use them? So we answer that question too.
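Unifi's internals aren't public, so this is only a toy illustration of the "catalog in place" idea Chris describes: the catalog stores where data lives plus human annotations, without moving the data itself. All of the names here are invented.

    from dataclasses import dataclass, field

    @dataclass
    class CatalogEntry:
        name: str
        location: str                 # where the data actually lives
        source_type: str              # mainframe, HDFS, SaaS app, ...
        annotations: list = field(default_factory=list)

    orders = CatalogEntry(
        name='order_history',
        location='mainframe://prod/orders',
        source_type='mainframe',
    )
    # Collaboration layer: a user flags the data's freshness for others.
    orders.annotations.append(
        'Good through 2007 only; current orders live in the ERP extract.')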
>> Yeah, Chris, that reminds me a lot of some of the early value propositions we heard when Hadoop and the whole big data wave came along: how do I, as a smaller company, or faster and for less money as a bigger company, do the things that used to cost millions of dollars and take 18 months to roll out? Is it right to say this is an extension of that big data wave, or what's different and what's the same? >> Absolutely, we use a lot of that stuff. We've got flexibility in what we can use, but for most of our customers we use HDFS to store the data, we use Hive as the most typical data format (you have flexibility there), and we use MapReduce or Spark to do transformation of the data. So we use all of those open-source components, and as the platform is used by multiple users (it's designed to be an enterprise platform), the data does eventually migrate into the data lake. But we don't require you to get it there as a prerequisite. As I said, this is one of the things we talk about a lot: we catalog the data where it is, in place, so you don't have to move it to use it or to see it. At the same time, if you want to move it, you can. The fundamental idea of "I've got to move it all first, I've got to put it all in one place first" never works. We've come into so many projects where organizations have tried to do that, and they just can't; it's too complex these days. >> Alright, Chris, what are some of the organizational dynamics you're seeing from your customers? You mention data scientists and business users; who is identifying these issues, who's driving them, who's got the budget to try to fix some of these challenges? >> Our best implementations these days are almost all driven by use cases, by business needs. Some of the big ones: I've sort of talked about customers already, but customer-360 views, for instance. There's a very large credit union client of ours whose data is all organized by account, so they can't really look at Stu Miniman as a customer. How do I look at Stu's value to us as a customer? I can look at his mortgage account, his savings account, his checking account, his debit card, but I can't just see Stu. I want to organize my data that way. That type of customer 360, or the marketing analysis I talked about, is a great use case. Another one we've been seeing a lot of is compliance: just having a better handle on what data is where. This is where some of the governance aspects of what we do come into play, even though we're very much about solving business problems. For instance, we're working with MoneyGram, a customer of ours. In this day and age in particular, when money flows across borders, regulators often want to know: that money that went from here to there, tell me where it came from, tell me where it went, tell me the lineage. And they need to be able to respond to those inquiries very quickly. The reality is that the data sits in all sorts of different places, both inside and outside the organization. Being able to organize that and respond more quickly and effectively is a big competitive advantage: it helps with avoiding regulatory fines, and it also helps with customer responsiveness. And then you've got things like GDPR, the General Data Protection Regulation, which is being driven by the EU; it's sort of like the next Y2K. Anybody in data who isn't paying attention to it needs to move pretty quickly, at least if they're a big enough company to be doing business in Europe, because if you're doing business with European companies or European customers, this is going to be a requirement as of May next year. There's a whole other set of rules for how data's kept, how data's stored, and what control customers have over data; things like the 'Right to Be Forgotten'.
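As a toy data structure for the lineage requirement in the MoneyGram example, here is a sketch of being able to answer "where did this come from and where did it go" quickly when a regulator asks. The field names are invented for illustration.

    from dataclasses import dataclass

    @dataclass
    class Hop:
        source: str
        destination: str
        timestamp: str

    def trace(transfer_id, hops):
        # Print the full path of one transfer, oldest hop first.
        for h in sorted(hops, key=lambda h: h.timestamp):
            print(f'{transfer_id}: {h.source} -> {h.destination} '
                  f'at {h.timestamp}')

    trace('TX-1042', [
        Hop('branch-NYC', 'clearing-EU', '2017-06-01T09:00Z'),
        Hop('clearing-EU', 'branch-Lisbon', '2017-06-01T09:05Z'),
    ])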
This need to comply keeps growing: as data's gotten more important, as you might imagine, the regulators have gotten more interested in what organizations are doing with it, and having a framework that organizes your data and helps you comply with those regulations is absolutely critical. >> Yeah, my understanding of GDPR is that if you don't comply, there are hefty fines. >> Chris: Major fines. >> Major fines that are going to hit you. Does Unifi solve that? Is there other re-architecture or redesign that customers need to do to be compliant? [speaking at the same time] >> No, no, that's the whole idea again: being able to leave the data where it is, but know what it is, know where it is, when I need to use it, where it came from, where it's going, and where it went. All of those things. We provide the platform that enables customers to use it, or partners to build the solutions for their customers. >> Curious about customers' adoption of public cloud: how does that play into what you're doing? They deploy more SaaS environments, and we were having a conversation off camera today about the consolidation happening in the software world. What do those dynamics mean for your customers? >> Well, public cloud is obviously booming and growing, and just about any organization has some public cloud infrastructure at this point. There are some very heavily regulated areas, health care's probably a good example, where there's very little public cloud. But even there, we're part of the Microsoft Accelerator Program and work very closely with the Azure team, for instance, and they're working in some health care environments where you have to be HIPAA compliant, so there's a lot of caution around that. Nonetheless, the move to public cloud is certainly happening. I was just reading some stats the other day (I can't remember if they were Wikibon's or others'), and it's still only about 5% of IT spending. The reality is that organizations of any size have plenty of on-prem data, and with all the use of SaaS solutions (Salesforce, Workday, Marketo, all of these SaaS applications), much of our data is in somebody else's data center as well. So it's absolutely a hybrid environment. That's why the report you guys put out on distributed data spoke so much to what our value proposition is, and that's why I'm really glad to be here to talk to you about it. >> Great. Chris, tell us a little about the company itself: how many employees you have, and what metrics you can share about the number of customers, revenue, things like that. >> Sure. We've got, I believe, about 65 people at the company right now. I joined earlier this year, in late February, early March; at that point we were about 40 people, so we've been growing very quickly. I can't get too specific about revenue, but we're well into the triple-digit growth phase, and our number of customers is up in the triple digits as well, so we're expanding very rapidly. And we're a platform company, so we serve a variety of industries; some of the big ones are health care and financial services, but even more than industries, it tends to be driven by the use cases I talked about. We're also building out our partnerships; that's a big part of what I do.
>> Can you share anything about funding, where you are? >> Oh yeah, funding, you asked about that, sorry. Yes, we raised our B round, which closed in March of this year. Pelion Venture Partners, who you may know, Canaan Partners, and most recently Scale Venture Partners are investors. The company's raised a little over $32 million so far. >> Partnerships: you mentioned Microsoft already. Any other key partnerships you want to call out? >> We have a very broad partner network, which we're building up, but some of the ones we're leaning in with the most: Microsoft is certainly one; we're doing a lot of work with the folks at Cloudera as well; we also work with Hortonworks and with MapR. We're working with almost everybody in the BI space: we've spent a lot of time with the folks at Looker, who were also a close partner during my Vertica days, and we're working with Qlik and with Tableau. Really just about everybody in BI, though I don't think people like the term BI anymore: the desktop visualization space. And then on public cloud: Google, Amazon, really all the major players. As I mentioned earlier, we're part of the Microsoft Accelerator Program, so we're very involved in the Microsoft ecosystem. I actually just wrote a blog post, which I don't believe has been published yet, about what we call the full-stack solutions we've been rolling out with Microsoft for a few customers: we're sitting on Azure, using HDInsight, which is essentially Microsoft's cloud Hadoop distribution, visualized in Power BI. So we've got a lot of deep integration with Microsoft, but a broad network as well. And I should also mention service providers: we're building out our service provider partnerships too. >> Yeah, Chris, I'm surprised we haven't talked about AI yet at all, or machine learning. It feels like everybody that was doing big data has now pivoted in, maybe a little early in the buzzword phase. What's your take? You've been a part of this for a while: is big data just old now and we have a new thing, or how do you put those together? >> Well, I think what we do maps very well to what's going on with AI and ML, at least in my personal view: it's really part of the fabric of what our product does. I talked before about how, once you've found the data you want to use, the question is: how do I use it? There's a lot of ML built into that. We do what are called one-click functions, which basically get smarter as more and more people use the product and the data. If I've got some table over here and some SaaS data source over there, we grab the metadata (even though we don't require moving the data), we look at the field names, and we suggest: join this data source with that data source and see what it looks like. If the user says "ah, that worked," that feeds the whole ML infrastructure, and we're then more likely to advise the next folks using the one-click function that, hey, if you're trying to analyze sales trends, you might want to use this source and that source and join them together this way.
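As a toy version of that one-click heuristic (Unifi's actual implementation isn't public): compare column names across two sources' metadata, and rank suggestions that users have previously confirmed ahead of the rest.

    def suggest_join_keys(schema_a, schema_b, confirmed=frozenset()):
        # Candidate join keys are columns the two schemas share;
        # user-confirmed winners sort first, the rest alphabetically.
        overlap = set(schema_a) & set(schema_b)
        return sorted(overlap, key=lambda col: (col not in confirmed, col))

    crm_columns   = ['customer_id', 'email', 'region']
    click_columns = ['customer_id', 'url', 'ts', 'email']

    print(suggest_join_keys(crm_columns, click_columns,
                            confirmed={'customer_id'}))
    # -> ['customer_id', 'email']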
So it's a combination of AI and ML built into the fabric of what we do, plus the community aspect of more and more people using it. But going back to your original question: there was a quote (I'll misquote it, so I won't attribute it directly, though I think it might have been John Ferrier) saying that eventually we're not going to talk about ML any more than we talk about the phone business. It's just going to become integrated into the fabric of how organizations do business and do things. We very much have it built in; you could certainly call us an AI/ML company if you want, and it's definitely part of our slide deck, but at the same time it's something that will just become part of doing business over time. And it really depends on large data sets. As we all know, this is why it's so cheap to get Amazon Echoes and such these days: there's value in that data. There was just another piece, which I actually shared on LinkedIn today as a matter of fact, talking about Amazon and Whole Foods and asking: why are they getting such a valuation premium? They're getting it because they're smart about using data, and one of the reasons they're smart about using data is that they have the data. The more data you collect and use, the smarter the systems get and the more useful the solutions become. >> Absolutely. Last year at Amazon re:Invent, John Ferrier interviewed Andy Jassy, and I had posited that the customer flywheel is going to be replaced, or enhanced, by that data flywheel, to make things spin even further. >> That's exactly right, and once you get that flywheel going it becomes a bigger and bigger competitive advantage. By the way, that's also why the regulators are getting interested these days, right? There's sort of a flywheel going back the other way. But from our perspective, first of all, it just makes economic sense. These things could conceivably get out of control (at least that's what the regulators think) if you're not careful, so some oversight is probably a good idea. You've got flywheels pushing in both directions. One way or another, organizations need to get much smarter, more precise, and more prescriptive about how they use data, and that's really what we're trying to help with.
>> Okay, Chris, I want to give you the final word. Unifi Software: you're working on the strategic growth pieces. What should we look for from you through the rest of 2017? >> Well, I've always been a big believer (I've probably cited 'Crossing the Chasm' so many times on theCUBE during my prior HP tenure) that we should be talking about customers and use cases. It's not about alphabet-soup technology or data lakes; it's about the solutions, and about how organizations are moving themselves forward with data. Going back to that Amazon example: yes, we just released 2.0, and we've got a very active blog, so come by unifisoftware.com and visit. But it's also going to be about what our customers are doing, and that's really what we're going to try to promote. As you'll remember from all the years I've worked with you guys, you always have to make sure the customer has agreed to be cited; it's nice when you can name them and reference them. So we're working on our customer references, because that's what I think is most powerful in this day and age. Because, going back to what I said before, this is going throughout organizations now: people don't necessarily care about the technology infrastructure, they care about what's being done with it. Being able to tell those customer stories is what you'll probably see and hear the most from us, but we'll talk about our product as much as you let us as well. >> Great. It reminds me of when Wikibon was founded: it was really about IT practitioners being able to share with their peers. Now, in today's software economy, what companies do in software can often be leveraged by their peers, and there's that flywheel again; just like when Salesforce first rolled out, they make one change and then everybody else has that option. We're seeing that more and more as we deploy SaaS and cloud; it's not shrink-wrap software anymore. >> I think to that point: I was at an IT conference earlier this year, and I was really sort of floored, because when you hear what the enlightened IT folks are talking about these days (and there are more and more enlightened IT folks), it's the same thing. It's how our business is succeeding by being better at leveraging data. I think there are tremendous opportunities for people in IT, but they really have to think outside the box. It's not about Hadoop and Sqoop and SQL and Java anymore; it's really about business solutions. If you can start to think that way, there are tremendous opportunities, and we're just scratching the surface. >> Absolutely; we've found that's really one of the proof points of what digital transformation is for companies. Alright, Chris Selland, always a pleasure to catch up with you. Thanks so much for joining us, and thank you for watching theCUBE. >> Chris: Thanks too. (techno music)

Published Date : Aug 2 2017
