John Landry, HP - Spark Summit East 2017 - #SparkSummit - #theCUBE
>> Live from Boston, Massachusetts, this is the CUBE, covering Spark Summit East 2017, brought to you by Databricks. Now, here are your hosts, Dave Vellante and George Gilbert. >> Welcome back to Boston everyone. It's snowing like crazy outside, it's a cold mid-winter day here in Boston, but we're here with the CUBE, the world-wide leader in tech coverage. We are live covering Spark Summit. This is wall to wall coverage, this is our second day here. John Landry is with us, he's the distinguished technologist for HP's personal systems data science group within Hewlett Packard. John, welcome. >> Thank you very much for having me here. >> So I was saying, I was joking, we do a lot of shows with HPE, it's nice to have HP back on the CUBE, it's been a while. But I want to start there. The company split up just over a year ago and it's seemingly been successful for both sides, but you were describing to us that you've gone through an IT transformation of sorts within HP. Can you describe that? >> In the past, we took basically a data warehousing type of approach, with reporting and what have you coming out of data warehouses, using Vertica. But recently, we made an investment into more of a programming platform for analytics, and our transformation to the cloud is about that. Instead of investing in our own data centers, because really, with the split, our data centers went with Hewlett Packard Enterprise, we're building our software platform in the cloud, and that software platform includes analytics. In this case, we're building big data on top of Spark, and so that transformation is huge for us, but it's also enabled us to move a lot faster and match up better to the velocity of our business. Like I said, it's mainly around the software development really more than anything else. >> Describe your role in a little bit more detail inside of HP. >> My role is I'm the leader in our big data investments, and so I've been leading teams internally and also collaborating across HP with our print group, and what we've done is we've managed to put together a strategy around our cloud-based solution to that. One of the things that was important was that we had a common platform, because when you put a programming platform in place, if it's not common, then we can't collaborate. Our investment could be fractured, we could have a lot of little side efforts going on and what have you, so my role is to provide the leadership and the direction for that. Also, one of the reasons I'm here today is to get involved in the Spark community, because our investment is in Spark, so that's another part of my role: to get involved with the industry and to be able to connect with the experts in the industry so we can leverage off of that, because we don't have that expertise internally. >> What are the strategic and tactical objectives of your analytics initiatives? Is it to get better predictive maintenance on your devices? Is it to create new services for customers? Can you describe that? >> It's two-fold, internal and external. Internally, we've got millions of dollars of opportunity to better our products on cost and to optimize our business models, and the way we can do that is by using the data that comes back from our products, our services, our customers, combining that together and creating models around that that are then automated and can be turned into apps that can be used internally by our organizations.
The second part is to take the same approach, same data, but apply that back towards our customers. With the split, our enterprise services group also went with Hewlett Packard Enterprise, so now we have a dedicated effort towards creating managed services for the commercial environment. And that's both on the print side and on the personal systems side, so to basically fuel that, analytics is a big part of the story. So we've had different things that you'll see out there; Touchpoint Manager is one of the services we're delivering in personal systems. >> Dave: What is that? >> Touchpoint Manager is aimed at providing management services for SMB and for commercial environments. So for instance, in Touchpoint Manager, we can provide predictive types of capabilities for support. A number of different services that companies are looking for when they buy our products. Another thing we're going after is device as a service. That's something we've announced recently and are invested in, and obviously, if you're delivering devices as a service, you want to do that as optimally as possible. Well, being able to understand the devices, what's happening with them, being able to do predictive support on them, being able to optimize the usage of those devices, that's all important. >> Dave: A lot of data. >> The data really helps us out, right? So the data that we can collect back from our devices, and being able to take that and turn it around into applications that are delivering information inside or outside, is huge for us, a huge opportunity. >> It's interesting where you talk about internal initiatives and managed services, which sound like they're mostly external, but on the internal ones, you were talking about taking customer data and internal data and turning those into live models. Can you elaborate on that? >> Sure, I can give you a great example on our mobile products: they all have batteries. All of our batteries are instrumented as smart batteries, and that's an industry standard, but HP actually goes a step further on that with the information that we put into our batteries. So by monitoring those batteries and the usage in the field, we can tell how optimally they're performing, but also how they're being used and how we can better design batteries going forward. In addition, we can actually provide information back into our supply chain. For instance, there's a cell supplier for the battery, there's a pack supplier, there's our unit manufacturer for the product, and one of the things we've been able to uncover is that we can go and improve process. And improving process alone helps to improve the quality of what we deliver and the quality of the experience for our customers. So that's one example of just using the data, turning that around into a model. >> Is there an advantage to having such high volume, such market share, in getting not just more data, but sort of more of the bell curve, so you get the edge conditions? >> Absolutely, it's really interesting because when we started out on this, everybody was used to doing reporting, which is absolute numbers and how much did you ship and all that kind of stuff. But we're doing big data, right? So in big data, you just need a good sample population. Turn the data scientists loose on that and they've got their statistical algorithms to run against it.
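[Editor's note: to make the battery example above concrete, here is a minimal sketch of what that kind of telemetry rollup might look like in PySpark. Every path, column name, and threshold here is hypothetical, an assumption for illustration, not HP's actual schema.]

```python
# Hypothetical sketch: roll up smart-battery telemetry by cell-supplier lot
# to spot lots whose capacity is degrading faster than the fleet norm.
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.appName("battery-lot-health").getOrCreate()

# Assumed layout: one row per device check-in, with supplier and lot
# decoded from the smart-battery data. All names are illustrative.
telemetry = spark.read.parquet("s3://example-bucket/battery-telemetry/")

lot_health = (
    telemetry
    .groupBy("cell_supplier", "pack_lot")
    .agg(
        F.count("*").alias("samples"),
        F.avg("full_charge_capacity_pct").alias("avg_capacity_pct"),
        F.avg("cycle_count").alias("avg_cycles"),
    )
)

# Compare each lot against the fleet-wide average capacity.
fleet_avg = telemetry.agg(F.avg("full_charge_capacity_pct")).first()[0]

suspect_lots = lot_health.filter(
    (F.col("samples") > 100) &                      # need a usable sample size
    (F.col("avg_capacity_pct") < 0.9 * fleet_avg)   # assumed 10% degradation cutoff
)
suspect_lots.show()
```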
They give you the confidence factor based upon the data that you have, so it's absolutely a good factor for us because we don't have to see all the platforms out there. Then, the other thing is, when you look at populations, we see variances across different customers. One of our populations that's very valuable to us is our own, so we take the 60 thousand units that we have internally at HP and that's one of our sample populations. What better way to get information on your own products? But you take that and you take it to one of our other customers, and their population's going to look slightly different. Why? Because they use the products differently. So one of the things is just usage of the products, the environment they're used in, how they use them. Our sample populations are great in that respect. Of course, the other thing is, very important to point out, we only collect data under the rules and regulations that are out there, so we absolutely follow that, and we absolutely keep our data secure, and that's important. Sometimes people get a little bit spooked around that today, but the case is that our services are provided based on customers signing up for them. >> I'm guessing you don't collect more data than Google. >> No, we're nowhere near Google. >> So, if you're not spooked at Google - >> That's what I tell people. I say if you've got a smartphone, you're giving up a lot more data than we're collecting. >> Buy something from Amazon. Spark, where does Spark fit into all of this? >> Spark is great because we needed a programming platform that could scale in our data centers, and in our previous approaches, we didn't have a programming platform. We started with Hadoop, but Hadoop was very complex. It really gets down to the hardware, and you're programming and trying to distribute that load and getting clusters, and then you pick up Spark and immediately you get abstraction. The other thing is it allows me to hire people that can actually program on top of it. I don't have to get someone that knows MapReduce. I can sit there and it's like, what do you know? You know R, Scala, you know Python, it doesn't matter. I can run all of that on top of it. So that's huge for us. The other thing is flat out the speed, because as you start getting going with this, we get this pull all of a sudden. It's like, well, I only need the data like once a month; then it's I need it once a week, I need it once a day, I need the output of this by the hour now. So, the scale and the speed of that is huge, and then when you put that on a cloud platform, you know, Spark on a cloud platform like Amazon, now I've got access to all the compute instances. I can scale that, I can optimize it, because I don't always need all the power. The flexibility of Spark and being able to deliver that is huge for our success. >> So, I've got to ask some Columbo questions, and George, maybe you can help me sort of frame it. So you mentioned you were using Hadoop. Like a lot of Hadoop practitioners, you found it very complex. Now, Hewlett Packard has resources. Many companies don't. But you mentioned people out doing Python and R and Scala and MapReduce; are you basically saying, okay, we're going to unify portions of our Hadoop complexity with Spark and that's going to simplify our efforts? >> No, what we actually did was we started on the Hadoop side of it.
The first thing we did was try to move from a data warehouse to more of a data lake approach, or repository, and that was internal, right? >> Dave: And that was a cost reduction? >> That was a cost reduction, but also data accessibility. >> Dave: Yeah, okay. >> The other thing we did was ingesting the data. When you're starting to bring data in from millions of devices, we had a problem with the coming-through-the-firewall type of approach, and you've got to have something in front of that, like a Kafka, that can handle it. So when we moved to the cloud, we didn't even try to put up our own, we just used Kinesis, so we didn't have to spend any resources to go solve that problem. Well, the next thing was, when we got the data, you need to ingest it, and as our data's coming in, we want to split it out, we needed to clean it, and what have you. We actually started out running Java, and then we ran Java on top of Hadoop, but then we came across Spark and we said, that's it. For us to go to the next step and really get into Hadoop, we were going to have to get some more skills, and to find the skills to actually program in Hadoop was going to be complex. And to train them organically was going to be complex. We got a lot of smart people, but - >> Dave: You got a lot of stuff to do, too. >> That's the thing, we wanted to spend more time getting information out of the data as opposed to the framework of getting it to run and everything. >> Dave: Okay, so there's a lot of questions coming out. You mentioned Kinesis, so you've replaced that? >> Yeah, when we went to the cloud, we used as many Amazon services as we could as opposed to growing something ourselves, so when we got onto Amazon, you know, getting data into an S3 bucket through Kinesis was a no-brainer. When we transferred over to the cloud, it took us less than 30 days to point our devices at Kinesis, and we had all our data flowing into S3. So that was like, wow, let's go do something else. >> So I've got to ask you something else. Again, I love when practitioners come on. So, one of the complaints that I hear sometimes from AWS users, and I wonder if you see this, is that the data pipeline is getting more and more complex. I've got an API for Kinesis, one for S3, one for DynamoDB, one for Elasticsearch. There must be 15 proprietary APIs that are primitives, and again, it gets complicated, and sometimes it's hard to even figure out what's the right cost model to use. Is that increasingly becoming more complex, or is it just so much simpler than what you had before and you're in nirvana right now? >> When you mention costs, just the cost of moving to the cloud was a major cost reduction for us. >> Reduction? >> So now it's - >> You had that HP corporate tax on you before - >> Yeah, now we're moving away from data centers and software licenses. >> So that was a big win for you? >> Yeah, huge, and that freed us up to go spend dollars on resources to focus on the data science aspect. So when we start looking at it, we continually optimize, don't get me wrong. But the point is, if we can bring it up real quickly, that's going to save us a lot of money, and we don't even have to maintain it. So we want to focus on creating the code inside of Spark that's actually doing the real work, as opposed to the infrastructure. So that cost savings was huge. Now, when you look at it over time, we could've over-analyzed that and everything else, but what we did was we used a rapid prototyping approach, and then from there, we continued to optimize.
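[Editor's note: a rough sketch of the ingestion path described above, devices streaming through Kinesis into S3, with Spark then splitting and cleaning the landed records. The bucket layout, field names, and JSON format are assumptions for illustration, not HP's actual pipeline.]

```python
# Hypothetical sketch: clean raw device records that Kinesis has landed
# in S3, then write them back out partitioned for downstream jobs.
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.appName("ingest-clean").getOrCreate()

# Assumed landing zone for records delivered from the Kinesis stream.
raw = spark.read.json("s3://example-bucket/raw/kinesis-landing/2017/02/")

cleaned = (
    raw
    .dropDuplicates(["device_id", "event_ts"])        # devices may retry sends
    .filter(F.col("device_id").isNotNull())           # drop malformed records
    .withColumn("event_ts", F.col("event_ts").cast("timestamp"))
)

# Split out by device family so each team only reads its own slice.
(cleaned.write
    .mode("append")
    .partitionBy("device_family")
    .parquet("s3://example-bucket/curated/telemetry/"))
```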
So what's really good about the cloud is you can predict the cost, and with internal data centers and software licenses and everything else, you can't predict the cost, because everybody's trying to figure out who's paying for what. But in the case of the cloud, pretty much you get your bill and you understand what you're paying. So anyway - >> And then you can adjust accordingly? >> We continue to optimize, so we use the services, but if for some reason something's going to deliver us an advantage, we'll go develop it. But right now, our advantage is we've got umpteen opportunities to create AI-type code and applications to basically automate these services; we don't even have enough resources to do it all right now. But the common programming platform's going to help us. >> Can you drill into those umpteen examples? Just some of them, because - >> I mentioned the battery one, for instance. So take that across the whole system: now you've got your storage devices, you've got your software that's running on there, and we've got security monitoring built into our systems at the firmware level. Just basically connecting into that and adding AI around it is huge, because now we can see attacks that may be happening on your fleet, and we can create services out of that. Anything that you can automate around that is money in our pocket or money in our customers' pockets, so if we can save them money with these new services, they're going to be more willing to come to HP for products. >> It's actually more than just automation, because it's the stuff you couldn't do with 1,000 monkeys trying to write Shakespeare. You have data that you could not get before. >> You're right, the automation is helping us uncover things that we would've never seen, and you're right, it's the whole gorilla walking through the room; I could sit there and show you tons of examples of where we were missing the boat. Even when we brought up our first data sets, we started looking at them, and some of the stuff we looked at, we thought this is just bad data, and actually it wasn't, it was bad product. >> People talk about dark data - >> We had no data models, we had no data model to say, is it good or bad? And now we have data models, and we're continuing to create those data models. You create the data model, and then you can continue to teach it, and that's where we create the apps around it. Our primitives are the data models that we're creating from the device data that we have. >> Are there some of these apps where some of the intelligence lives on the device, and it can, like in a security attack, it's a big surface area, you want to lock it down right away. >> We do. The good example on the security side is we built something into our products called Sure Start. Essentially, we have the ability to monitor the firmware layer, and there's a local process, running independent of everything else, that's monitoring what's happening at that firmware level. Well, if there's an attack, it's going to immediately prevent the attack or recover from the attack. That's built into the product. >> But it has to have a model of what this anomalous behavior is. >> Well, in our case, we're monitoring what the firmware should look like, and if we see that the firmware, you know, you take checksums from the firmware or the pattern - >> So the firmware does not change? >> Well, basically we can take the characteristics of the firmware and monitor it.
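[Editor's note: Sure Start itself runs below the OS, not in Python, but the core idea Landry describes, comparing a measured characteristic of the firmware image such as a checksum against a known-good value, can be sketched like this. The function names and the recovery hook are hypothetical.]

```python
# Hypothetical sketch of the integrity-check idea: hash the firmware image
# and compare against a known-good ("golden") digest. A mismatch means
# corruption or tampering, and would trigger recovery plus a log event.
import hashlib

def firmware_digest(image_path: str) -> str:
    """Compute a SHA-256 digest of the firmware image, streamed in chunks."""
    h = hashlib.sha256()
    with open(image_path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def check_firmware(image_path: str, golden_digest: str) -> bool:
    current = firmware_digest(image_path)
    if current != golden_digest:
        # In the real product, this is where recovery from a protected
        # copy would kick in, and an event would be logged for telemetry.
        print(f"MISMATCH: expected {golden_digest}, got {current}")
        return False
    return True
```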
If we see that changing, then we know something's wrong. Now, it can get corrupted through hardware failure, because glitches can happen. I mean, solar flares can cause problems sometimes. So the point is, we found that customers sometimes had problems where basically their firmware would get corrupted and they couldn't start their system. So we're like, are we getting attacked? Is this a hardware issue? Could it be bad Flash devices? There are always all kinds of things that could cause that. Well, now we monitor it and we know what's going on. Now, the other cool thing is we create logs from that, so when those events occur, we can collect those logs, and we're monitoring those events, so now we can have something monitor the logs that are monitoring all the units. So, if you've got millions of units out there, how are you going to do that manually? You can't, and that's where the automation comes in. >> So the logs give you the ability, up in the cloud or at HP, to look at the ecosystem of devices, but there is intelligence down on the - >> There's intelligence to protect the device and auto-recover, which is really cool. So in the past, you had to go get your repair. Imagine if someone attacked your fleet of notebooks. Say you've got 10 thousand of them, and basically it brought every single one of them down one day. What would you do? >> Dave: Freak. >> And every single one you've got to replace. It was just an attack, and it could happen, so we basically protect against that with our products. At the same time, we can see that it may be occurring, and then from the footprints of it, we can do analysis and determine: was that malicious, is this happening because of a hardware issue, is this happening because maybe we tried to update the firmware and something happened there? What caused that to happen? And that's where collecting the data from the population helps us, and then we mix that with other things like service events. Are we seeing service events being driven by this? Thermal, we can look at the thermal data. Maybe there's some kind of heat issue that's causing this to happen. So we start mixing that in. >> Did Samsung come calling to buy this? >> Well, actually, what's funny is Samsung is a supplier of ours, a battery supplier of ours. So, by monitoring the batteries, what's interesting is we're helping them out, because we go back to them. One of the things I'm working on is we want to create apps that can go back to them, so they can see the performance of the product that they're delivering to us. So instead of us having to call a meeting and saying, hey guys, let's talk about this, we've got some problems here. Imagine how much time that takes. But if they can self-monitor, then they're going to want to keep supplying to us, and they're going to better their product. >> That's huge. What a productivity boost, because it used to be, hey, we've got a problem, let's meet and talk about it, and then you take an action to go and figure out what it is. Now, if you need a meeting, it's like, let's look at the data. >> Yeah, you don't have enough people. >> But there's also potentially a shift in pricing power. I would imagine it shifts a little more in your favor if you have all the data that indicates the quality of their product. >> That's an interesting thing. I don't know that we've reached that point. I think that in the future, it would be something that could be included in the contracts.
The fact that the world is the way it is today, and data is a big part of that, means that going forward, absolutely, having that data helps you have a better relationship with your suppliers. >> And your customers. I mean, it used to be that the brand had all the information. The internet obviously changed all that, but this whole digital transformation and IoT and all that log data, that sort of levels the playing field back to the brand. >> John: It actually changes it. >> You can now add value for the consumer that you couldn't before. >> And that's what HP's trying to do. We're investing to do exactly that, to really improve or increase the value of our brand. We have a strong brand today, but - >> What do you guys do with - we've got to wrap - but what do you do with Databricks? What's the relationship there? >> Databricks, again, we decided that we didn't want to be the experts on managing the whole Spark thing. The other part was that we wanted to be involved with Spark and help them drive the direction as far as our use cases and what have you. Databricks and Spark go hand in hand. They've got the experts there, and it's been huge, our relationship, being able to work with these guys. But I recognize the fact that, going back to software development and everything else, we can't spare resources on that. We've got too many other things to do, and the less I have to worry about my Spark code running and scaling, and the cost of it, and being able to put code in production, the better. So having that layer there is saving us a ton of money and resources and a ton of time. Just imagine time to market; it's just huge. >> Alright, John, sorry, we've got to wrap. Awesome having you on, thanks for sharing your story. >> It's great to talk to you guys. >> Alright, keep it right there everybody. We'll be back with our next guest. This is the CUBE, live from Spark Summit East. We'll be right back.
John Cavanaugh, HP - #SparkSummit - #theCUBE
>> Announcer: Live from San Francisco, it's theCube, covering Spark Summit 2017, brought to you by Databricks. >> Welcome back to theCube at Spark Summit 2017. I don't know about you, George, I'm having a great time learning from all of our attendees. >> We've been absorbing now for almost two days. >> Yeah, well, and we're about to absorb a little bit more here, too, because the next guest is one I was looking forward to. I saw his name on the schedule, and all right, that's the guy who talks about herding cats: it's John Cavanaugh, Master Architect from HP. John, welcome to the show. >> Great, thanks for being here. >> Well, I did see, I don't know if it's about cats on the Internet, but either cats or self-driving cars, one of the two in analogies. But talk to us about your session. Why did you call it Herding Cats, and is that related to maybe the organization at HP? >> Yeah, there's a lot of organizational dynamics as part of our migration to Spark. HP is a very distributed organization, and it has had a lot of distributed autonomy, so, you know, trying to get centralized activity is often a little challenging. You guys have often heard, you know, "I'm from the government, I'm here to help." That's often the kind of shields-up response you will get from folks, so we've got a lot of dynamics in terms of trying to bring these distributed organizations on board to a new common platform and allay many of the fears that they had about making any kind of a change. >> So, are you centered at a specific division? >> So, yes, I'm in the print platforms and future technology group. You know, there are two large business segments within HP. There's our personal systems group, which produces everything from phones to business PCs to high-end gaming. But I'm in the printing group, and while many people are very familiar with your standard desktop printer, you know, the printers we sell really vary, from a very small product we call Sprocket, which fits in your hand and is battery-operated, to literally a web press that's bigger than your house and prints at hundreds of feet per minute. So, it's a very wide product line, and it has a lot of data collection. >> David: Do you have 3D printing as well? >> We do have 3D printing as well. That's an emergent area for us. I'm not super familiar with that, I'm mostly on the 2D side, but that's a very exciting space as well. >> So tell me what kind of projects you're working on that require that kind of cross-team or cross-departmental cooperation. >> So, you know, in my talk, I talked about the Wild West era of big data, and that was prior to 2015, when we had a lot of groups that were standing up all kinds of different big data infrastructures. And part of this stems from the fact that we were part of HP at the time, and we could buy servers and racks of servers at cost. Storage was cheap, all these things, so they sprouted up everywhere. And, around 2015, everybody started realizing, oh my God, this is completely fragmented. How do we pull things back together? And that's when a lot of groups started trying to develop platform-ish types of activities, and that's where we knew we needed to go, but there was even some disagreement among different groups about how to move forward. So, there's been a lot of good work within HP in terms of creating a virtual community, and Spark really kind of caught on pretty quickly. Many people were really tired of Hadoop. There were a lot of very opinionated models in Hadoop, where Spark opens up a lot more to the data science community.
So, that went really well, and we made a big push into AWS for much of our cloud activities, and we really ended up pretty quickly with Databricks as an enterprise partner for us. >> And so, George, you've done a lot of research. I'm sure you've talked to enterprise companies along the way. Is this a common issue with big enterprises? >> Well, for most big data projects they've started, the ones we hear a lot about, there's a mandate from the CIO: we need a big data strategy. And so some of those, in the past, stood up a five- or 10-node Hadoop cluster, ran some sort of pilot, and said, this is our strategy. But it sounds like you herded a lot of cats... >> We had dozens of those small Hadoop clusters all around the company. (laughter) >> So, how did you go about converting that energy, that excess energy, towards something more harmonized around Databricks? >> Well, a lot of people started recognizing we had a problem, and this really wasn't going to scale, and we really needed to come up with a broader way to share things across the organization. So, the timing was really right, and a lot of people were beginning to understand that. And, you know, for us, there were probably about five kind of key decisions we ended up making. And part of the whole strategy was to empower the businesses. As I mentioned, we are a very distributed organization, so you can't really dictate to the businesses. The businesses really need to own their success. And one of the decisions that was made, and it might be kind of controversial for many CIOs, is that we made a big push on cloud-hosted and business-owned, not IT-owned. And one of the real big reasons for that is we were no longer viewing data and big data as kind of a business-intelligence activity or a standardized reporting activity. We really knew that, to be successful moving forward, it needed to be built into our products and services, and those products and services are managed by the businesses. So, it can't be something that would be tossed off to an IT organization. >> So the IT organization, then, evolved into being more of an innovative entity versus a reactive or supportive entity for all those different distributed groups. >> Well, in our regard, we've ended up with AWS as part of our activity, and, really, much of our big data activity is driven by the businesses. The connections we have with IT are more related to CRM and product data master sheets and selling in channels and all that information. >> But if you take a bunch of business-led projects and then try and centralize some aspect of them, wouldn't IT typically become the sort of shared infrastructure architecture advisor for that, and then the businesses now have a harmonized platform on which they can build shared data sets? >> Actually, in our case, that's what we did. We had a lot of businesses that already had significant services hosted in AWS, and those were very much the high-data generators. So, it became a very natural evolution to continue with some of our AWS relationships and continue on to Databricks. So, as an organization today, we have three main buckets for our Databricks, but, you know, any business can get their accounts. We try and encourage everything to get into a data lake, and that's S3 and Parquet format, one of the decisions that was adopted. And then, from there, people can begin to move. You know, you can get notebooks, you can share notebooks, you can look at those things.
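[Editor's note: a minimal sketch of the shared data lake convention Cavanaugh describes, each business landing its data in S3 as Parquet under an agreed schema, so any team's Databricks notebooks can read it. The schema, field names, and paths are assumptions for illustration.]

```python
# Hypothetical sketch: write a business's telemetry into the shared S3
# data lake in Parquet under an agreed schema, readable by other groups
# from their own Databricks notebooks without extra coordination.
from pyspark.sql import SparkSession
from pyspark.sql.types import (StructType, StructField, StringType,
                               TimestampType, DoubleType)

spark = SparkSession.builder.appName("to-data-lake").getOrCreate()

# An agreed-upon telemetry schema (field names are illustrative).
schema = StructType([
    StructField("device_id", StringType(), False),
    StructField("event_ts", TimestampType(), False),
    StructField("metric", StringType(), False),
    StructField("value", DoubleType(), True),
])

events = spark.read.schema(schema).json(
    "s3://example-bucket/incoming/print-telemetry/")

# Parquet in the shared lake, partitioned so consumers can prune reads.
(events.write
    .mode("append")
    .partitionBy("metric")
    .parquet("s3://example-lake/telemetry/print/"))
```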
You know, the beauty of Databricks and AWS is instant-on. If I want to play around with something with half a dozen nodes, it's great. If I need a thousand for a workload, boom, I've got it! And, you know, with this cost and the value returned, there's really no need for permissions or coordination with other entities, and that's kind of what we wanted: the businesses have that autonomy to drive their business success. >> But does there not need to be some central value added in the way of, say, data curation through a catalog or something like that? >> Yes, so, this is not necessarily a model where all the businesses are doing all kinds of crazy things. One of the things, shepherded by one of our CTOs and the other functions, is we ended up creating a virtual community within HP. This kind of started off with a lot of "tribal elders" or "tribal leaders." With this virtual community, today we get together every two weeks, and we have presentations and discussions on all things from data science to machine learning, and that's where a lot of this activity around how we get better at sharing happens. And this has fostered kind of splinter groups for additional activity. So we have one on data telemetry within our organization. We're trying to standardize more data formats and schemas for those, so we can have broader sharing. So, these things have been occurring more organically, as part of developer enablement moving up, rather than more of kind of dictates moving down. >> That's interesting. Potentially really important. When you say you're trying to standardize some of the telemetry, what are you instrumenting? Is it just all the infrastructure, or is it some of the products that HP makes? >> It's definitely the products and the software. You know, like I said, we manage a huge spectrum of print products, and my apologies if I'm focusing on it, but that is what I know best. You know, we've actually been doing telemetry and analysis since the late 90s. You know, we wanted to understand use of supplies and usage so we could do our own forecasting, and that's really, really grown over the years. You know, now we have parts of our services organization, managed services, where they're offering big data analytics as part of the package, and we provide information about predictive failure of parts. And that's going to be really valuable for some of our business partners. We have all kinds of fancy algorithms that we work on. The customers have specific routes that they go for servicing, and we may be able to tell them, hey, in a certain time period, we think these devices in your field are likely to fail, so you can coordinate your route to hit those efficiently rather than having to make a single truck roll for one repair, and do that before a customer experiences a problem. So, it's been kind of a great example of different ways that big data can impact the business. >> You know, I think Ali mentioned in the keynote this morning the example of a customer getting a notification that their ink's going to run out, and the chance that you get to touch that customer and get them to respond and buy; you could make millions of dollars of difference, right? Let's talk about some of the business outcomes and the impact that some of your work has had, and what it means, really, to the business. >> Right now, we're trying to migrate a lot of legacy stuff, and you know, that's kind of boring.
(laughs) It's just a lot of work, but there are things that need to happen. But really, the power of the big data platform has been great with Databricks. I know John Landry, one of our CTOs; he's in the personal systems group. He had a great example of some problems they had with batteries in laptops, and, you know, they have a whole bunch of analytics. They've been monitoring batteries, and they found a collection of batteries that experienced very early failure rates. They happened to be able to narrow it down to specific lots from a specific supplier, and they were able to reach out to customers to get those batteries replaced before they died. >> So, a mini-recall instead of a massive PR failure. (laughs) >> You know, it was really focused. You know, customers didn't even know they were going to have a problem with these batteries, that they were going to die early. You know, you got to them ahead of time, told them we knew this was going to be a problem, and tried to help them. I mean, what a great experience for a customer. (laughs) That's just great. >> So, once you had this telemetry, and it sounds like a bunch of shared repositories, not one intergalactic one, what were some of the other use cases, you know, like the battery predictive-failure type scenarios? >> So, you know, we have some very large ranges, not gaps, across different categories. We have clearly consumer products; you know, you sell millions and millions of those, and we have a little bit of telemetry with those. I think we want to understand failures and ink levels and some of these other things. But on our commercial web presses, these very large devices, these are very sensitive. If these things are down, there's a big problem. So, these things are generating all kinds of data. All right, we have systems on premises with customers that are alerting them to potential failures, and there's more and more activity going on there to understand predictive failure and predictive kinds of tolerance slippages. I'm not super familiar with that business, but I know some guys there have started introducing more sensors into products, specifically so they can get more data, to understand things. You know, slight variations in tensioning and paper, you know, on these things that are running hundreds of feet per minute, can have a large impact. So, I think that's really where we see more and more of the value coming from: being able to return that value back to the customer, not just help us make better decisions, but to get that back to the customer. You know, we're talking about expanding more customer-facing analytics in these cases, or we'll expose to customers some of the raw data, and they can build their own dashboards. Some of these industries have traditionally been very analog, so this move to digital web presses and this mountain of data is a little new for them, but HP can bring a lot to the table in terms of our experience in computing and big data to help them with their businesses. >> All right, great stuff. And we've just got a minute to go before we're done. I have two questions for you; the first is an easy yes/no question. >> John: Okay. >> Is Purdue going to repeat as Big Ten champ in basketball? >> Oh, you know, I don't know. (laughs) I hope so! >> We both went to Purdue. >> I'm more focused on the Warriors winning. (laughter) >> All right, go Warriors! And, the real question is, what surprised you the most? This is your first Spark Summit. What surprised you the most about the event?
>> So, you know, you see a lot of Internet-born companies, and it's amazing how many people have just gone fully native with Spark all over the place, and it's a beautiful thing to see. You know, in larger enterprises, that transition doesn't happen like that. I'm kind of jealous. (laughter) We have a lot more things to slog through, but the excitement here and all the things that people are working on, you know, you can only see so many tracks. I'm going to have to spend two days when I get back just watching the videos of all the tracks I couldn't attend. >> All right, Internet-born companies versus the big enterprise. Good luck herding those cats, and thank you for sharing your story with us today and talking a little bit about the culture there at HP. >> John: Thank you very much. >> And thank you all for watching this segment of theCube. Stay with us, we're still covering Spark Summit 2017. This is Day Two, and we're not done yet. We'll see you in a few minutes. (theCube jingle)
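[Editor's note: as an appendix to the predictive-failure discussion above, here is a minimal sketch of how per-device failure scores might be grouped into service routes, the one-truck-roll-instead-of-many idea Cavanaugh describes. The scores table, field names, and threshold are all hypothetical.]

```python
# Hypothetical sketch: take per-device failure-risk scores (from whatever
# predictive model a team runs) and group high-risk devices by customer
# site, so a service partner can fold them into a single route.
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.appName("service-routing").getOrCreate()

scores = spark.read.parquet("s3://example-bucket/scores/device-failure-risk/")

at_risk = (
    scores
    .filter(F.col("p_fail_30d") > 0.5)               # assumed risk threshold
    .groupBy("customer_id", "site_id")
    .agg(
        F.count("*").alias("devices_at_risk"),
        F.collect_list("device_id").alias("device_ids"),
    )
    .orderBy(F.desc("devices_at_risk"))
)
at_risk.show(truncate=False)
```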