John Cavanaugh, HP - #SparkSummit - #theCUBE
>> Announcer: Live from San Francisco, it's theCube, covering Spark Summit 2017, brought to you by Databricks.
>> Welcome back to theCube at Spark Summit 2017. I don't know about you, George, I'm having a great time learning from all of our attendees.
>> We've been absorbing now for almost two days.
>> Yeah, well, and we're about to absorb a little bit more here, too, because the next guest, I'm looking forward to. I saw his name on the schedule: all right, that's the guy who talks about herding cats. It's John Cavanaugh, Master Architect from HP. John, welcome to the show.
>> Great, thanks for being here.
>> Well, I did see, I don't know if it's about cats on the Internet, but either cats or self-driving cars, one of the two in analogies. But talk to us about your session. Why did you call it Herding Cats, and is that related to maybe the organization at HP?
>> Yeah, there's a lot of organizational dynamics as part of our migration to Spark. HP is a very distributed organization, and it has had a lot of distributed autonomy, so, you know, trying to get centralized activity is often a little challenging. You guys have often heard, you know, "I am from the government, I'm here to help." That's often the kind of shields-up response you will get from folks, so we've got a lot of dynamics in terms of trying to bring these distributed organizations on board to a new common platform, and allay many of the fears that they had with making any kind of a change.
>> So, are you centered at a specific division?
>> So, yes, I'm in the print platforms and future technology group. You know, there's two large business segments with HP. There's our personal systems group that produces everything from phones to business PCs to high-end gaming.
But I'm in the printing group, and while many people are very familiar with your standard desktop printer, you know, the printers we sell really vary from a very small product we call Sprocket, which fits in your hand and is battery-operated, to literally a web press that's bigger than your house and prints at hundreds of feet per minute. So, it's a very wide product line, and it has a lot of data collection.
>> David: Do you have 3D printing as well?
>> We do have 3D printing as well. That's an emergent area for us. I'm not super familiar with that. I'm mostly on the 2D side, but that's a very exciting space as well.
>> So tell me about what kind of projects you're working on that do require that kind of cross-team or cross-departmental cooperation.
>> So, you know, in my talk, I talked about the Wild West Era of Big Data, and that was prior to 2015, and we had a lot of groups that were standing up all kinds of different big data infrastructures. And part of this stems from the fact that we were part of HP at the time, and we could buy servers and racks of servers at cost. Storage was cheap, all these things, so they sprouted up everywhere. And, around 2015, everybody started realizing, oh my God, this is completely fragmented. How do we pull things back together? And that's when a lot of groups started trying to develop platformish types of activities, and that's where we knew we needed to go, but there was even some disagreement from different groups on how we move forward. So, there's been a lot of good work within HP in terms of creating a virtual community, and Spark really kind of caught on pretty quickly. Many people were really tired of kind of Hadoop. There were a lot of very opinionated models in Hadoop, where Spark opens up a lot more into the data science community. So, that went really well, and we made a big push into AWS for much of our cloud activities, and we really ended up then pretty quickly with Databricks as an enterprise partner for us.
>> And so, George, you've done a lot of research. I'm sure you talked to enterprise companies along the way. Is this a common issue with big enterprises?
>> Well, for most big data projects they've started, the ones we hear a lot about, there's a mandate from the CIO, we need a big data strategy, and so some of those, in the past, stand up a five- or 10-node Hadoop cluster and run some sort of pilot and say, this is our strategy. But it sounds like you herded a lot of cats...
>> We had dozens of those small Hadoop clusters all around the company. (laughter)
>> So, how did you go about converting that energy, that excess energy, towards something more harmonized around Databricks?
>> Well, a lot of people started recognizing we had a problem, and this really wasn't going to scale, and we really needed to come up with a broader way to share things across the organization. So, the timing was really right, and a lot of people were beginning to understand that. And, you know, for us, there were probably about five different kind of key decisions we ended up making. And part of the whole strategy was to empower the businesses. As I have mentioned, we are a very distributed organization, so, you can't really dictate to the businesses. The businesses really need to own their success. And one of the decisions that was made, which might be kind of controversial for many CIOs, is that we've made a big push on cloud-hosted and business-owned, not IT-owned. And one of the real big reasons for that is we were no longer viewing data and big data as kind of a business-intelligence activity or a standardized reporting activity. We really knew that, to be successful moving forward, it needed to be built into our products and services, and those products and services are managed by the businesses. So, it can't be something that would be tossed off to an IT organization.
>> So the IT organization, then, evolved into being more of an innovative entity versus a reactive or supportive entity for all those different distributed groups.
>> Well, in our regard, we've ended up with AWS as part of our activity, and, really, much of our big data activities are driven by the businesses. The connections we have with IT are more related to CRM and product data master sheets and selling in channels and all that information.
>> But if you take a bunch of business-led projects and then try and centralize some aspect of them, wouldn't IT typically become the sort of shared infrastructure architecture advisor for that, and then the businesses now have a harmonized platform on which they can build shared data sets?
>> Actually, in our case, that's what we did. We had a lot of our businesses that already had significant services hosted in AWS. And those were very much part of the high-data generators. So, it became a very natural evolution to continue with some of our AWS relationships and continue on to Databricks. So, as an organization today, we have three kind of main buckets for our Databricks, but, you know, any business, they can get their accounts. We try and encourage everything to get into a data lake, and that's S3, and Parquet format was one of the decisions that was adopted. And then, from there, people can begin to move. You know, you can get notebooks, you can share notebooks, you can look at those things. You know, the beauty of Databricks and AWS is instant on. If I want to play around with something with a half a dozen nodes, it's great. If I need a thousand for a workload, boom, I've got it! I know, kind of others, then, with this cost and the value returned, there's really no need for permissions or coordination with other entities, and that's kind of what we wanted: the businesses to have that autonomy to drive their business success.
>> But, does there not need to be some central value added in the way of, say, data curation through a catalog or something like that?
>> Yes, so, this is not necessarily a model where all the businesses are doing all kinds of crazy things. One of the things that was shepherded by one of our CTOs and the other functions is we ended up creating a virtual community within HP. This kind of started off with a lot of "tribal elders" or "tribal leaders." With this virtual community, today we get together every two weeks, and we have presentations and discussions on all things from data science into machine learning, and that's where a lot of this activity is, around how do we get better at sharing. And this has fostered kind of splinter-offs for additional activity. So we have one on data telemetry within our organization. We're trying to standardize more data formats and schemas for those so we can have broader sharing. So, these things have been occurring more organically as part of a developer enablement kind of moving up rather than more of kind of dictates moving down.
>> That's interesting. Potentially, really important, when you say you're trying to standardize some of the telemetry, what are you instrumenting? Is it just all the infrastructure or is it some of the products that HP makes?
>> It's definitely the products and the software. You know, like I said, we manage a huge spectrum of print products, and my apologies if I'm focusing on it, but that is what I know the best. You know, we've actually been doing telemetry and analysis since the late 90s. You know, we wanted to understand use of supplies and usage so we could do our own forecasting, and that's really, really grown over the years. You know, now, we have parts of our services organization, managed services, where they're offering big data analytics as part of the package, and we provide information about predictive failure of parts.
And that's going to be really valuable for some of our business partners that allows them... We have all kinds of fancy algorithms that we work on. The customers have specific routes that they go on for servicing, and we may be able to tell them, hey, in a certain time period, we think these devices in your field are going to fail, so you can coordinate your route to hit those on an efficient route rather than having to make a single truck roll for one repair, and do that before a customer experiences a problem. So, it's been kind of a great example of different ways that big data can impact the business.
>> You know, I think Ali mentioned in the keynote this morning the example of a customer getting a notification that their ink's going to run out, and with the chance that you get to touch that customer and get them to respond and buy, you could make a millions-of-dollars difference, right? Let's talk about some of the business outcomes and the impact that some of your work has had, and what it means, really, to the business.
>> Right now, we're trying to migrate a lot of legacy stuff, and you know, that's kind of boring. (laughs) It's just a lot of work, but there are things that need to happen. But really, the power of the big data platform has been really great with Databricks. I know, John Landry, one of our CTOs, he's in the personal systems group. He had a great example on some problems they had with batteries in laptops, and, you know, they have a whole bunch of analytics. They've been monitoring batteries, and they found a collection of batteries that experienced very early failure rates. They happened to be able to narrow it down to specific lots from a specific supplier, and they were able to reach out to customers to get those batteries replaced before they died.
>> So, a mini-recall instead of a massive PR failure.
(laughs)
>> You know, it was really focused on, you know, customers didn't even know they were going to have a problem with these batteries, that they were going to die early. You know, we got to them ahead of time, told them we knew this was going to be a problem and tried to help them. I mean, what a great experience for a customer. (laughs) That's just great.
>> So, once you had this telemetry, and it sounds like a bunch of shared repositories, not one intergalactic one, what were some of the other use cases like, you know, like the battery predictive failure type scenarios?
>> So, you know, we have some very large gaps, or not gaps, with different categories. We have clearly consumer products. You know, you sell millions and millions of those, and we have a little bit of telemetry with those. I think we want to understand failures and ink levels and some of these other things. But, on our commercial web presses, these very large devices, these are very sensitive. If these things are down, they have a big problem. So, these things are generating all kinds of data. All right, we have systems on premise with customers that are alerting them to potential failures, and there's more and more activity going on there to understand predictive failure and predictive kind of tolerance slippages. I'm not super familiar with that business, but I know some guys that have started introducing more sensors into products, specifically so they can get more data, to understand things. You know, slight variations in tensioning and paper, you know, on these things that are running hundreds of feet per minute can have a large impact. So, I think that's really where we see more and more of the value coming from: being able to return that value back to the customer, not just help us make better decisions, but to get that back to the customer.
You know, we're talking about expanding more customer-facing analytics in these cases, or we'll expose to customers some of the raw data, and they can build their own dashboards. Some of these industries have traditionally been very analog, so this move to digital web presses and this mountain of data is a little new for them, but HP can bring a lot to the table in terms of our experience in computing and big data to help them with their businesses.
>> All right, great stuff. And we've just got a minute to go before we're done. I have two questions for you, the first is an easy yes/no question.
>> John: Okay.
>> Is Purdue going to repeat as Big Ten champ in basketball?
>> Oh, you know, I don't know. (laughs) I hope so!
>> We both went to Purdue.
>> I'm more focused on the Warriors winning. (laughter)
>> All right, go Warriors! And, the real question is, what surprised you the most? This is your first Spark Summit. What surprised you the most about the event?
>> So, you know, you see a lot of Internet-born companies, and it's amazing how many people have just gone fully native with Spark all over the place, and it's a beautiful thing to see. You know, in larger enterprises, that transition doesn't happen like that. I'm kind of jealous. (laughter) We have a lot more things to slug through, but the excitement here and all the things that people are working on, you know, you can only see so many tracks. I'm going to have to spend two days when I get back, just watching the videos on all of the tracks I couldn't attend.
>> All right, Internet-born companies versus the big enterprise. Good luck herding those cats, and thank you for sharing your story with us today and talking a little bit about the culture there at HP.
>> John: Thank you very much.
>> And thank you all for watching this segment of theCube. Stay with us, we're still covering Spark Summit 2017. This is Day Two, and we're not done yet. We'll see you in a few minutes. (theCube jingle)