StrongyByScience Podcast | Bill Schmarzo Part One
Produced from theCUBE Studios, this is Strong By Science: in-depth conversations about science-based training, sports performance, and all things health and wellness. Here's your host, Max Schmarzo. [Music] [Applause] [Music]

All right, thank you guys for tuning in today. I have the one and only Dean of Big Data, the man, the myth, the legend, Bill Schmarzo, also my dad. He is the CTO of Hitachi Vantara, IoT and Analytics. He has a very interesting background, because he's known as the Dean of Big Data but also the king of the court and all things basketball-related when it comes to our household. And unlike most people in the data world (and I want to say "most" as an umbrella term), Bill has an illustrious sports career, playing at Coe College, the Harvard of the Midwest, my alma mater as well. But I think having that background, of not just being computer science but having multiple disciplines involved (your jazz career, your basketball career, and obviously the career you're in now), all that plays a huge role in being able to interpret and take multiple domains and put them into one. So thank you for being here, Dad.

Yeah, thanks, Max. That's a great introduction. I appreciate that.

No, it's wonderful to have you. And for our listeners who are not aware, I'm referring to him as Bill. He's my dad, but calling him "Dad" the whole time is going to drive me crazy. Bill has a mind that thinks not like most. He sees things and thinks about them not just in terms of the single, I guess, trajectory that could be taken, but the multiple domains it can go, both vertically and horizontally. And when we talk about data, data is something so commonly brought up in sports, so commonly brought up in performance and athletic development. Big data is probably one of the biggest, I guess, catchphrases or hot words or sayings that people have nowadays, but it doesn't always have a lot of meaning to it, because a lot of times we get the word "big data" and then we don't have action out of big
data, and Bill's specialty is not just big data, it's getting action out of big data. With that, going forward, I think a lot of this talk is going to be about how to utilize big data, how to use data in general, how to organize it, and how to put yourself in a situation to get actionable insights. So, just to start it off, could you talk a little bit about your background, some of the things you've done, and how you developed the insights that you have?

Thanks, Max. I have kind of a varied, not a deep, background, but I've been doing data analytics a long time. And I was very fortunate, one of those, you know, Forrest Gump moments in life, where in the late 1980s I was involved in a project at Procter & Gamble. I ran the project where we brought Walmart's point-of-sale data for the first time into what we would now call a data warehouse, and for many this became the launching point of the data warehouse and BI marketplace. We can trace the origins of many of the BI players to that project at Procter & Gamble in '87 and '88. I spent a big chunk of my life as a big believer in business intelligence and data warehousing, trying to amass data together and trying to use that data to report on what's going on and writing insights. I did that for 20, 25 years of my life, until, as you probably remember, Max, I was recruited out of Business Objects, where I was the vice president of analytic applications. I was recruited out of there by Yahoo, and Yahoo had a very interesting problem, which is they needed to build analytics for their advertisers to help those advertisers optimize their spend across the Yahoo ad network. And what I learned there, in fact what I unlearned there, was everything that I had learned about BI and data warehousing: how you constructed data warehouses, how you were so schema-centric, how everything revolved around tabular data. At Yahoo there was an entirely different approach. It was my first introduction to Hadoop and the concept of a data lake. That was my
first real introduction into data science and how to do predictive analytics and prescriptive analytics. In fact, it was such a huge change for me that I was asked to come back to TDWI, The Data Warehousing Institute, where I had been teaching for many years, and I was asked to do a keynote, after being at Yahoo for a year or so, to share what were the observations, what did I learn. I remember I stood up there in front of about 600 people and I started my presentation by saying, "Everything I've taught you the past 20 years is wrong." And, well, I didn't get invited back for 10 years, so that probably tells you something. But it was really about unlearning a lot of what I had learned before. And probably, Max, one of the aha moments for me was that BI was very focused on understanding the questions that people were trying to ask and answer. Data science is about understanding the decisions they're trying to take action on. Questions, by their very nature, are informative, but decisions are actionable. And so what we did at Yahoo, in order to really help our advertisers optimize their spend across the Yahoo ad network, is we focused on identifying the decisions the media planners and buyers and the campaign managers had to make around running a campaign: you know, how much money to allocate to what sites, how many conversions do I want, how many impressions do I want. So for all those decisions we built predictive analytics, so that we could deliver prescriptive actions to these two classes of stakeholders, the media planners and buyers and the campaign managers, who had no aspirations about being analysts. They're trying to be the best digital marketing executives, or, you know, people, they could possibly be. They didn't want to be analysts. And that sort of leads me to where I am today. My teaching, my books, my blogs, everything I do is very much around how do we take data and analytics and help organizations become
more effective. So everything I've done since then, the books I've written, the teaching I do with the University of San Francisco and next week at the National University of Ireland in Galway, and all the clients I work with, is really about how do we take data and analytics and help organizations become more effective at driving the decisions that optimize their business and their operational models. It's really about decisions, and how do we leverage data and analytics to drive those decisions.

So how would you define the difference between a question that someone's trying to answer versus a decision that they're trying to be better informed on?

So here's how I'd put it. I call it the SAM test, S-A-M: is it strategic, is it actionable, is it material. You can ask questions that are provocative, but you might not ask questions that are strategic to the problems you're trying to solve; you may not ask questions that are actionable, in the sense that you know what to do; and you don't necessarily ask questions that are material, in the sense that the value of that question is greater than the cost of answering that question. And so if I think about the SAM test when I apply it to data science and decisions, when I start mining the data: so I know what decisions are most important. I'm going through a process to identify, validate, value, and prioritize those decisions, right? I understand what decisions are most important. Now when I start to dig through the data, all this structured and unstructured data across a number of different data sources, I'm trying to codify patterns and relationships buried in that data, and I'm applying the SAM test against those insights: is it strategic to the problem I'm trying to solve, can I actually act on it, and is it material, in the sense that it's more valuable to act than it is to create the action around it. So to me that's the big difference: by their very nature, decisions are actionable. I'm trying
to make a decision, I'm going to take an action. Questions, by their nature, are informative and interesting. They could be very provocative. You know, questions have an important role, but ultimately questions do not necessarily lead to actions.

So if I'm a sport coach, if I'm running a professional basketball team, some of the decisions I'm trying to make are: I'm deciding on what program best develops my players, what metrics will help me decide who the best prospect is. Is that the right way of looking at it?

Yeah. So we did an exercise at USF to have the students go through: what decisions does Steve Kerr need to make over the next two games he's playing? And we go through an exercise of identifying, especially, in-game decisions: you know, how often are you going to play somebody, how long are they going to play, what are the right combinations, what are the kinds of offensive plays that you're going to try to run. So there's, you know, a bunch of decisions that Steve Kerr, as coach of the Warriors for example, needs to make in the game, to not only try to win the game but to also minimize wear and tear on his players. And by the way, that's a really good point to think about: good decisions are always a conflict of other ideas, right? Win the game while minimizing wear and tear on my players. All the important decisions in life have two, three, or four different variables that may not be exactly the same, which is why this is where data science comes in. Data science is going to look across those three or four various other metrics against which you're going to measure success and try to figure out what's the right balance of those, given the situation I'm in. So, going back to the decision about playing time: well, think about all the data you might want to look at in order to optimize that. When's the next game? How far are they into the season? Where do they currently sit, ranking-wise? How
many minutes per game has player X been playing? Looking over the past few years, you know, what's their maximum point? So there's a lot of decisions that people are trying to make. And by the way, the beauty of the decisions is that the decisions really haven't changed in years, right? What's changed is not the decisions, it's the answers. And the answers have changed because we have this great bounty of data available to us: in-game performance data, health data, you know, DNA data, all kinds of other data. And then we have all these great advanced analytic techniques now, neural networks and unsupervised and supervised machine learning, all this great technology that can help us uncover those relationships and patterns that are buried in the data, that we can use to help individualize those decisions. One last point there: when people talk about big data, they get fixated on the "big" part, the volume part. It's not the volume of big data that I'm going to monetize, it's the granularity. What I mean by that is I now have the ability to build very detailed profiles. Going back to our basketball example, I can build a very detailed performance profile on every one of my players. So for every one of the players on the Warriors team, I can build a very detailed profile that details out, you know, what's their optimal playing time, how much time should they spend on the court before a break, what are the right combinations of players in order to generate the most offense or the best defense. I can build these very detailed individual profiles, and then I can start meshing them together to find the right combination. So when we talk about "big," it's not the volume that's interesting, it's the granularity.

Gotcha. And what's interesting from my world is this. When you're dealing with marketing and business, a lot of the time when you're developing, whether it be a company where you're trying to find more
out about your customers, or you're a startup trying to learn about what product you should develop, there's tons of unknowns. And big data, from my understanding, can help you better understand some patterns within customers and how to market. You know, in your book you talk about, oh, we need to increase sales at Chipotle because we understand X, Y, and Z are occurring around us. Now, in the sports science world we have our friend called science, and science has helped us identify early certain metrics that are very important and correlated with different physiological outcomes. So it almost gives us a shortcut. Because in the big data world, especially when you're dealing with the data that you guys are dealing with and trying to understand customer decisions, each customer is individual and you're trying to compile it all together to find patterns. No one's doing science on that, right? It's not like lab work, where someone is understanding muscle protein synthesis and the amount of nutrients you need to recover from it. So in my position I have all these pillars that maybe exist already, where I can begin my search. There's still a bunch of unknowns. With that kind of environment, do you take a different approach, or do you still go with the, I guess, large, encompassing "collect everything you can and siphon after"? Maybe I'm totally wrong. I'll let you take it away.

No, that's a good question. And what's interesting about that, Max, is that the human body is governed by a series of laws, we'll say in kinesiology and the things you've talked about. Physics, they have laws. Humans as buyers, you know, shoppers, travelers, we have propensities. We don't have laws, right? I have a propensity that I'm going to try to fly United because I get easier upgrades, but I might fly, you know, Southwest because of schedule or convenience. I have propensities, I don't have laws. So you have laws that work to your advantage. What's interesting about laws is that, as you start going into the world of IoT and this
concept called digital twins, they're governed by laws of physics. I have a compressor or a chiller or an engine, and it's got a bunch of components in it that have been engineered together, and I can actually apply the laws. I can actually run simulations against my digital twins to understand exactly when something is likely to break, what's the remaining useful life in that product, what's the severity of the maintenance I need to do on it. So the human body, unlike the human psyche, is governed by laws. Human behaviors are really hard, right? I mean, Las Vegas is built on the fact that human behaviors are so flawed. But body physics, like the physics that run these devices, you can actually build models and run simulations to figure out exactly, you know, what's the wear and tear and what's the extensibility of what you can operate in.

Gotcha. Yeah, so that's where, from our world, you start looking at subsystems, and you say, okay, this is your muscular system, this is your autonomic nervous system, this is your central nervous system, and these are ways that we can begin to measure it. We wrote a blog on this, a stress response model, where you understand these systems and their inferences for the most part, and then you apply a stress and you see how the body responds. And then you determine, okay, well, if I know the body can only respond in a certain number of ways (it's either compensatory, it's going to be, you know, returning to baseline, or it's maladaptation), there are only so many ways, when you look at a cell at the individual level, that that cell can actually respond, and it's the aggregation of all these cellular responses that ends up manifesting in a change in a subsystem. That subsystem can be measured inferentially through certain technology that we have. But I also think at the same time we make a huge leap, and that leap is the word "inference," right? We're making an assumption, and sometimes those assumptions are very dangerous and
they lead to... because if that assumption is unknown and we're wrong on it, then we kind of sway and miss a little bit on our whole projection. So I like the idea of looking at patterns and looking at the probabilistic nature of it. And I've actually kind of recently changed my view a little bit. When I first talked about this, I was much more hardwired on laws, but I think it's a law, but maybe a law with some level of variation or standard deviation, and we have guardrails instead. So that's kind of how I think about it personally. Is that something that you'd say is on the right track, or how would you approach it?

Yeah, actually there's a lot of similarities, Max. So your description of the human body made up of subsystems: when we talk to organizations about things like smart cities or smart malls or smart hospitals, a smart city is made up of a series of subsystems, right? I've got subsystems regarding water and wastewater, traffic, safety, you know, local development, things like this. There's a bunch of subsystems that make a city work, and each of those subsystems is comprised of a series of decisions, or clusters of decisions, which we call use cases, around what you're trying to optimize. So if I'm trying to improve traffic flow, if one of my subsystems is traffic flow, there are a bunch of use cases there about where do I do maintenance, where do I expand the roads, you know, where do I put HOV lanes, right? And so you start taking apart the smart city into the subsystems, and then, you know, the subsystems are comprised of use cases. That puts you into a really good position. Now here's something we did recently with a client who is trying to think about building the theme park of the future: how do we make certain that we really have a holistic view of the use cases that I need to go after? It's really easy to identify the use cases within your own four walls, but digital transformation in particular happens outside the four walls of an organization. And so
what we're doing is a process where we're building journey maps for all their key stakeholders. So you've got a journey map for a customer, you have a journey map for operations, you have a journey map for partners, and such. So you build these journey maps and you start thinking about, for example: I'm a theme park, and at some point in time my guest, my customer, is going to have an epiphany. They want to go do something, they want to go on vacation. At that point in time, that theme park is competing against not only all the other theme parks, but it's competing against Major League Baseball, who's got things; it's competing against, you know, going to the beach on Sanibel Island, or just hanging around, right? They're competing at that point, and if they only start engaging the customer when the customer has actually contacted them, they miss a huge part of the market, they miss a huge chance to influence that person's agenda. And so one of the things to think about (I don't know how this applies to your space, Max) is that as we started thinking about smart entities, we use design thinking and customer journey maps as a way to make certain that we're not fooling ourselves by only looking within the four walls of our organization, that we're knocking those walls down, making them very porous, and we're looking at what happens before somebody engages with us, and even afterwards. So again, going back to the theme park example: once they leave the theme park, they're probably posting on social media what kind of fun they had or fun they didn't have, they're probably making plans for next year, they're talking to friends, and other things. So there's a bunch of stuff, we're going to call it afterglow, that happens after the event, and you want to make certain that you're a part of influencing that. So again, I don't know, when you combine the data science of use cases and decisions with the design thinking of journey maps, what that might mean to your business, but for us in
thinking about smart cities, it's opened up all kinds of possibilities, and most importantly for our customers it's opened up all kinds of new areas where they can create new sources of value.

So anyone listening to this needs to understand that when the word "client" or "customer" is used, it can be substituted for "athlete." And what I think is really important is that, when we hear you talk, the amount of infrastructure you put in place for an idea when you approach a situation is something that sports science, in my opinion, especially across multiple domains, is truly lacking. What happens is we get a piece of technology and someone says, "Go do science," while you're taking the approach of: let's actually think out what we're doing beforehand, let's determine our key performance indicators, let's understand maybe the journey that this piece of technology is going to take with the athlete, or how the athlete's going to interact with this piece of technology throughout their four years. If you're in the private sector, that afterglow effect might be something that you refer to as client retention, their ability to come back over and over and spread the word for you. If you're in the sector with student athletes, maybe it's those athletes talking highly about your program to help with recruiting, and understanding that developing athletes is going to help, you know, make that college more enticing to go to, or that program or that organization. But what really stood out was the fact that you have this infrastructure built beforehand. The example I give (I've spoken with a good number of organizations and teams about data utilization) is this: if you were to all of a sudden be dropped in the middle of the woods and someone says, "Go build a cabin," now, if it was a giant forest, I could use as much wood as I want. I could just keep chopping down trees until I had something that was a shelter of some sort, right? Even I could probably do that. Well, if someone said, you know what,
you have three trees to cut down to make a cabin, you would become very efficient, and you're going to think about each chop and each piece of wood and how it's going to be used, and your interaction with that wood in conjunction with that wood's interaction with yourself. And so when we start looking at athlete development, or we're looking at client retention, or we're looking at general health and wellness, it's not just, oh, this is a great idea, right? We want to make the world's greatest theme park, and we want to make the world's greatest training facility, but what infrastructure and steps do you need to take? And you said stakeholders. So what individuals am I working with? Am I talking with the physical therapist, am I talking with the athletic trainer, am I talking with the skill coach? How does the skill coach want the data presented to them? Maybe that's different from how the athletic trainer is going to have the data presented to them. Maybe the sport coach doesn't want to see the data unless a red flag comes up. So now you have all these different entities, just like how you're talking about developing this customer journey throughout the theme park and making sure that they have, you know, an experience that's memorable and causes an afterglow and really gives that experience meaning. How can we now take data and apply it in the same way, so we get the most value, like you said, on the granular aspect of data, and really turn that into something valuable?

Max, you said something really important. Let me share one of many horror stories that comes up in my daily life, which is somebody walking up to me and saying, "Hey, I've got a client, here's their data, you know, go do some science on it." Like, well, what the heck, right? So we created this thing called the hypothesis development canvas. Our sales teams hate it due to the time; our data science teams love it, because we do all this pre-work. We make sure we understand the problem
we're going after, the decision they're trying to make, the KPIs against which you're going to measure success and progress, what are the operational and financial business benefits, what are the data sources we want to consider. And here's something, by the way, that's important, that maybe I wish Boeing would have thought more about, which is: what are the costs of false positives and false negatives? Do you really understand where your risk points are? The reason why false positives and false negatives are really important in data science is that data science is making predictions, and by virtue of making predictions we are never 100% certain whether it's right or not. Predictions are built on "good enough." Well, when is good enough good enough? And a lot of that determination as to when good enough is good enough is really around the cost of false positives and false negatives. Think about a professional athlete: you know, the ramifications of overtraining a professional athlete like a Kevin Durant or Steph Curry, and they're out for the playoffs, has huge financial implications for them personally and for the organization. So you really need to make sure you understand exactly what's the cost of being wrong. And so with this hypothesis development canvas, we do a lot of this work before we ever put science to the data.

Yeah, it's something that's lacking across not just sports science but many fields. And what I mean by that is, especially as you referred to the hypothesis canvas, it's a piece of paper that provides a common language, right? You can set it out beforehand. For listeners who aren't aware, the hypothesis canvas is something Bill has worked on and developed with his team. It's about 13 different squares and boxes, and you can manipulate it based on your own profession and what you're diving into, but essentially it goes through the infrastructure that you need to have set up in order for this hypothesis or idea or decision to actually be worth a damn.
And what I mean by that is that so many times (and I hate this, so I'm going to go on a little bit of a rant, and I apologize) people think, oh, I get an idea, and they think Thomas Edison all of a sudden just had an idea and he made a light bulb. Thomas Edison's famous for saying, you know, I didn't, you know, make a light bulb; I learned, what was it, 9,000 ways to not make a light bulb. And what I mean by that is he set up an environment that allowed for failure and allowed for learning. But what happens often is people think, oh, I have an idea, and they think the idea comes not just, you know, in a flash (because it doesn't always; it might come from some research), but they also believe that it comes with legs and with the infrastructure supported around it. That's kind of the same way that I see a lot of the data aspect going in regards to our field: we get an idea, we immediately implement it, and we hope it works, as opposed to setting up a learning environment that allows you to go, okay, here's what I think might happen, here's my hypothesis, here's how I'm going to apply it. And now if I fail, because I have the infrastructure pre-mapped out, I can look at my infrastructure and say, you know what, that support beam, or that individual box itself, was the weak link, and we made a mistake here, but we can go back and fix it.
ENTITIES
Entity | Category | Confidence |
---|---|---|
Steve Kerr | PERSON | 0.99+ |
Kevin Durant | PERSON | 0.99+ |
Procter & Gamble | ORGANIZATION | 0.99+ |
Steph Curry | PERSON | 0.99+ |
Yahoo | ORGANIZATION | 0.99+ |
Sanibel Island | LOCATION | 0.99+ |
10 years | QUANTITY | 0.99+ |
Procter & Gamble | ORGANIZATION | 0.99+ |
Chipotle | ORGANIZATION | 0.99+ |
Walmart | ORGANIZATION | 0.99+ |
three | QUANTITY | 0.99+ |
a year | QUANTITY | 0.99+ |
9000 ways | QUANTITY | 0.99+ |
Boeing | ORGANIZATION | 0.99+ |
Hitachi van Tara | ORGANIZATION | 0.99+ |
Bill Schmarzo | PERSON | 0.99+ |
two | QUANTITY | 0.99+ |
100% | QUANTITY | 0.99+ |
four | QUANTITY | 0.99+ |
Becky | PERSON | 0.99+ |
Thomas Edison | PERSON | 0.99+ |
IOC | ORGANIZATION | 0.99+ |
each piece | QUANTITY | 0.99+ |
Warriors | ORGANIZATION | 0.99+ |
University of San Francisco | ORGANIZATION | 0.99+ |
Hadoop | TITLE | 0.99+ |
each | QUANTITY | 0.99+ |
each chop | QUANTITY | 0.99+ |
next year | DATE | 0.98+ |
Thomas Edison | PERSON | 0.98+ |
four years | QUANTITY | 0.98+ |
first | QUANTITY | 0.98+ |
next week | DATE | 0.98+ |
today | DATE | 0.98+ |
bill | PERSON | 0.98+ |
late 1980s | DATE | 0.98+ |
Forrest Gump | PERSON | 0.98+ |
20 25 years | QUANTITY | 0.97+ |
first time | QUANTITY | 0.97+ |
two classes | QUANTITY | 0.97+ |
Harvard | ORGANIZATION | 0.97+ |
first introduction | QUANTITY | 0.96+ |
four different variables | QUANTITY | 0.96+ |
single | QUANTITY | 0.94+ |
Coe College | ORGANIZATION | 0.94+ |
each customer | QUANTITY | 0.94+ |
two games | QUANTITY | 0.94+ |
both | QUANTITY | 0.94+ |
Dean | PERSON | 0.93+ |
about 600 people | QUANTITY | 0.93+ |
years | QUANTITY | 0.92+ |
USF | ORGANIZATION | 0.92+ |
ta world Institute | ORGANIZATION | 0.92+ |
one | QUANTITY | 0.91+ |
one of my subsystems | QUANTITY | 0.9+ |
about 13 different squares | QUANTITY | 0.89+ |
a day | QUANTITY | 0.88+ |
Galway | LOCATION | 0.86+ |
88 | DATE | 0.86+ |
National University of Ireland | ORGANIZATION | 0.85+ |
StrongyByScience | TITLE | 0.82+ |
Bill | PERSON | 0.81+ |
Southwest | LOCATION | 0.81+ |
TD WI | ORGANIZATION | 0.81+ |
tons of unknowns | QUANTITY | 0.81+ |
Sam test | TITLE | 0.8+ |
bill Schwarz | PERSON | 0.8+ |
lot of times | QUANTITY | 0.78+ |
87 | DATE | 0.78+ |
three trees | QUANTITY | 0.78+ |
boxes | QUANTITY | 0.77+ |
many times | QUANTITY | 0.74+ |
United | ORGANIZATION | 0.72+ |
one last point | QUANTITY | 0.7+ |
one of the things | QUANTITY | 0.68+ |
past 20 years | DATE | 0.67+ |
Part One | OTHER | 0.67+ |
other metrics | QUANTITY | 0.65+ |
Iran | ORGANIZATION | 0.65+ |
four walls | QUANTITY | 0.63+ |
past few years | DATE | 0.62+ |
max | PERSON | 0.62+ |