

StrongyByScience Podcast | Bill Schmarzo Part One


 

>> Announcer: Produced from theCUBE studios, this is Strong By Science — in-depth conversations about science-based training, sports performance, and all things health and wellness. Here's your host, Max Schmarzo. [Music] [Applause] [Music]

>> Max: All right, thank you guys for tuning in. Today I have the one and only Dean of Big Data — the man, the myth, the legend — Bill Schmarzo, also my dad. He is the CTO of Hitachi Vantara IoT and Analytics. He has a very interesting background, because he's known as the Dean of Big Data, but he's also the king of the court and all things basketball-related when it comes to our household. And unlike most people in the data world — and I want to say "most" as an umbrella term — Bill has an illustrious sports career, playing at Coe College, the Harvard of the Midwest, my alma mater as well. I think having that background of not just computer science but multiple disciplines — your jazz career, your basketball career, and obviously the career you run now — all of that plays a huge role in being able to interpret and take multiple domains and put them into one. So thank you for being here, Dad.

>> Bill: Yeah, thanks, Max. That's a great introduction. I appreciate that.

>> Max: It's wonderful to have you. And for our listeners who are not aware, I'll be referring to him as Bill — he is my dad, but calling him "my dad" the whole time is going to drive me crazy. Bill has a mind that doesn't think like most: he sees things and thinks about them not just in terms of the single trajectory that could be taken, but the multiple domains it could go, both vertically and horizontally. And when we talk about data — data is something so commonly brought up in sports, so commonly dropped into performance and athletic development. Big data is probably one of the biggest catchphrases or hot words or sayings that people have nowadays, but it doesn't always have a lot of meaning to it, because a lot of times we get the word "big data" and then we don't get action out of big data. Bill's specialty is not just big data; it's getting action out of big data. With that, going forward, I think a lot of this talk is going to be about how to utilize big data, how to use data in general, how to organize it, and how to put yourself in a situation to get actionable insights. So just to start it off, can you talk a little bit about your background, some of the things you've done, and how you developed the insights that you have?

>> Bill: Thanks, Max. I have kind of a deep background — I've been doing data analytics a long time, and I was very fortunate. One of those, you know, Forrest Gump moments in life: in the late 1980s I was involved in a project at Procter & Gamble — I ran the project — where we brought Walmart's point-of-sale data for the first time into what we would now call a data warehouse. For many, this became the launching point of the data warehouse and BI marketplace, and we can trace, in effect, the origins of many of the BI players to that project at Procter & Gamble in '87 and '88. I spent a big chunk of my life as a big believer in business intelligence and data warehousing — trying to amass data together and use that data to report on what's going on and derive insights. I did that for 20, 25 years of my life until, as you probably remember, Max, I was recruited out of Business Objects, where I was the vice president of analytic applications. I was recruited out of there by Yahoo, and Yahoo had a very interesting problem, which is they needed to build analytics to help their advertisers optimize their spend across the Yahoo ad network.
What I learned there — in fact, what I unlearned there — was that everything I had learned about BI and data warehousing, how you constructed data warehouses, how you were so schema-centric, how everything revolved around tabular data — at Yahoo there was an entirely different approach. That was my first introduction to Hadoop and the concept of a data lake, and it was my first real introduction to data science and how to do predictive analytics and prescriptive analytics. In fact, it was such a huge change for me that I was asked to come back to TDWI — The Data Warehousing Institute — where I had been teaching for many years, and to do a keynote after being at Yahoo for a year or so to share what my observations were, what I had learned. I remember I stood up there in front of about 600 people and started my presentation by saying, "Everything I've taught you the past 20 years is wrong." Well, I didn't get invited back for 10 years, so that probably tells you something. But it was really about unlearning a lot of what I had learned before. And probably, Max, one of the aha moments for me was that BI was very focused on understanding the questions that people were trying to ask and answer, whereas data science is about trying to understand the decisions they're trying to take action on. Questions by their very nature are informative, but decisions are actionable. So what we did at Yahoo, in order to help our advertisers optimize their spend across the Yahoo ad network, is we focused on identifying the decisions the media planners and buyers and the campaign managers had to make around running a campaign: how much money to allocate to which sites, how many conversions do I want, how many impressions do I want. All of those decisions we built predictive analytics around, so that we could deliver prescriptive actions to these two classes of stakeholders — the media planners and buyers and the campaign managers — who had no aspirations about being analysts. They're trying to be the best digital marketing executives, or people, they could possibly be; they didn't want to be analysts. And that sort of leads me to where I am today. My teaching, my books, my blogs — everything I do is very much around how we take data and analytics and help organizations become more effective. Everything I've done since then — the books I've written, the teaching I do with the University of San Francisco and, next week, at the National University of Ireland, Galway, and all the clients I work with — is really about how we take data and analytics and help organizations become more effective at driving the decisions that optimize their business and operational models. It's really about decisions and how we leverage data and analytics to drive those decisions.

>> Max: So how would you define the difference between a question that someone's trying to answer versus a decision that they're trying to be better informed on?

>> Bill: Here's how I'd put it. I call it the SAM test — S-A-M — and that is: is it strategic, is it actionable, is it material? You can ask questions that are provocative, but you might not ask questions that are strategic to the problems you're trying to solve. You may not be able to ask questions that are actionable, in the sense that you know what to do. And you don't necessarily ask questions that are material, in the sense that the value of that question is greater than the cost of answering that question.
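Bill's SAM test can be read as a simple three-part filter on candidate insights. A minimal sketch of that idea — the insights, values, and costs below are hypothetical, not from the episode:

```python
from dataclasses import dataclass

@dataclass
class Insight:
    name: str
    strategic: bool        # tied to the problem we are actually trying to solve?
    actionable: bool       # do we know what action to take if it holds?
    expected_value: float  # estimated value of acting on it
    cost_to_act: float     # cost of building and running the action

def passes_sam(insight: Insight) -> bool:
    """Strategic, Actionable, Material: the value of acting must exceed the cost of acting."""
    material = insight.expected_value > insight.cost_to_act
    return insight.strategic and insight.actionable and material

candidates = [
    Insight("Starters fade after 34 minutes on back-to-backs", True, True, 250_000, 40_000),
    Insight("Fans tweet more during overtime", False, False, 5_000, 20_000),
]

for c in candidates:
    verdict = "act on it" if passes_sam(c) else "interesting, but not SAM"
    print(f"{c.name}: {verdict}")
```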
So if I think about the SAM test, when I apply it to data science and decisions: when I start mining the data, I already know which decisions are most important — I've gone through a process to identify, validate the value of, and prioritize those decisions. Now, when I start to dig through the data — all this structured and unstructured data across a number of different data sources — I'm trying to codify the patterns and relationships buried in that data, and I'm applying the SAM test against those insights: is it strategic to the problem I'm trying to solve, can I actually act on it, and is it material in the sense that it's more valuable to act on it than it is to create the action around it? To me, that's the big difference. By their very nature, decisions mean I'm actually trying to make a decision, I'm going to take an action. Questions by their nature are informative, interesting — they can be very provocative, and questions have an important role — but ultimately questions do not necessarily lead to actions.

>> Max: So if I'm a sport coach and I'm running a professional basketball team, some of the decisions I'm trying to make are: what program best develops my players, and what metrics will help me decide who the best prospect is. Is that the right way of looking at it?

>> Bill: Yeah. So we did an exercise at USF to have the students work through exactly that: what decisions does Steve Kerr need to make over the next two games he's playing? And we go through an exercise of identifying, especially, the in-game decisions — you know, how often are you going to play somebody, how long are they going to play, what are the right combinations, what are the kinds of offensive plays you're going to try to run. So there's a bunch of decisions that Steve Kerr, as coach of the Warriors, for example, needs to make in the game to not only try to win the game but also minimize wear and tear on his players. And by the way, that's a really good point to think about: good decisions are always a conflict of other ideas — win the game while minimizing wear and tear on my players. All the important decisions in life have two, three, or four different variables that may not point exactly the same way, which is where data science comes in. Data science is going to look across those three or four different metrics, against what you're going to measure success on, and try to figure out what's the right balance of those given the situation I'm in. So going back to the decision about playing time: think about all the data you might want to look at in order to optimize that. When's the next game? How far are they into the season? Where do they currently sit ranking-wise? How many minutes per game has player X been playing? Looking over the past few years, what's their maximum? So there's really not a lot of decisions that people are trying to make, and by the way, the beauty of the decisions is that the decisions really haven't changed in years. What's changed is not the decisions, it's the answers — and the answers have changed because we have this great abundance of data available to us: in-game performance, health data, DNA data, all kinds of other data. And then we have all these great advanced analytic techniques now — neural networks, unsupervised and supervised machine learning — all this great technology that can help us uncover those relationships and patterns buried in the data, which we can use to help individualize those decisions. One last point there: when people talk about big data, they get fixated on the "big" part, the volume part. It's not the volume of big data that I'm going to monetize, it's the granularity. What I mean by that is I now have the ability to build very detailed profiles. Going back to our basketball example, I can build a very detailed performance profile on every one of my players. So for every one of the players on the Warriors team, I can build a very detailed profile that details out, you know, what's their optimal playing time, how much time should they spend on the court before a break, what are the right combinations of players in order to generate the most offense or the best defense. I can build these very detailed individual profiles, and then I can start meshing them together to find the right combinations. So when we talk about "big," it's not the volume that's interesting, it's the granularity.
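A rough sketch of that "granularity, not volume" idea — detailed per-player profiles meshed together into candidate lineups. The players, metrics, and numbers here are invented purely for illustration:

```python
from itertools import combinations

# Hypothetical per-player performance profiles built from granular event data.
profiles = {
    "PG-1": {"optimal_minutes": 34, "net_rating": 6.2},
    "SG-1": {"optimal_minutes": 36, "net_rating": 4.8},
    "SF-1": {"optimal_minutes": 35, "net_rating": 5.5},
    "PF-1": {"optimal_minutes": 30, "net_rating": 3.1},
    "C-1":  {"optimal_minutes": 28, "net_rating": 2.4},
    "C-2":  {"optimal_minutes": 22, "net_rating": 1.9},
}

def lineup_score(lineup, minutes_played):
    """Naively score a five-man lineup: summed net rating, penalized for overuse."""
    score = 0.0
    for player in lineup:
        profile = profiles[player]
        fatigue_penalty = max(0, minutes_played[player] - profile["optimal_minutes"]) * 0.3
        score += profile["net_rating"] - fatigue_penalty
    return score

minutes_so_far = {player: 20 for player in profiles}
best = max(combinations(profiles, 5), key=lambda l: lineup_score(l, minutes_so_far))
print("Best available lineup right now:", best)
```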
>> Max: Gotcha. And what's interesting from my world: when you're dealing with marketing and business — whether it's a company trying to find out more about its customers, or a startup trying to learn what product it should develop — there are tons of unknowns, and big data, from my understanding, can help you better understand some patterns within customers and how to market. You know, in your book you talk about, "We need to increase sales at Chipotle because we understand X, Y, and Z about our current customers." Now, in the sports science world we have our friend called science, and science has already helped us identify certain metrics that are very important and correlated to different physiological outcomes. So it almost gives us a shortcut, because in the big data world — especially with the data you guys are dealing with, trying to understand customer decisions — each customer is an individual, and you're trying to compile it all together to find patterns. No one's doing lab science on that; it's not like lab work where someone is studying muscle protein synthesis and the amount of nutrients you need to recover from it. So in my position I have all these pillars that maybe already exist, where I can begin my search, but there's still a bunch of unknowns. With that kind of environment, do you take a different approach, or do you still go with the large, all-encompassing "collect everything you can and sift through it afterward"? Maybe I'm totally wrong — I'll let you take it away.

>> Bill: No, that's a good question. And what's interesting about that, Max, is that the human body is governed by a series of laws — we'll say, in kinesiology, and the things you've talked about, physics — they have laws. Humans as buyers, you know, shoppers, travelers: we have propensities, we don't have laws. I have a propensity that I'm going to try to fly United because I get easier upgrades, but I might fly Southwest because of schedule or convenience. I have propensities; I don't have laws. So you have laws that work to your advantage. What's interesting about laws is that when you start going into the world of IoT and this concept called digital twins, those are governed by the laws of physics. I have a compressor or a chiller or an engine, and it's got a bunch of components in it that have been engineered together, and I can actually apply the laws.
I can actually run simulations against my digital twins to understand exactly when something is likely to break, what the remaining useful life in that product is, what the severity of the maintenance I need to do on it is. So the human body, unlike the human psyche, is governed by laws. Human behaviors are really hard — I mean, Las Vegas is built on the fact that human behaviors are so flawed — but body physics, like the physics that run these devices, means you can actually build models and run simulations to figure out exactly, you know, what the wear and tear is and what envelope you can operate in.

>> Max: Gotcha. Yeah, so from our world, you start looking at subsystems and you say, okay, this is your muscular system, this is your autonomic nervous system, this is your central nervous system — these are ways that we can begin to measure it. I wrote a blog on this — a stress response model — where you understand these systems, and their inferences for the most part, and then you apply a stress and you see how the body responds. And you can determine, okay, if I know the body, it can only respond in a certain number of ways: it's either compensatory, it's returning to baseline, or maybe it's maladaptation. There are only so many ways, when you look at a cell at the individual level, that that cell can actually respond, and it's the aggregation of all these cellular responses that ends up manifesting as a change in a subsystem — and that subsystem can be measured, inferentially, through certain technology that we have. But I also think at the same time we make a huge leap, and that leap is the word "inference." We're making an assumption, and sometimes those assumptions are very dangerous, because if that assumption is unknown and we're wrong about it, then we sway and miss a little bit on our whole projection. So I like the idea of looking at patterns and looking at the probabilistic nature of it. I've actually recently changed my view a little bit from when I first talked about this — I was much more hardwired on laws — but I think it's a law, maybe a law with some level of variation or standard deviation, where we have guardrails instead. So that's kind of how I think about it personally. Is that something you'd say is on the right track, or how would you approach it?

>> Bill: Yeah, actually there are a lot of similarities, Max. Your description of the human body made up of subsystems — when we talk to organizations about things like smart cities or smart malls or smart hospitals, a smart city is made up of a series of subsystems, right? I've got subsystems regarding water and wastewater, traffic, safety, you know, local development, things like this. There's a bunch of subsystems that make a city work, and each of those subsystems is comprised of a series of decisions, or clusters of decisions — we call those use cases — around what you're trying to optimize. So if I'm trying to improve traffic flow, if one of my subsystems is traffic flow, there are a bunch of use cases there: where do I do maintenance, where do I expand the roads, where do I put HOV lanes. And so you start taking apart the smart city into the subsystems, and then the subsystems are comprised of use cases — that puts you in a really good position.
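The decomposition Bill describes — an entity broken into subsystems, each subsystem into use cases, each use case into the decisions and KPIs it carries — is essentially a small hierarchy. A rough sketch of how that structure might be captured; the entries are hypothetical examples, not Bill's actual client work:

```python
from dataclasses import dataclass, field

@dataclass
class UseCase:
    name: str
    decisions: list = field(default_factory=list)
    kpis: list = field(default_factory=list)

@dataclass
class Subsystem:
    name: str
    use_cases: list = field(default_factory=list)

smart_city = [
    Subsystem("traffic flow", use_cases=[
        UseCase("reduce congestion",
                decisions=["where to do road maintenance", "where to add HOV lanes"],
                kpis=["average commute time", "incident rate"]),
    ]),
    Subsystem("water and wastewater", use_cases=[
        UseCase("predictive pipe maintenance",
                decisions=["which segments to inspect next quarter"],
                kpis=["leak events per mile"]),
    ]),
]

# Walking the hierarchy yields the inventory of decisions the analytics must support.
for subsystem in smart_city:
    for use_case in subsystem.use_cases:
        for decision in use_case.decisions:
            print(f"{subsystem.name} / {use_case.name}: {decision}")
```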
Now, here's something we did recently with a client who is trying to think about building the theme park of the future: how do we make certain that we really have a holistic view of the use cases we need to go after? It's really easy to identify the use cases within your own four walls, but digital transformation in particular happens outside the four walls of an organization. So what we're doing is a process where we're building journey maps for all their key stakeholders: you've got a journey map for a customer, a journey map for operations, a journey map for partners, and such. You build these journey maps and you start thinking about, for example — I'm a theme park, and at some point in time my guest or customer is going to have that moment where they want to go do something, they want to go on vacation. At that point in time, that theme park is competing against not only all the other theme parks; it's competing against Major League Baseball, it's competing against, you know, going to the beach on Sanibel Island, or just hanging around. They're competing at that point, and if they only start engaging the customer when the customer has actually contacted them, they've missed a huge part of the market — they've missed a huge chance to influence that person's agenda. So one of the things to think about — I don't know how this applies to your space, Max — but as we started thinking about smart entities, we use design thinking and customer journey maps as a way to make certain that we're not fooling ourselves by only looking within the four walls of our organization, that we're knocking those walls down, making them very porous, and we're looking at what happens before somebody engages with us and even afterwards. So again, going back to the theme park example: once they leave the theme park, they're probably posting on social media about what kind of fun they had or didn't have, they're probably making plans for next year, they're talking to friends, and other things. So there's a bunch of stuff — we're going to call it "afterglow" — that happens after the event that you want to make certain you're part of influencing. Again, I don't know what it might mean for your business when you combine the data science of use cases and decisions with the design thinking of journey maps, but for us, in thinking about smart cities, it has opened up all kinds of possibilities, and most importantly for our customers, it has opened up all kinds of new areas where they can create new sources of value.

>> Max: So anyone listening to this needs to understand that when the word "client" or "customer" is used, it can be substituted with "athlete." And what I think is really important is, when we hear you talk about the amount of infrastructure you build for an idea when you approach a situation — that is something that sports science, in my opinion, especially across multiple domains, is truly lacking. What happens is we get a piece of technology and someone says, "Go do science," while you're taking the approach of: let's actually think out what we're doing beforehand, let's determine our key performance indicators, let's understand the journey that this piece of technology is going to take with the athlete, or how the athlete is going to interact with this piece of technology throughout their four years. If you're in the private sector, that afterglow effect might be something that you refer to as client retention — their ability to come back over and over and spread your word for you. If you're in the collegiate sector with student athletes, maybe it's those athletes talking highly about your program to help with recruiting, and understanding
that developing athletes is going to help make that college, or that program, or that organization more enticing to go to. But what really stood out was the fact that you have this infrastructure built beforehand. The example I give — I've spoken with a good number of organizations and teams about data utilization — is that if you were all of a sudden dropped in the middle of the woods and someone said, "Go build a cabin," and it's a giant forest, I could use as much wood as I want; I could just keep chopping down trees until I had something that would pass as a shelter of some sort. Even I could probably do that. But if someone said, "You have three trees to cut down to make a cabin," you'd become very efficient, and you'd think about each chop and each piece of wood, how it's going to be used, and your interaction with that wood in conjunction with that wood's interaction with yourself. So when we start looking at athlete development, or client retention, or general health and wellness, it's not just, "Oh, this is a great idea — we want to make the world's greatest theme park, we want to make the world's greatest training facility." It's what infrastructure and steps you need to take. And you said stakeholders: so what individuals am I working with? Am I talking with the physical therapist? Am I talking with the athletic trainer? Am I talking with the skill coach? How does the skill coach want the data presented to them? Maybe that's different from how the athletic trainer is going to have the data presented to them. Maybe the sport coach doesn't want to see the data unless a red flag comes up. So now you have all these different entities — just like how you're talking about developing this customer journey throughout the theme park, making sure they have an experience that's memorable, causes an afterglow, and really gives that experience meaning. How can we now take data and apply it in the same way, so we get the most value, like you said, out of the granular aspect of data, and really turn that into something valuable?

>> Bill: Max, you said something really important there. Let me share one of the many horror stories that comes up in my daily life, which is somebody walking up to me and saying, "Hey, I've got a client, here's their data, you know, go do some science on it." Like, well, what the heck, right? So we created this thing called the hypothesis development canvas. Our sales teams hate it; our data science teams love it, because we do all this pre-work. We make sure we understand the problem we're going after, the decision they're trying to make, the KPIs against which we're going to measure success and progress, the operational and financial business benefits, and the data sources we want to consider. And here's something, by the way, that's important — that maybe I wish Boeing would have thought more about — which is: what are the costs of false positives and false negatives? Do you really understand where your risk points are? The reason why false positives and false negatives are really important in data science is that data science is making predictions, and by virtue of making predictions we are never 100% certain that we're right. Predictions are built on "good enough." Well, when is good enough good enough? And a lot of the determination of when good enough is good enough is really around the cost of false positives and false negatives.
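One way to make "when is good enough good enough" concrete is to weight a model's error rates by the cost of each kind of mistake. A minimal sketch with made-up numbers — the rates and dollar figures are purely illustrative, not from the episode or from any real team:

```python
def expected_cost_per_athlete(false_positive_rate, false_negative_rate,
                              cost_false_positive, cost_false_negative,
                              prevalence):
    """Expected cost of acting on a model's overtraining-risk predictions.

    False positive: flag a healthy athlete and rest them unnecessarily.
    False negative: miss a genuinely at-risk athlete, who then gets injured.
    """
    fp_cost = (1 - prevalence) * false_positive_rate * cost_false_positive
    fn_cost = prevalence * false_negative_rate * cost_false_negative
    return fp_cost + fn_cost

# Hypothetical: missing a real risk (injury, playoffs lost) dwarfs resting a healthy player.
with_model = expected_cost_per_athlete(0.10, 0.15, 25_000, 2_000_000, prevalence=0.05)
no_model = expected_cost_per_athlete(0.00, 1.00, 25_000, 2_000_000, prevalence=0.05)

print(f"expected cost per athlete: ${with_model:,.0f} with the model vs. ${no_model:,.0f} without")
```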
Think about a professional athlete — the ramifications of, you know, overtraining a professional athlete like a Kevin Durant or a Steph Curry, and they're out for the playoffs. That has huge financial implications for them personally and for the organization. So you really need to make sure you understand exactly what the cost of being wrong is. And so with this hypothesis development canvas, we do a lot of this work before we ever put science to the data.

>> Max: Yeah, it's something that's lacking across not just sports science but many fields. What I mean by that — especially since you referred to the hypothesis canvas — is that it's a piece of paper that provides a common language. You can set it out beforehand. For listeners who aren't aware, the hypothesis canvas is something Bill has worked on and developed with his team; it's about 13 different squares and boxes, and you can adapt it to your own profession and what you're diving into, but essentially it walks through the infrastructure that you need to have set up in order for this hypothesis or idea or decision to actually be worth a damn. And what I mean by that is that so many times — and I hate this, but I'm going to go on a little bit of a rant and I apologize — people think, "Oh, I get an idea," and they think Thomas Edison just had an idea and made a light bulb. Thomas Edison is famous for saying, you know, "I didn't just make a light bulb; I learned 9,000 ways to not make a light bulb." What I mean by that is he set up an environment that allowed for failure and allowed for learning. But what happens often is people think, "Oh, I have an idea," and they believe the idea comes not just in a flash — because it doesn't always; it might come from some research — but that it also comes with legs, that it comes with the infrastructure to support it already built around it. That's kind of the same way I see a lot of the data side going in our field: we get an idea, we immediately implement it, and we hope it works, as opposed to setting up a learning environment that allows you to go, "Okay, here's what I think might happen, here's my hypothesis, here's how I'm going to apply it." And now if I fail, because I have the infrastructure pre-mapped out, I can look at my infrastructure and say, you know what, that support beam — that individual box itself — was the weak link, and we made a mistake here, but we can go back and fix it.

Published Date : Mar 25 2019



Robert Walsh, ZeniMax | PentahoWorld 2017


 

>> Announcer: Live from Orlando, Florida, it's theCUBE, covering PentahoWorld 2017. Brought to you by Hitachi Vantara. (upbeat techno music) (coughs) >> Welcome to Day Two of theCUBE's live coverage of PentahoWorld, brought to you by Hitachi Vantara. I'm your host, Rebecca Knight, along with my co-host, Dave Vellante. We're joined by Robert Walsh. He is the Technical Director, Enterprise Business Intelligence at ZeniMax. Thanks so much for coming on the show. >> Thank you, good morning. >> Good to see ya. >> I should say congratulations is in order (laughs) because your company, ZeniMax, has been awarded the Pentaho Excellence Award for the Big Data category. I want to talk about the award, but first tell us a little bit about ZeniMax. >> Sure. So most people know us by the games rather than the corporate name. We make a lot of games — we're the third biggest company for gaming in America — such as Quake, Fallout, Skyrim, Doom. We have a game launching this week called Wolfenstein. And so most people know us by the games versus the corporate entity, which is ZeniMax Media. >> Okay, okay. And as you said, you're the third largest gaming company in the country. So tell us what you do there. >> So myself and my team, we are primarily responsible for the ingestion and the evaluation of all the data from the organization. That includes really two main buckets. Very simplistically, we have the business world — the traditional money, users, the demographics, people, sales. And on the other side we have the game. That's where a lot of people see the fun in what we do: what people are doing in the game, where in the game they're doing it, and why they're doing it. So we get a lot of data on gameplay behavior from our playerbase, and we try and fuse those two together for a single view of our customer. >> And that data comes from — is it the console? Does it come from the ... What's the data flow? >> Yeah, so we actually support many different platforms. We have games on the console — Microsoft, Sony, PlayStation, Xbox — as well as the PC platform, Macs for example, Android, and iOS. We support all platforms. So the big challenge that we have is trying to unify that ingestion of data across all these different platforms in a unified way, to facilitate the downstream reporting that we do as a company. >> Okay, so who ... When, say, you're playing the game on a Microsoft console, whose data is that? Is it the user's data? Is it Microsoft's data? Is it ZeniMax's data? >> I see. So many games that we actually release have a service component. Most of our games are actually an online world. So if you disconnect today, people are still playing in that world; it never ends. So in that situation, we have all the servers that people connect to from their desktop, from their console. Not all, but most data we generate for the game comes from the servers that people connect to. We own those. >> Dave: Oh, okay. >> Which simplifies greatly getting that data from the people. >> Dave: So it's your data? >> Exactly. >> What is the data telling you these days? >> Oh, wow, it depends on the game. I think people are realizing what people do in games, what games have become. So we have one game right now called Elder Scrolls Online, and this year we released the ability to buy in-game homes. And you can buy furniture for your in-game homes, so you can furnish them, people can come and visit, and you can buy items, and weapons, and pets, and skins.
And what's really interesting is that part of the reason why we exist is to look at patterns and trends based on how people interact with that environment. So for example, we'll see the American playerbase buy very different items compared to, say, the European playerbase, based on social differences. And that helps immensely for the people who continuously develop the game, to add items and features that people want to see and want to leverage. >> That is fascinating, that Americans and Europeans are buying different furniture for their online homes. So just give us some examples of the differences that you're seeing between these two groups. >> So it's not just the homes; it applies to everything that they purchase as well. It's quite interesting. When it comes to the Americans versus the Europeans, for example, what we find is that Europeans prefer much more cosmetic, passive experiences, whereas the Americans are much more into things that stand out, things that are ... I'm trying to avoid stereotypes right now. >> Right, exactly. >> It is what it is. >> Americans like ostentatious stuff. >> Robert: Exactly. >> We get it. >> Europeans are a bit more passive in that regard. And so we do see that. >> Rebecca: Understated, maybe. >> Thank you, that's a much better way of putting it. But games often have to be tweaked based on the environment. A different way of looking at it is that a lot of companies in Korea, in Asia, take these games from the West and they will have to tweak the game completely before it releases in those environments, because players will behave differently and expect different things. And these games have become global — we have people playing all over the world, all at the same time. So how do you facilitate it? How do you support these different users with different needs in this one environment? Again, that's why BI has grown substantially in the gaming industry in the past five, ten years. >> Can you talk about the evolution of how you've been able to interact with and essentially affect the user behavior, or respond to that behavior? You mentioned BI. So, you know, go back ten years, it was very reactive — not a lot of real-time stuff going on. Are you now in a position to affect the behavior in real time, in a positive way? >> We're very close to that. We're not quite there yet. So yes, that's a very good point. Five, ten years ago most games were traditional boxes: you make a game, you get a box at Walmart or Gamestop, and then you're finished — the relationship with the customer ends. Now we have this concept that's used often, "games as a service." We provide an online environment, a service around a game, and people will play those games for weeks, months, if not years. And so the shift, as well, from a BI tech standpoint, is one item where we've been able to streamline the ingest process. So we're not real time, but we can be hourly, which is pretty responsive. But also, the fact that these games have become these online environments has enabled us to get this information. Five years ago, when the game was in a box on the shelf, there was no connective tissue between us and them to interact and facilitate. With the games now being online, we can leverage BI, we can be more real time, we can respond quicker. But it's also due to the fact that now games themselves have changed to facilitate that interaction. >> Can you, Robert, paint a picture of the data pipeline? We started there with sort of the different devices, and you're bringing those in as sort of a blender.
But take us through the data pipeline and how you're ultimately embedding or operationalizing those analytics. >> Sure. So of the game data and the business information, the game data is most likely 90, 95% of our total data footprint. We generate a lot more game information than we do business information — it's just due to how much we can track, and we can do so. A lot of these games will generate various game events, game logs, that we can ingest into a single data lake; we use Amazon S3 for that. But it's not just the game data. We have databases for financial information, accounts, users, and so we will ingest the game events as well as the databases into one single location. At that point, however, it's still very raw, it's still very basic. We enable the analysts to actually interact with that, and they can go in there and get their feet wet, but it's still very raw. The next step is really taking that raw information, which is disjointed and separated, and unifying it into a single model that they can use in a much more performant way. In that first step, the analysts have the burden of a lot of the ETL work — to manipulate the data, to transform it, to make it useful. Which they can do, but they should be doing the analysis, not ingesting the data. And so the progression from there into our warehouse is the next step of that pipeline. In there, we create these models and structures, and they're often born out of what the analysts are seeing and using in that initial data lake stage. So if they're repeating analysis, if they're doing this on a regular basis, and the company wants something that's automated and auditable and productionized, then that's a great use case for promotion into our warehouse. You've got this initial staging layer, we have a warehouse where it's structured information, and we allow the analysts into both of those environments. So they can pick their poison, in some respects: structured data over here, raw and vast over here, based on their use case.
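A rough sketch of the promotion step Robert describes — curated data sitting in the S3 data lake being bulk-loaded into a modeled Redshift table with a COPY statement. The bucket, table, cluster, and IAM role names are placeholders, and ZeniMax's actual jobs run through Pentaho rather than a hand-written script like this:

```python
import psycopg2  # standard PostgreSQL driver; Redshift speaks the same wire protocol

# Placeholder connection details -- not a real environment.
conn = psycopg2.connect(
    host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439, dbname="analytics", user="etl_user", password="example-password",
)

copy_game_sessions = """
    COPY warehouse.fact_game_session
    FROM 's3://example-data-lake/curated/game_sessions/dt=2017-10-26/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/example-redshift-copy'
    FORMAT AS PARQUET;
"""

with conn, conn.cursor() as cur:
    cur.execute(copy_game_sessions)                      # bulk load from the data lake
    cur.execute("ANALYZE warehouse.fact_game_session;")  # refresh planner statistics
conn.close()
```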
>> And what are the roles ... just one more follow-up, >> Yeah. >> if I may? Who are the people actually doing this work — building the models, cleaning the data, storing the data? You've got data scientists, you've got quality engineers, you've got data engineers, you've got application developers. Can you describe the collaboration between those roles? >> Sure. Yeah, so as a BI organization we have two main groups: we have our engineering team — that's the one I drive — and then we have reporting, and that's a team. Now, we are really one single unit; we work as a team, but we separate those two functions. And within my organization we have two main groups. We have our big data team, which is doing that initial ingestion — now, we ingest billions of rows of data a day, terabytes of data a day — and so we have a team just dedicated to ingestion, standardization, and exposing that first stage. Then we have our second team, who are the warehouse engineers — who are actually here today somewhere — and they're the ones doing the modeling, the structuring; I mean the data modeling, making the data usable and promoting it into the warehouse. On the reporting team, basically we are there to support them: we provide these tool sets to engage and let them do their work. And so in that team there's a split of people who do a lot of report development, visualization, data science. A lot of the individuals there will do all three, two of the three, one of the three. But they do also have segmentation across the day-to-day reporting, which has to function, as well as the deeper analysis for data science or predictive analysis. >> And that data warehouse is on-prem? Is it in the cloud? >> Good question. Everything that I've talked about is all in the cloud. About a year and a half, two years ago, we made the leap into the cloud — we drank the Kool-Aid. As of Q2 next year at the very latest, we'll be 100% cloud. >> And the database infrastructure is Amazon? >> Correct. We use Amazon for all the BI platforms. >> Redshift, or is it... >> Robert: Yes. >> Yeah, okay. >> That's where actually I want to go, because you were talking about the architecture. So I know you've mentioned Amazon Redshift; Cloudera is another one of your solution providers. And of course, we're here at PentahoWorld — Pentaho. You've described Pentaho as the glue. Can you expand on that a little bit? >> Absolutely. So I've been talking about these two environments, these two worlds, data lake to data warehouse. They're both different in how they're developed, but it's really a single pipeline, as you said. So how do we get data from this raw form into this modeled structure? That's where Pentaho comes into play. That's the glue. It's the glue between these two environments: while they're conceptually very different, they serve a single purpose, and we need a way to unify that pipeline. So we use Pentaho very heavily to take this raw information, to transform it, ingest it, and model it into Redshift. We can automate, we can schedule, we can provide error handling, and so it gives us the framework. And it's self-documenting, to be able to track and understand, from A to B, from raw to structured, how we do that. And again, Pentaho is allowing us to make that transition. >> Pentaho 8.0 just came out yesterday. >> Hmm, it did? >> What are you most excited about there? Do you see any changes? We keep hearing a lot about the ability to scale here at PentahoWorld. >> Exactly. So there are three things that really appeal to me on 8.0 — things that were missing that they've actually filled in with this release. Firstly, on the streaming component — the real-time piece we were missing from earlier — we're looking at using Kafka and queuing for a lot of our ingestion purposes, and Pentaho, in releasing this new version, has the mechanism to connect to that environment. That was good timing; we need that. Also, to get into more detail: for the logs that we ingest, the data that we handle, we use Avro and Parquet when we can — we use JSON, Avro, and Parquet. Pentaho can handle JSON today; Avro and Parquet are coming in 8.0. And then lastly, to the point you made as well about where they're going with their system: they want to go into streaming, into all this information — it's very large and it has to go big. And so they're adding, again, the ability to add worker nodes and scale their environment horizontally, and that's really a requirement before these other things can come into play. So those are the things we're looking for. Our data lake can scale on demand, our Redshift environment can scale on demand; Pentaho has not been able to, but with this release they should be able to. And that was something that we've been hoping for for quite some time.
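Outside of Pentaho's own Kafka support, the kind of streaming ingest Robert describes — events arriving on a Kafka topic and landing in the lake as Parquet micro-batches — might look roughly like the sketch below. The topic, brokers, batch size, and output path are hypothetical, and it uses the kafka-python and pandas libraries rather than anything Pentaho ships:

```python
import json
import pandas as pd
from kafka import KafkaConsumer  # pip install kafka-python

consumer = KafkaConsumer(
    "game-events",                                   # hypothetical topic name
    bootstrap_servers=["broker1:9092"],
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
    auto_offset_reset="latest",
)

batch, BATCH_SIZE = [], 10_000
for message in consumer:
    batch.append(message.value)                      # one JSON game event per message
    if len(batch) >= BATCH_SIZE:
        # Each micro-batch becomes a Parquet file; in practice it would land in the
        # data lake (e.g. an S3 prefix) rather than the local working directory.
        pd.DataFrame(batch).to_parquet(f"game_events_{message.offset}.parquet", index=False)
        batch.clear()
```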
>> I wonder if I can get your opinion on something — a little futures-oriented. You have a choice as an organization. You could just roll your own open source, best-of-breed open source tools, and slog through that. And if you're an internet giant or a huge bank, you can do that. >> Robert: Right. >> You can take tooling like Pentaho, which is an end-to-end data pipeline, and that dramatically simplifies things. A lot of the cloud guys — Amazon, Microsoft, I guess to a certain extent Google — are sort of picking off pieces of the value chain, and they're trying to come up with an as-a-service, fully integrated pipeline. Maybe not best of breed, but convenient. How do you see that shaking out generally? And then specifically, is that a challenge for Pentaho, from your standpoint? >> So, you're right — that's why they're trying to fill those gaps in their environment. Compared to what Pentaho does, what they're offering — there's no comparison right now. They're not there yet. They're a long way away. >> Dave: You're saying the cloud guys are not there. >> No way. >> Pentaho is just so much more functional. >> Robert: They're not close. >> Okay. >> So that's the first step. However, what I've been finding in the cloud is that there are lots of benefits from the ease of deployment, the scaling — you save a lot on dev ops support, DBA support. But the tools that they offer right now feel pretty bare bones. They're very generic. They have a place, but they're not designed for a singular purpose. Redshift is the only real piece of the pipeline that is a true Amazon product, but that came from a company called ParAccel ten years ago — they licensed that from a separate company. >> Dave: What a deal that was for Amazon! (Rebecca and Dave laugh) >> Exactly. And so we like it because of the functionality ParAccel put in many years ago. Now they've developed upon that, and they've made it easier to deploy, but that's the core reason behind it. Now, for our big data environment, we use Databricks. Databricks is a cloud solution; they deploy into Amazon. And so what I've been finding more and more is that companies that are specialized in an application or function, and that have their product support cloud deployment — that, to me, is the sweet middle ground. So Pentaho is also talking about, next year, looking at Amazon deployment solutioning for their tool set. To me it's not really about going all Amazon — "Oh, let's use all Amazon products, they're cheap and cheerful, we can make it work, we can hire ten engineers and hack out a solution." I think what's more applicable is people like Pentaho, or whoever in the industry has the expertise and is specialized in that function, who can allow their products to be deployed in that environment and leverage the Amazon advantages — the Elastic Compute, the storage model, the deployment methodology. That is where I see the sweet spot. So if Pentaho can get to that point, for me that's much more appealing than looking at Amazon trying to build out something to replace Pentaho x years down the line. >> So, their challenge, if I can summarize: they've got to stay functionally ahead — which they are, way ahead, now — they've got to maintain that lead, and they have to curate best of breed, like Spark, for example, from Databricks. >> Right. >> Whatever's next, and curate that in a way that is easy to integrate. And then look at the cloud infrastructure. >> Right. Over the years, these companies have been looking at ways to deploy into a data center easily and efficiently. Now, the cloud is the next option. How do they support and implement into the cloud in a way where we can leverage their tool set, but also leverage the cloud ecosystem? And that's the gap.
And I think that's what we look for in companies today, and Pentaho is moving towards that. >> And so, that's a lot of good advice for Pentaho? >> I think so. I hope so. Yeah. If they do that, we'll be happy. So we'll definitely take that. >> Is it Pen-ta-ho or Pent-a-ho? >> You've been saying Pent-a-ho with your British accent! But it is Pen-ta-ho. (laughter) Thank you. >> Dave: Cheap and cheerful, I love it. >> Rebecca: I know -- >> Bless your cotton socks! >> Yes. >> I've had it-- >> Dave: Gordon Bennett. >> Rebecca: Man, okay. Well, thank you so much, Robert. It's been a lot of fun talking to you. >> You're very welcome. >> We will have more from Pen-ta-ho World (laughter) brought to you by Hitachi Vantara just after this. (upbeat techno music)

Published Date : Oct 27 2017
