Breaking Analysis: Living Digital: New Rules for Technology Events
From theCUBE studios in Palo Alto and Boston, connecting with thought leaders all around the world, this is a CUBE Conversation.

For years, marketers have been pushing for more digital, especially with their big conferences. I've heard forward-thinking CMOs say the war will be won in digital, but the sales teams love the belly-to-belly interaction. So every year, once or sometimes more often, big corporations have hosted gatherings of thousands or even tens of thousands of attendees. These events were like rock concerts: DJs in the hallway, thumping music, giant screens, beautiful pitches, highly produced videos, technical breakouts, food lines, private dinners, and so on, all culminating in a customer appreciation event with a big-name band. Physical events are expensive, but they generate tons of leads for the host companies and their partner ecosystems.

Well then, boom, coronavirus hits, and the marketing teams got what they wished for, right? Overnight, virtual events became a mandate. If you didn't have a solution, you were in big trouble, because your leads from these large events just dried up.

Hello everyone, this is Dave Vellante, and welcome to this week's CUBE Insights, powered by ETR. ETR is entering its quiet period, and I won't be able to share any new data for a couple of weeks, so rather than look back at the April survey, in this Breaking Analysis we thought we'd take a pause and talk about the virtual event landscape and a few of the things that we've learned in the past 120 days. This isn't meant to be an exhaustive list, but we do want to call out a few items that we see as critical in this new digital world and the isolation economy.

Every company scrambled, and they took one of three paths. First, some companies postponed their events to buy some time: think Dell Technologies World, Google Cloud Next, KubeCon, our MIT CDO event, and so on. Second, some companies flat-out canceled their events until next year, like Snowflake and UiPath. Third, some scrambled to deploy a virtual event and went forward: IBM Think did this, and HPE Discover, SUSECON, the AWS Summits, DockerCon, MongoDB World, PegaWorld, the Vertica Big Data Conference, Oktane, SAP Sapphire, and hundreds of others pushed forward.

So in this Breaking Analysis I want to share some data from theCUBE — what we've learned not only in the last 120 days but in ten years of doing events, mostly physical — and share the new rules of events, event marketing, and beyond. Let's get right into it.

Everyone knows events have gone virtual, and there are tons of people who can give you advice on improving your digital events, including us, and I will in this segment. But the first thing everyone found out is that they're going to attract far more people online with a free virtual event than they do with a paid physical event. Removing the timing constraints and the expensive travel dramatically increases the participation TAM, the total available market.

Here's a tweet from Docker CEO Scott Johnston. He says he's looking forward to welcoming 50,000 people to his event. That was based on registration data. Somewhere around 30,000 people logged into the live event, so Docker got 60% of the pre-event registrants to actually log in, which is outstanding. But there's a lot more to this story. I'll share some other stats worth mentioning — and by the way, I got permission from Docker to share these numbers. Not surprising, because the event was a huge success for such a small company.

In the end they got nearly 83,000 registrations, and they continued to come in weeks after the event, which was held in late May. Now, marketers generally cite two to three minutes as a respectable time on site for a web property. Docker's logged-in users averaged almost four and a half minutes on site. That's the average — the bell curve also has superfans like this guy, who was binge-watching.

So this brings me to rule number one: it's actually really easy to get people to sign up for free online events, but it's not so easy to keep them there. Now, I could talk all day about what Docker did right, and I'll bring in some examples during this segment, but one thing Docker did was a call for papers, a call for sessions. That's a lot of work, but if you look at the DockerCon speaker list, the content is mostly community driven. Docker had to break some eggs and reject some folks, but it also had a sponsor track, which gave folks another avenue to participate. So, a big success for Docker; they definitely did it right.

Which brings us to new rule number two: attention is precious. You've got to create high-quality content and realize that you have much less time with participants than if they were in person.

Unfortunately, the Docker example is a bit of an outlier; it hasn't always been this pretty. Remember that scene in The Social Network, the movie, when Eduardo pulled the funding on the servers just to get Mark's attention? Remember how Jesse Eisenberg, the actor who played Zuckerberg, reacted? "Everybody else, we don't crash ever. If the servers are down for even a day, our entire reputation is irreversibly destroyed." Well, some of the big tech companies crashed their servers, and they say there's no such thing as bad press, but look what happened to SAP. SAP apologized publicly, and its CEO told people they made a mistake in outsourcing their event platform.

So this brings us to new rule number three: don't crash. I come back to DockerCon for a second — here's a tweet from a developer who shared the traffic profile of his network before and during DockerCon. You can see no glitches. And I don't mean to pick on SAP; they owned the problem, and look, SAP had huge attendance at its digital event, more than 200,000 people and over a million views. Wow — you'd love to have that problem. But it underscores the importance of scaling, and SAP, you have to say, was not alone. There have been lots of fails from much smaller events. Here's an example that was really frustrating: you try to log in at 7:59, but the event doesn't start until 8:00 sharp — really, come on — "come back in 60 seconds." And in another example there was a slide failure. Many of these virtual events are glorified webinars, so if you're going to rely on slideware, make sure the slides will render at scale — maybe embed them into the video. But at least this company had a backup plan.

Here's another example, and I've redacted the email because I'm not here to throw anyone under the bus — well, kind of, but there's no reason to name names; you know who they are. In this case an old legacy webinar platform failed, and they had to move to Webex. Again, at least there was a backup plan. So it's been tough in a lot of these cases. Here's a tweet from Jason Reed that kind of sums it up. What does he mean by "vendors are not getting the job done, not enough creativity"? Well, not only were platforms failing, they weren't performing adequately.
And the virtual experience was leaving many users unenthused; they're just one alt-tab away from something better if the virtual event fails to engage them. So new rule number four: virtual events that look like webinars actually are webinars. In fairness, the industry had to pivot with no notice, but this is why I always tell people: start with the outcome that you want and work backwards. That will inform your content strategy and the new roles you need to assign — and make no mistake, there are new roles. There's no site inspection in virtual, and you've got to figure out what you want the user experience to be. There's a whole lot to figure out.

This next one is a bit of a throwaway, because it's so obvious and everyone talks about it, but I want to bring it up because it's important, and I'm amazed at how many virtual event speakers really haven't thought through their setup. You can look good — or at least less bad. Get those things called books and raise up the laptop; figure out some better audio. Or better yet, get a good kit and send it to speakers' homes: a nice camera, a solid mic, maybe a clear IFB for comms in the ear. Spend some money to look good, just as you might go buy a nice outfit. Even if you're a developer, put on a clean t-shirt. So rule number five: don't cheap out on production value. Get your guests a good setup and coach them up. It doesn't have to be over the top — just a bit thought out.

Okay, one of the biggest mistakes I've seen is event organizers becoming enamored with a platform and the features of that platform that really don't support their objectives — kind of feature creep — or they have so many competing objectives and masters that they lose sight of the user experience, and the event becomes a buffet of unused features rather than a buffet of engaging content. Now, many have told me, "Dave, these virtual events are too long; there's too much content." I don't necessarily agree. I really think if you have something to say, you should say it — as long as you do it right and keep people engaged.

I want to talk a little bit about two of the meatier events that we attended. One was Oktane20, hosted by Okta, the identity management security player, and the other was IBM Think 2020 — they called it the Think Digital event experience. Both were multi-day events with lots of content. They both organized sessions by topic and made it pretty easy to find things, and all of the sessions had a reasonably consistent look and feel, which helped the production value. IBM had content organized and categorized, which made things easy to find, and both had good search. With IBM you could go directly from the list of topics right into the videos, which I really liked — very easy and intuitive. And as you can see in this Oktane video, they had a nice and very ambitious agenda that was quite well organized; things were pretty easy to find, as you can see with this crisp filtering on the left-hand side, and really nice search. But one thing that has been frustrating with most of the events I've watched is that you can't get to the sessions directly from the agenda. You've got to back out along some linear path and find the content, and it's somewhat confusing.

So I want to come back to the DockerCon example, because there were two things I found interesting and useful. This guy George nailed it when he said: this is how you display a virtual conference. What's relevant about the picture is that you have multiple simultaneous sessions running live and concurrently, and you can pop in and out of them. You can easily see the sessions in this tile layout, and there's a red line — a linear clock running in real time — to show you where you are in the event agenda versus the time of day. So I felt like, with Docker, as a user you were really connected to the event. You come to the site and there's a hero video; it's very easy to find the content — in fact, you can't miss it — and it's not a sales pitch to get to the content. And I really liked what George was talking about in terms of the agenda and the tile layout. You can see they ran simultaneous sessions, at one point up to seven at once, and they gave their sponsors a track on the agenda that was very easy to navigate. But what I really liked as well is that when you click on a tile, it takes you directly to the session video, and you can see the chat, which Docker preserved in post-event mode. You have this easy-to-follow agenda, and you can go directly to the session video and the chat from the agenda — so many paths to find the content. Something as simple as navigating directly from the agenda to the session — most events haven't done that. They make you back out, in what I call this linear manner, then go forward, find the sessions you want, and dive in. Maybe they're trying to simulate walking to a session in a Las Vegas convention center, because it takes about that long to figure out where most of these sessions live. So rule number six: make it easy to discover and consume content. It sounds so simple — why is it not happening at most events?

Okay, I'm running out of time, so I want to encapsulate a number of items in one idea that we talk about all the time at theCUBE. I ran a little survey the other day, and someone asked: does it really make sense to cram educational content, product content, partner content, customer content, rally content, and leadership content into the constrained confines of an arbitrary one- or two-day window? I thought that was an interesting comment. It doesn't necessarily mean shortening the virtual event, which a lot of people think should happen — people complain that these things are too long.

Well, let me leave you with this: it's actually not just about events. What do I mean by that? Well, you know how everyone says all companies are software companies, or every company is a SaaS company? Well, guess what — we believe every company is a media company. In 2004, at the low point of its reputation, Microsoft launched Channel 9. It was named after the United Airlines channel 9 that lets you listen in on the pilots and their unfiltered conversations — kind of cool. Microsoft understood that having an authentic voice with which to communicate with developers and serve its community was a smart thing to do, and that is the key point: Channel 9 is about community. It's not about audience metrics or lead generation — both important things — but Microsoft launched the site understanding the leverage it gets out of its community of developers, and instead of treating them like leads, it created a site to help developers learn. So rule number seven: get your best media mojo on. One of the biggest failures I see with physical events — and it's clearly carrying over to digital — is the failure to optimize the post-event opportunity and experience.
Just like with physical events, when the event is over, I see companies and their employees so burnt out after a virtual event, because they feel like they've just given birth. And what do they do after the event? They take some time off; they've got to recharge. When they come back they're swamped, and so they're on to the next project — it might be another event, a webinar series, some regional summits, or whatever. Now, it's interesting: it feels like all tech companies talk about these days is breaking down silos, but most of these parent and child events are disconnected silos. Sure, maybe the data around the events is consolidated into a marketing cloud so you can nurture leads — okay, that's fine — but what about the community?

COVID has given us a great opportunity to reimagine how we serve communities, and one thing I'm certain about is that physical events are going to come back at some point, in some form. But when they do, there's going to be a stronger digital component attached to them. Hybrids will emerge, and some will serve communities better than others. In our opinion, the ones that do the best job in digital and in serving their communities are going to win the marketing wars.

So ask yourself: how are you serving your community? Are you serving it the best way you can? Is lead conversion your number one metric? That's okay — there's nothing wrong with that — but how are your content consumption metrics looking? What are you measuring? What does your arc of content look like? What's your content and organic media strategy? What does your media stack look like? "Media stack," you ask — what do you mean, Dave? Well, you nailed physical, and then you were forced to do virtual overnight; eventually a hybrid will emerge. So there's physical at the bottom, then a virtual layer, then at some point this hybrid layer on top of that. At the very top of the stack you've got apps, social media, corporate content, TV like Channel 9, video libraries, websites, tools for agile media, and media production and distribution tooling. Remember, customers will be entering from any one of these layers of the stack, and they'll be looking to you for guidance, inspiration, learning, vision, product knowledge, how-tos, and so on — and you'll be delivering that primarily through content. So your media stack should be designed to serve your community. Events software? Yeah, sure, but it's much more than that. We believe this stack will emerge not as a monolithic beast but as a set of scalable cloud services and APIs — think of a PaaS for media that you can skin, yes, of course, but also one that you can control, add value to, integrate with other platforms, and fit to your business as your community demands.

And remember, new roles are emerging as a result of this pandemic and the pivot to digital. What's really different from most physical events is that it's very important to think about these roles, and one of the important ones is the designer or UX developer who can actually do some coding and API integration — think of it as DevOps for the digital organizations that are emerging. Organizations like yours will want self-service and sometimes out-of-the-box functionality and features, for sure, no question. But we believe that as a media producer you will want to customize your media experience for your community, and this work will require new skills that you haven't really prioritized in the past.

What do you think? What's your vision as to how this will all play out and unfold? Do you buy that all companies must become media companies — or at least media savvy, not in the sense of corp comms but as an organic media producer? Tweet me @dvellante, email me at david.vellante@siliconangle.com, or comment on my LinkedIn posts. We'll be back next week with some data from the ETR survey sphere. Thanks for watching this Wikibon CUBE Insights, powered by ETR. This is Dave Vellante — we'll see you next time.
IBM DataOps in Action Panel | IBM DataOps 2020
From theCUBE studios in Palo Alto and Boston, connecting with thought leaders all around the world, this is a CUBE Conversation.

Hi everybody, welcome to this special CUBE digital event, where we're focusing in on DataOps — DataOps in action — with generous support from our friends at IBM. Let me set up the situation here. There's a real problem going on in the industry, and that's that people are not getting the most out of their data. Data is plentiful, but insights perhaps aren't. What's the reason for that? Well, it's really a pretty complicated situation for a lot of organizations. There are data silos, there are challenges with skill sets and a lack of skills, there are tons of tools out there — sort of a tools mess — the data pipeline is not automated, and the business lines oftentimes don't feel as though they own the data, which creates real concerns around data quality and a lot of finger-pointing. The opportunity here is to really operationalize the data pipeline, infuse AI into that equation, and attack the cost-cutting and revenue-generation opportunities in front of you. Think about this: virtually every application this decade is going to be infused with AI; if it's not, it's not going to be competitive. And so we have organized a panel of great practitioners to really dig into these issues.

First I want to introduce Victoria Stassi, an industry data expert at Northwestern Mutual. Victoria, great to see you again, thanks for coming on. — Excellent, nice to see you as well. — And Caitlin Halferty is the director of the AI Accelerator and part of the chief data officer organization at IBM, which has actually eaten some of its own cooking, let me say it that way. Caitlin, great to see you again. And Steve Lewis, good to see you again — vice president and director of data management at Associated Bank. Thanks for coming on. — Thanks, Dave, happy to be here.

All right guys, so you heard my narrative up front about operationalizing and getting the most insight. Data is wonderful; insights aren't; and getting insight in real time is critical in this decade. Give us a sense as to where each of you are on that journey. Victoria, you start — because you're brand new to Northwestern Mutual, but you have a lot of deep expertise in health care, manufacturing, and financial services — where do you see the general industry climate? And then we'll talk about the journeys that you're on, both personally and professionally.

Sure. I think right now the big thing is that you need speed to insight. As I've experienced going through many organizations, they're all facing the same challenges today, and a lot of those challenges are: where does my data live? Is my data trusted — meaning, has it been curated, has it been cleansed, has it been qualified, is it ready to use? What we often see happen is that businesses know their KPIs, they know their business metrics, but they can't find where that data lives. There's abundant data disparity all over the place, and data is replicated because it's not well managed. A lot of what governance — and the platforms and tools that governance brings — offers organizations is just that piece of it: I can tell you where data is, I can tell you what's trusted, so that when you quickly access information and bring back answers to business questions, it's one answer, not many answers that leave the business questioning which is the correct answer, which way do I go. At the executive level, that's the biggest challenge.

Where we want the industry to go is, one, breaking that down and allowing information to be published quickly, and two, enabling data virtualization. A lot of what you see today is that it takes time for most businesses to build out large warehouses at an enterprise level. We need to pivot quicker, so we're leaning them toward taking advantage of data virtualization, allowing them to connect to data sources and bring information back quickly, so they don't have to replicate that information across different systems or applications — and then to provide those answers back quickly, also allowing seamless access for the analysts who are running at full speed, trying to find answers as quickly as they can.

Great, okay — and I want to get into that. Steve, let me go to you. One of the things we talked about earlier was infusing this mindset of a data culture and thinking about data as a service. Talk a little bit about how you got started — what was the starting point?

Sure. The biggest thing for us was to change the mindset from data being just for reporting — insights on things that have happened in the past, on data that already existed — to starting to use data in our actual applications, so that we're providing those insights in real time through the applications as they're consumed, helping with customer experience, personalization, and optimization of our applications. The way we started down that path — the journey we're still on — was to get the foundation laid first. Part of that has been making sure we have access to all that data, whether through virtualization, like Vic talked about, or through having more of the data collected in a data lake, where we have all of that foundational data available as opposed to waiting for people to ask for it. That's been the biggest culture shift for us: having that availability of data, being ready to provide those insights, as opposed to making the business or the application ask for it.

Okay. Caitlin, when I first met Inderpal Bhandari, I was asking him about the role of the CDO, and he mentioned a number of things, but two stood out. First, you've got to understand how data affects the monetization of your company — that doesn't mean selling the data; what role does it play in helping cut costs or increase revenue or improve productivity or customer service? The other thing he said was that you've got to align with the lines of business. It sounded good — and this was several years ago — and IBM took it upon itself to drink its own champagne; I was going to say dogfooding, whatever. But it's not easy to just flip a switch, infuse AI, and automate the data pipeline. You guys went through some real pain to get there, and you did — you were early on, you took some arrows, and now you're helping your customers benefit from that. So talk about some of the use cases where you've applied this — obviously one of the biggest organizations in the world — what were the real challenges?

Sure, happy to. We've been on this journey for about four years now — we stood up our first chief data office in 2016 —
and you're right, it was all about getting the data strategy authored and executed internally, and we wanted to be very transparent, because, as you mentioned, there were a lot of challenges and we had to think differently about the value. So we wrote that data strategy at the time — becoming a cognitive enterprise — and then quickly pivoted to the real opportunity and value of infusing AI across all of our workflows. To your question on specific use cases: we invested the time getting that platform built and implemented, and then we were able to take advantage of it. One example I've been really excited about: I have a practitioner on my team who's a supply chain expert, and a couple of years ago he started building out a supply chain solution so we could better mitigate our risk in the event of a natural disaster — an earthquake, a hurricane — anywhere around the world. Because we had invested in getting the data pipelines right, and getting all of that data curated and cleaned with the quality assured, we were able in recent weeks to add the really critical COVID-19 data, deliver it to our employees internally for their preparation purposes, make it available to our nonprofit partners, and now we're starting to see our first customers take advantage of it too, with the health and well-being of their employees in mind. So that's an example where — and I'm seeing this with a lot of the clients I work with — if you invest in data and AI readiness, you're able to take advantage of all of that work very quickly, in an agile fashion, and spin solutions up.

I think one of the keys there, Caitlin, is that we can talk about that in a COVID-19 context, but it's going to carry through — that notion of business resiliency is going to live on in this post-pivot world, isn't it?

Absolutely. For all of us, the importance of investing in business continuity and resiliency-type work, so that we know what to do in the event of a natural disaster or something beyond, will be grounded in that, and I think it will only become more important for us to be able to act quickly. The investment in those platforms and the approach we're taking — and that I see many of us taking — will really be grounded in that resiliency.

So Vic and Steve, I want to dig into this a little bit, because we use this concept of DataOps — we're stealing from DevOps — and there are similarities, but there are also differences. Let's talk about the data pipeline. Think about the data pipeline as a sort of quasi-linear process: you're ingesting data, and you might be using tools — whether it's Kafka or whatever your favorite tool is — then you're transforming that data, then you've got discovery, you've got to do some exploration, you've got to figure out your metadata catalog, then you're trying to analyze that data to get some insights, and ultimately you want to operationalize it. You could come up with your own data pipeline, but generally that concept is, I think, well accepted. Now, there are different roles, and unlike DevOps — where it might be the same developer implementing security policies and handling operations — in DataOps there might be different roles, and in fact very often there are: data science, maybe an IT role, data engineering, analysts, and so on. So Vic, I wonder if you could talk about the challenges in managing and automating that data pipeline, applying DataOps, and how practitioners can overcome them.

Yeah. A perfect example would be a client I was recently working for, where we built up a team using agile methodologies — that framework — rapidly ingesting data and then proving out that the data was fit for purpose. We talk a lot about big data now, and a lot of industries are trying to add enrichment to their own data sources, so they're purchasing third-party data sets. In doing so, you make that initial purchase, but many companies today have no real way to vet it. They'll purchase the information, they won't vet it up front, they'll bring it into an environment, and it takes them time to understand whether the data is of quality or not — and by the time they do, the sale is typically done and they're not going to get anything back. In this most recent case, we took an unstructured data source, brought it in, and ingested it with modelers using this agile team, and within two weeks we were able to bring the data in from the third-party vendor — what we consider rapid prototyping — profile the data, understand whether it was of quality or not, and quickly figure out that, you know what, it wasn't. In doing that, we were able to contact the vendor and tell them: sorry, the data set isn't up to snuff, we'd like our money back, we're not going forward with it. That's enabling businesses to be smarter with the purchases they make today, because as much as businesses want to rely on their own data, they also want to rely on enrichment data from third-party sources, and that's really what DataOps is allowing us to do. It's allowing us to think at a broader, higher level: how do we bring the information in, and what structures can we store it in, so that it doesn't have to be modeled first? A modeler is great, but if we have to take time to model all the information before we even know we want to use it, that slows the process down, and that slows the business down. The business is looking for us to speed up all of our processes. A lot of what we heard in the past is that IT tends to slow us down, and that's the perception we're trying to change in the industry: no, we're actually here to speed you up, and we have the tools and technologies to do so — and they're only getting better.

I would say the same about data scientists — that's another piece of the pie. If we can bring the information in, quickly catalog it with metadata, bring in the back-end data assets, and then supply that information to the scientists — gone are the days when scientists were asking for connections to all these different data sources, waiting days for access requests to be approved, just to find out, once they'd figured out the relationship diagram, what the design looked like in that back-end database, how to get to it, and written the code to get to it, that this wasn't the information they needed, that Sally next to them had pulled the wrong information. That's where the catalog comes in; that's where DataOps and data governance come in. With that catalog, that metadata management platform, available, they can go in without having to request access to anything, and within five minutes they can see the structures, what the tables look like, what the fields look like — are these the metrics I need to bring back answers to the business? That's DataOps: speeding all of that up, taking things that took months down to weeks, down to days, down to hours.
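To make that concrete, here is a minimal sketch of the kind of fit-for-purpose check Victoria describes — profiling an incoming third-party file before committing to the purchase. It's illustrative only: the column names, thresholds, and the pandas-based approach are assumptions, not anything the panel specified.

```python
# Hypothetical fit-for-purpose profile of a third-party data set.
# Column names and thresholds are illustrative assumptions.
import pandas as pd

def profile(df: pd.DataFrame) -> dict:
    """Compute simple quality metrics for each column."""
    return {
        col: {
            "null_rate": df[col].isna().mean(),
            "distinct": df[col].nunique(),
            "dtype": str(df[col].dtype),
        }
        for col in df.columns
    }

def fit_for_purpose(df: pd.DataFrame,
                    required_cols: set,
                    max_null_rate: float = 0.05,
                    max_dup_rate: float = 0.01):
    """Return (passed, reasons) for a vendor extract."""
    reasons = []
    missing = required_cols - set(df.columns)
    if missing:
        reasons.append(f"missing columns: {sorted(missing)}")
    dup_rate = df.duplicated().mean()
    if dup_rate > max_dup_rate:
        reasons.append(f"duplicate rate {dup_rate:.1%} exceeds {max_dup_rate:.0%}")
    for col, stats in profile(df).items():
        if stats["null_rate"] > max_null_rate:
            reasons.append(f"{col}: null rate {stats['null_rate']:.1%}")
    return (not reasons), reasons

# Example: vet a (hypothetical) vendor file before accepting it.
df = pd.read_csv("vendor_extract.csv")
passed, reasons = fit_for_purpose(df, required_cols={"customer_id", "zip", "income"})
print("PASS" if passed else f"REJECT: {reasons}")
```

The point is less the specific checks than where they run: inside the pipeline, in hours, so a bad purchase is caught while the vendor conversation is still open.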
So Steve, I wonder if you could pick up on that and help us understand what DataOps means to you. We talked earlier — I mentioned it up front — about how the demand for data access was through the roof, and you've gone from that to more of a self-service environment, where it's not IT owning the data, it's really the businesses owning the data. What does all this DataOps stuff mean in your world?

Sure, I think it's very similar. It's how we enable and provide access to that data quicker, with the right controls and the right processes, and build that scalability and agility into all of it, so that we're doing this at scale and it's much more rapidly available. We can discover new data and determine if it's right — or, more importantly, if it's wrong — similar to what Vic described. It's how we enable the business to make the right decisions on whether or not they're going down the right path. The catalog is a big part of that. We've also introduced a lot of frameworks around scale, so the ability to rapidly ingest data and make it available has been key for us. We've also focused on a prototyping environment — that sandbox mentality of rapidly standing environments up for users, still providing some controls, but giving people the ability to do that exploration. What we're finding is that by providing the platform and the foundational layers, the use cases start to evolve and come out of it, as opposed to having the use cases first and then building things from them. We're shifting the mentality within the organization to say: we don't know what we need yet; let's start to explore. That's the data scientist mentality and culture — more a way of thinking than an actual project or implementation.

Well, I think that cultural aspect is important. Caitlin, you guys are an AI company — or at least that's part of what you do — but for decades, maybe centuries, companies have been organized around different things: by manufacturing plant, by sales channel, whatever it is. How has the chief data officer organization within IBM been able to transform itself and really infuse a data culture across the entire company?

One of the approaches we've taken — we talk about a blueprint to drive AI transformation so we can achieve and deliver these high-value use cases — covers the data, the technology, and, just as important, the organizational piece: the change management, enabling and equipping our data stewards. I'll give one specific example that I've been really excited about. When we were building our platform and starting to pull in structured and unstructured data, our data stewards were spending a lot of time manually tagging and creating business metadata about that data, and we identified that as a real pain point, costing us a lot of money and valuable resources. So we started to automate the metadata generation, in partnership with our deep-learning practitioners and some of the models they built, and we pushed that capability out into our product last year. One of the really exciting things for me has been that our data stewards — who bring such valuable expertise and skills — have reported that it has really changed the way they're able to work: it has sped up their process and enabled them to move on to higher-value activities and business benefits, so they're very happy from an organizational point of view. So I think there are ways to identify those use cases: we drove significant productivity savings, and we also really empowered our data stewards — made their jobs easier and more efficient, and helped them move on to things they're more excited about doing. That's another example of the approach we've taken.
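As a rough sketch of that pattern — a model that suggests business-metadata tags for a steward to confirm rather than type from scratch — consider the following toy example. The training data, labels, and scikit-learn approach are illustrative assumptions, not IBM's actual implementation.

```python
# Toy illustration of auto-suggesting business metadata tags.
# Training data, labels, and model choice are all hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Historical steward decisions: column name + sample values -> business term.
train_text = [
    "cust_nm customer name john mary",
    "acct_bal balance 1042.55 99.10",
    "txn_dt date 2020-01-03 2020-02-11",
    "email_addr email jdoe@example.com",
]
train_labels = ["Customer Name", "Account Balance", "Transaction Date", "Email"]

model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LogisticRegression(max_iter=1000),
)
model.fit(train_text, train_labels)

# New, untagged column: suggest a tag with a confidence for steward review.
candidate = "customer_email contact jsmith@example.org"
proba = model.predict_proba([candidate])[0]
best = proba.argmax()
print(model.classes_[best], round(float(proba[best]), 2))
```

A production system would use far richer models and feedback loops, but the workflow is the point: the machine proposes, the steward disposes.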
So the cultural piece, the people piece, is key, and we've talked a little bit about the process. I want to get a little into the tech. Steve, I wonder if you could tell us what the tech is — we have this bevy of tools; I mentioned a number of them up front. You've got different data stores, you've got open-source tooling, you've got IBM tooling. What are the critical components of the technology that people should be thinking about?

From an architecture and ingestion perspective, we're trying to do a lot with Python frameworks and scalable ingestion frameworks. On the catalog side, we've gone with IBM Cloud Pak for Data, which provides a platform for a lot of these tools to stay integrated together — from the discovery of data sources, to the cataloging and documentation of those data sources, all the way through the actual advanced analytics: Python models and R models and the open-source IDEs, combined with the ability to do data prep and refinery work. Having all of that in an integrated platform was key for us to roll out more of these tools in bulk, as opposed to having point solutions, so that's been a big focus area. Then on the analytics side, the web versus the IDE, there are a lot of different components you can go into — whether it's MuleSoft, whether it's AWS and some of the native functionality out there. You mentioned Kafka before, and Kinesis streams and different streaming technologies — those are all in the toolbox that we're starting to look at. One of the keys here is that we're trying to make decisions in as close to real time as possible, as opposed to the business having to wait weeks or months, by which time the insights are late and really rear-view mirror.

So Vic, your focus in your career has been a lot on data quality, governance, and master data management. From a data-quality standpoint, what are some of the key tools you're familiar with, that you've used, that have really enabled you to operationalize that data pipeline?

I would say definitely the IBM tools — I have the most experience with those — and also Informatica. Those are, to me, the two top players. IBM has definitely come to the table with a suite; like Steve said, Cloud Pak for Data is really a one-stop shop, allowing quick, seamless access for the business user, versus having to go into some of the previous versions IBM had rolled out, where you navigated different user interfaces to find your information. That can get clunky, it can add to the process, and it can leave a bad taste in people's mouths, because they don't want to navigate from system to system to system just to get their information. So Cloud Pak, to me, brings everything to the table in a one-stop-shop environment. Informatica is working on the same thing, but I would tell you they haven't come up with a solution that really comes close to what IBM has done with Cloud Pak for Data — I'd be interested to see if they can bring that over the horizon. IBM's suite allows for profiling, analytics, metadata management, access to Db2 Warehouse on Cloud — those are the tools I've implemented in my past — as well as Cloud Object Storage, bringing all of that together to provide that one stop. At Northwestern, we're working right now with Collibra. I think Collibra is a great tool — a great governance catalog — but that's what it's truly made for: it's a governance catalog. You have to bring other pieces to the table for it to serve up everything Cloud Pak does today: the advanced profiling, the data virtualization that Cloud Pak enables, the machine learning at the level where you can actually work with R and Python code and put your notebooks inside the platform. Those are some of the pieces missing from some of the other vendors' tools today.

So one of the things you're hearing here is the theme of openness. We've talked about a lot of tools, and not all IBM tools — there are many, but people want to use what they want to use. So Caitlin, from an IBM perspective, what's your commitment to openness, number one, but also — we've talked a lot about Cloud Paks — to simplifying the experience for your clients?

Well, I thank Steve and Victoria for speaking to their experience — I really appreciate the feedback — and part of our approach has been to take on the challenges we've had ourselves. I mentioned some of the capabilities we brought forward in our Cloud Pak for Data product, one being automated metadata generation, and that was something we had to solve for our own data challenges and needs. So we will continue to source our use cases from, and ground them in, a practitioner perspective of what we're trying to do, solve, and build. And the approach we've been taking is co-creation: we roll these capabilities out in the product, work with customers like Steve and Victoria, solicit feedback, our dev teams push updates out, and we try to be very open and transparent. We want to deliver a seamless experience, we want to do it in partnership, and we'll continue to solicit feedback, improve, and roll out. That has been our approach and will continue to be, and I really appreciate the partnerships we've been able to foster.

So we don't have a ton of time, but I want to go to the practitioners on the panel and ask about key performance indicators. When I think about DevOps, one of the things we measure is the elapsed time to deploy applications, start to finish; we measure the amount of rework that has to be done, and the quality of the deliverable. What are the KPIs, Victoria, that are indicators of success in operationalizing the data pipeline?

Well, I would definitely say your ability to deliver quickly. How fast can you deliver — is it quicker than what you were able to do in the past? What is the user experience like? Have you been able to measure the amount of time users spent bringing information to the table in the past, versus the reduced time to delivery of information now — answers to business questions? Those are the key performance indicators that tell you the suite we've put in place is providing information quickly — I can get my business answers quicker than before — and that the information is accurate. So, being able to measure whether what I've delivered is quality, or whether it's the wrong information and I've got to go back and find where to gather it from somewhere else. That tells us: with the tools we've put in place today, my teams are working quicker, and they're answering the questions they need to, accurately. That's when we know we're on the right path.

Steve, anything you'd add to that?

I think she covered a lot of the key components. There's data-quality scoring — for all the different data attributes, coming up with a metric for how to measure quality, and then showing that trend over time to show it's getting better. The other one we're tracking is overall data availability: how much data are we providing to our users, and what's the trend? When I first started, we had somewhere in the neighborhood of 500 files that had been brought into the warehouse, published, and made available — in the neighborhood of a couple thousand fields. We've grown that to thousands of tables now available, so it's been hundreds of percent of growth in scale, in terms of the availability of that data: how much is out there, how much is ready and available for people to just dig in, put into their analytics and their models, and get those back into the applications. So that's another key metric we're starting to track.
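As a rough illustration of the scoring Steve mentions — with dimensions and weights invented for the example, not Associated Bank's actual methodology — you might score each attribute on a few quality dimensions and trend the aggregate over time:

```python
# Hypothetical attribute-level data quality score, trended over time.
# Dimensions, weights, and snapshot values are illustrative only.
from dataclasses import dataclass

@dataclass
class AttributeStats:
    completeness: float  # share of non-null values, 0..1
    validity: float      # share of values passing format/range rules
    uniqueness: float    # 1 - duplicate rate

WEIGHTS = {"completeness": 0.4, "validity": 0.4, "uniqueness": 0.2}

def score(stats: AttributeStats) -> float:
    """Weighted 0-100 quality score for one attribute."""
    return 100 * (WEIGHTS["completeness"] * stats.completeness
                  + WEIGHTS["validity"] * stats.validity
                  + WEIGHTS["uniqueness"] * stats.uniqueness)

# Trend the same attribute across monthly snapshots.
history = {
    "2020-03": AttributeStats(0.91, 0.88, 0.99),
    "2020-04": AttributeStats(0.95, 0.90, 0.99),
    "2020-05": AttributeStats(0.97, 0.94, 1.00),
}
for month, stats in history.items():
    print(month, f"{score(stats):.1f}")
```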
So, last question. I said at the top that every application is going to need to be infused with AI this decade; otherwise that application is not going to be as competitive as it could be. For those that are maybe stuck in their journey and don't really know where to get started, I'll start with Caitlin, go to Victoria, and then have Steve bring us home. What advice would you give people that need to get going on this?

My advice: poll the folks that are either producing or accessing your data and figure out where the pain is. I mentioned some of the data-management challenges we were seeing — processes that were taking weeks, prone to error, highly manual — and that part was ripe for an AI project. Identify the use cases that are causing the most rework and manual effort; you can move really quickly, and as you build the platform out, you're able to spin those up in an accelerated fashion. If you identify that, and figure out the business impact you're able to drive very early on, you can get going and start really seeing the value.

Great. Yeah, I would say Caitlin hit it on the head, but I would add to that: first and foremost, in my opinion, the important thing here is data governance. You need to implement data governance at an enterprise level. Many organizations will do it, but they'll have silos of governance. You really need an enterprise data governance platform that consists of a true framework: an operating model, charters, data domain owners, data domain stewards, data custodians — all of that needs to be defined. And while that may take some work in the beginning, the payoff down the line is that much greater. It allows your business to truly own the data, and once they own the data and take part in classifying the data assets for technologists and analysts, you can start to eliminate some of the technical debt most organizations have acquired. You can start to look at which systems you can turn off and which systems you see value in, truly build out a capability matrix, start mapping systems to capabilities, and ask: where do we have redundancy, and what can we get rid of? That's the first piece. The second piece is leveraging the tools that are out there today — the IBM tools, and some of the other tools as well — that enable the newer, next-generation capabilities, like AI enabling automation, which for all of us means that the analysts in place today can access information quicker and deliver it accurately, like we've been talking about, because the data has been classified and that pre-work has been done. It's never too late to start, but once you start, it acts as a domino effect, where you start to see everything else fall into place.

All right, thank you. And Steve, bring us home — advice for your peers that want to get started.

Sure. Everything those two have talked about is valid and accurate. The thing I would add, from a starting perspective, is: if you haven't started, start. Don't try to overthink it or overplan it. Start, do something, and start to show progress and value. The use cases will come, even if you think you're not there yet; it's amazing, once you have the foundational components in place, how some of these things start to come out of the woodwork. So get it started, take an iterative approach with an open mindset, and encourage exploration and enablement. Look your organization in the eye and ask: why are there silos, why do these things work like this, what is getting in our way? Focus on and tackle those areas, as opposed to putting up more rails and more boundaries that encourage the silo mentality; really focus on enablement. And the last comment would be on scale: everything should be focused on scale. What you think is a one-time process today, you're going to do again — we've all been there; you're going to do it a thousand times. So prepare for that — prepare to do everything a thousand times — and start to instill that culture within your organization.

Great advice, guys. DataOps: bringing machine intelligence and AI to really drive insights, and scaling with a cloud operating model no matter where the data lives. It's really great to have three such knowledgeable practitioners. Caitlin, Victoria, and Steve, thanks so much for coming on theCUBE and helping support this panel.

All right, and thank you for watching, everybody. Now remember, this panel was part of the raw material that went into a CrowdChat that we hosted on May 27th — crowdchat.net/dataops — so go check that out. This is Dave Vellante for theCUBE. Thanks for watching.
Bob Parr & Sreekar Krishna, KPMG US | MIT CDOIQ 2019
>> from Cambridge, Massachusetts. It's the Cube covering M I T. Chief data officer and information quality Symposium 2019. Brought to you by Silicon Angle Media. >> Welcome back to Cambridge, Massachusetts. Everybody watching the Cuban leader live tech coverage. We here covering the M I t CDO conference M I t CEO Day to wrapping up. Bob Parr is here. He's a partner in principle at KPMG, and he's joined by Streetcar Krishna, who is the managing director of data science. Aye, aye. And innovation at KPMG. Gents, welcome to the Cube. Thank >> thank you. Let's start with your >> roles. So, Bob, where do you focus >> my focus? Ah, within KPMG, we've got three main business lines audit tax, an advisory. And so I'm the advisory chief date officer. So I'm more focused on how we use data competitively in the market. More the offense side of our focus. So, you know, how do we make sure that our teams have the data they need to deliver value? Uh, much as possible working concert with the enterprise? CDO uh, who's more focused on our infrastructure, Our standards, security, privacy and those >> you've focused on making KPMG better A >> supposed exactly clients. OK, >> I also have a second hat, and I also serve financial service is si Dios as well. So Okay, so >> get her out of a dual role. I got sales guys in >> streetcar. What was your role? >> Yeah, You know, I focus a lot on data science, artificial intelligence and overall innovation s o my reaction. I actually represent a centre of >> excellence within KPMG that focuses on the I machine learning natural language processing. And I work with Bob's Division to actually advance the data site off the store because all the eye needs data. And without data, there's no algorithms, So we're focusing a lot on How do we use a I to make data Better think about their equality. Think about data lineage. Think about all of the problems that data has. How can we make it better using algorithms? And I focused a lot on that working with Bob, But no, it's it's customers and internal. I mean, you know, I were a horizontal within the form, So we help customers. We help internal, we focus a lot on the market. >> So, Bob, you mentioned used data offensively. So 10 12 years ago, it was data was a liability. You had to get rid of it. Keep it no longer than you had to, because you're gonna get soon. So email archives came in and obviously thinks flipped after the big data. But so what do you What are you seeing in terms of that shift from From the defense data to the offensive? >> Yeah, and it's it's really you know, when you think about it and let me define sort of offense versus defense. Who on the defense side, historically, that's where most of CEOs have played. That's risk regulatory reporting, privacy, um, even litigation support those types of activities today. Uh, and really, until about a year and 1/2 ago, we really saw most CEOs still really anchored in that I run a forum with a number of studios and financial service is, and every year we get them together and asked him the same set of questions. This was the first year where they said that you know what my primary focus now is. Growth. It's bringing efficiency is trying to generate value on the offensive side. It's not like the regulatory work's going away, certainly in the face of some of the pending privacy regulation. But you know, it's It's a sign that the volume of use cases as the investments in their digital transformations are starting to kick out, as well as the volumes of data that are available. 
The raw material that's available to them, in terms of third-party data and in terms of just the general volumes that exist and are streaming into the organization, plus the overall literacy in the business units, is creating this massive demand. And so they're having to respond. >> And in getting a handle on the data, are they actually finding it, categorizing it, organizing it? >> Yeah, organizing it. That is still a challenge. I think it's better when you have a very narrow scope of critical data elements, going back to the structured data we were talking about with the regulatory reporting. When you start to get into the offense, the generating value, the customer experience, really exploring that side of it, there's a ton of new muscle that has to be built: new muscle in terms of data quality, new muscle in terms of a really more scalable operating model. I think that's a big issue right now with CDOs. We're used to that limited swath of critical data elements, and they've got a stewardship network that's very labor-intensive, with a lot of manual processes still. They have some good basic technology, but a lot of it is rules-based. And when you think about how that constrains your ability to scale when you have all of this demand, when you look at the customer experience analytics they want to do, when you look at AI applied to things like operations, the demand is going to start to create a fundamental shift. >> Sreekar, one of the things that I have seen, and maybe it's just my small observation space, but I wonder if you could comment: it seems like many CDOs are not directly involved in the AI initiatives. Clearly the chief digital officer is involved, but the CDOs are kind of in the background still. Do you see that? >> That's a fantastic question, and I think this is where we're seeing some of the cutting-edge change that is happening in the industry. When Bob presented the idea that we can offensively look at data: CDOs for a long time have become more reactive in their roles, and that is starting to come to the forefront now. A lot of the institutions we're working with are asking, what's the next-generation role of a CDO, and why are they in the background rather than the foreground? And this is where you become more offensive, more proactive, with data. The digital officers are obviously focused on the transformation that has to happen, but the CDOs are the backbone that makes the transformation real. And if CDOs start to think about their data as an asset, data as a product, data as a service, the digital officers are right there, because that's where the data is living. So the CDO can really go from a back office to really becoming a business line. >> Who have you seen taking the reins in machine learning projects in the companies you work with? Who is driving that? >> Yeah, great question. So we are seeing different patterns; I would put them in buckets, right? There is no one model that fits all. We're seeing different generations within the companies. Some of the ones that are just testing out the market are keeping it in their technology space, in their back-office IT, or in forward IT, let me call it, where they are starting to experiment with this.
But then you see the mature organizations on the other end of the spectrum: they are integrating machine learning and AI right into the business line, because they want to see executives having the technology right by their side, so they can leverage AI and machine learning right there for the business. And that is where we're seeing some of the new models come up. >> I think the big shift, from a CDO perspective, is using AI to prep data for AI. That's fundamentally where this is going: data science was distributed, and some of that data science has to come back and feed the integration, the quality, the data prep, because you've got all this data, third-party and other, from customers, streaming into the organization. And the work Sreekar is doing around anomaly detection transcends developing the rules, doing the profiling, the very manual, very labor-intensive process. You've got to get away from that. >> So AI is used in order for this to scale: algorithms and AI to figure out which algorithms to apply? >> To clean, to prepare the data, to see what algorithms we can use. It's basically what we're calling AI for data, rather than just data leading into AI. I mean, we developed a technology for one of our clients, a pretty large financial services firm. They were getting close to a billion data points every day, and there was no way you could manually run the same quality controls and all of those processes. So we automated it through algorithms, and these algorithms are learning the behavior of data as it flows into the organization, and they're able to proactively tell where problems are starting, very early. And this is the new phase that we see in the industry: you cannot scale traditional data governance using manual processes. We have to go to the next generation, where AI and natural language processing do it. And think about unstructured data, right? That's like 90% of the organization, and we have not talked about data quality, we have not talked about data governance, for a lot of those sources of information. Now is the time; AI can do it. >> And that raises a great question. If you look at unstructured data, and a lot of the data sources, as you start to take a more offensive stance, will be unstructured, then what it means to apply data quality isn't the profiling and the rules generation the way you would do it with standard data. So the teams, the skills that CDOs have in their organizations, have to change. It's a great example where, you know, you guys were ingesting documents and there was handwriting all over the documents. >> Yeah, that's a great example, Bob. We would ask the client, are these documents going to be scanned into the system so my algorithm can run? And they're like, yeah, everything is good, the deal is there. But when you then start scanning them, you realize there's handwriting, and the information is in the handwriting. So all the algorithms break down. >> Tribal knowledge, exactly. >> Exactly. So that's what we're seeing. If we talk about the digital transformation of data in the CDO organization, it is this idea that nothing is left unseen: some algorithm or some technology has seen everything that is coming into the organization and has processed it, so it can tell you where the problems are. And this is what algorithms do; they scale beautifully.
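As a toy illustration of the "AI for data" idea, here is a minimal sketch of a quality check that learns a column's normal null rate from recent batches and flags drift. The data, window, and threshold are invented, and any real system of this kind would be far more sophisticated:

```python
import random
from collections import defaultdict, deque
from statistics import mean, stdev

class NullRateMonitor:
    """Learn each column's normal null rate from recent batches and
    flag batches whose rate drifts far from that learned behavior."""

    def __init__(self, window=30, threshold=3.0):
        self.window = window        # how many past batches to remember
        self.threshold = threshold  # z-score that counts as an anomaly
        self.history = defaultdict(lambda: deque(maxlen=window))

    def observe(self, batch):
        """batch: list of dict records. Returns columns that look anomalous."""
        alerts = []
        columns = {key for row in batch for key in row}
        for col in columns:
            rate = sum(1 for row in batch if row.get(col) is None) / len(batch)
            past = self.history[col]
            if len(past) >= 5:  # need some history before judging
                mu, sigma = mean(past), stdev(past)
                if sigma > 0 and abs(rate - mu) / sigma > self.threshold:
                    alerts.append((col, rate, mu))
            past.append(rate)
        return alerts

# Hypothetical usage with synthetic data: 40 clean days, then a broken feed.
random.seed(0)
monitor = NullRateMonitor()
for day in range(41):
    null_p = 0.02 if day < 40 else 0.30  # day 40 simulates a broken feed
    batch = [{"amount": None if random.random() < null_p else 1.0}
             for _ in range(1000)]
    for col, rate, expected in monitor.observe(batch):
        print(f"day {day}: column {col!r} null rate {rate:.1%}, expected ~{expected:.1%}")
```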
>> So the data quality approaches are evolving, changing. Rather than a heavy emphasis on masking or deduplication and the things you would traditionally think of, not that those go away, it's got to evolve to use machine intelligence. >> Exactly. >> What kind of skill sets do people need to achieve that? Is it the same people, or do we need to retrain them or bring in new skills? >> Yeah, great question. And I can talk from the AI perspective: AI is disrupting every industry now, as we know. But when you look at what skills are required, all of AI, including natural language processing and machine learning, still requires a human in the loop. That is the training that goes in there. And who are the people who have that knowledge? It's the business analysts, the data analysts, who are the knowledge bearers. The C-suite and the CDOs are able to make decisions, but the day-to-day is still with the data analysts. >> Those SMEs. >> Those SMEs. So we have to upskill them to really start interacting with these new technologies, where they are the leaders rather than just waiting for answers to come through. And when that happens, as a data scientist my job is easy, because the SMEs are there: I deploy the technology, the SMEs train the algorithms on a regular basis, and then it's a fully functional model that evolves with the business. No longer am I spending time re-architecting my rules, or asking what masking capabilities I need to have. It evolves with us. >> Does that change the number one problem you hear from data scientists, which is that 80% of their time is spent wrangling and cleaning data? And do you run into SMEs being concerned that they're going to be replaced by the machine they're training? >> I actually see them being really enabled now. Where they were spending 80% of their time doing the boring job of looking at data, now they're spending 90% of their time looking at the interesting elements, the creative work, which requires human intelligence to say, hey, this is different because of X, Y, and Z. >> So let's go out three years, after all that work has been done and the data is clean. Where are your clients talking about going next with machine learning? Bob, do you want to take that? >> It varies by industry, obviously, but it covers the gamut, and it's generally tied to what's driving their strategies. If you look at a financial services organization as an example, today you're going to have AI driving a lot of the behind-the-scenes on the customer experience. With your credit card company, it's behind the scenes doing fraud detection, and that's going to continue. So take the critical functions: more data makes better models, and that's just going to explode. And you can look across all the functions, from finance to marketing to operations; it's going to be pervasive across all of that.
>> If I may, to add to what Bob was saying: what our clients are asking is, how can I accelerate the decision making? Because at the end of the day, all our leaders are focused on making decisions, and all of this data science is leading up to their decisions. Today, as you brought up, 80% of the time is wasted in cleaning the data, so only 20% of the time is spent in real experimentation and analytics. Your decision-making time is reduced to 20% of the effort that goes into the pipeline. What if I can now make it 80% of the effort in the pipeline? Better decisions are going to come out of the same pipeline. Today, when I go into a meeting and ask, hey, can you show me what happened in this particular region, or in this particular part of the country, the answer would have been: come back in two weeks, I'll have the data ready and I'll tell you. But in two weeks the business has moved on, and the CDO or the C-suite no longer needs the same answer. Where we're headed, as data quality improves, is real-time questions and real-time decisions. >> So, decision support, business intelligence: we keep getting better, but it's interesting to me that six months to build a cube was still not good enough; the business was moving too fast. As the saying goes, data is plentiful, insights aren't. In your view, will machine intelligence finally close that gap and get us closer to real-time decision making? >> It will, eventually. But there's so much that our industry needs to understand first, and really ingrain. Today there are still fundamental trust issues with AI, and we've done a lot of work on this. >> Is that the black box problem, or part of it? >> Part of it. The research we've done, and some of this is nine countries, 2,400 senior executives: we asked a lot of questions around their data and trusted analytics, and 92% of them came back saying they have some fundamental trust issues with their data and their analytics, and they feel there's material reputational risk. This isn't getting one little number wrong on one of the reports; it's more of a systemic issue. We also do a CEO study, and we've done it many years in a row, going back to 2017. We started asking them: okay, a lot of companies say they're data-driven, right? >> What they say is that they're data-driven. >> That's the point. At the end of the day, when they're making a strategic decision and have an insight that's not intuitive, do they trust their gut or go with the analytics? Back then, 67% said they go with their gut. Okay, that was 2017; this industry is moving quickly, and there are tons and tons of investment. Did it go down in 2018? No, it went up, to 78%. So it's not awareness of the issue; there's something fundamentally wrong, and you hit on it. Part of it's the black box, part of it's the data quality, part of it's bias, and all of these things flow around it. So when we dug into that, we said, okay, if that exists, how are we going to help organizations get their arms around this trust issue? The front part of it is exactly what we're talking about in terms of data quality, both structured, more traditional approaches and unstructured, using the handwriting example and those types of techniques. But then you get into the models themselves, and the critical things you have to worry about are lineage and explainability. Lineage, from an integrity perspective: where's the data coming from, what are the sources, what are the change controls on some of that? And explainability gets at the black box part: can you tell me the inferences, the decisions, and are those documented? This is important for the SMEs, the humans in the loop, to get confidence in the algorithm, as well as for that executive group, so they understand there's a structured set of processes around it.
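As a sketch of what documenting lineage and inferences can look like in practice, here is a minimal audit record one might attach to every scoring run. The fields and values are illustrative, not any firm's standard:

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
import hashlib
import json

@dataclass
class ScoringRunRecord:
    """Minimal audit trail for one model scoring run: where the data came
    from, which model version ran, and why it decided what it decided."""
    model_name: str
    model_version: str
    input_sources: list   # e.g. table names or file URIs
    input_checksum: str   # ties the run to the exact data snapshot
    top_features: dict    # feature -> contribution, for explainability
    decision: str
    run_at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

def checksum(rows):
    """Stable fingerprint of the scored data, so the run can be reproduced."""
    payload = json.dumps(rows, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()[:16]

# Hypothetical usage: log the record alongside the prediction.
rows = [{"id": 1, "balance": 120.0}, {"id": 2, "balance": -3.0}]
record = ScoringRunRecord(
    model_name="credit_risk",
    model_version="1.4.2",
    input_sources=["warehouse.accounts_daily"],
    input_checksum=checksum(rows),
    top_features={"balance": -0.62, "tenure_months": 0.21},
    decision="flag_for_review",
)
print(json.dumps(asdict(record), indent=2))
```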
>> The Moneyball problem is actually pretty confined, pretty straightforward: you know, 32 teams, or throw in the minor leagues, but the data model is pretty consistent. The problem with organizations is that there's no data model that's consistent across the organization, and, as you mentioned, there's risk, Bob. The other problem is organizational inertia. If they don't trust it, what is a P&L manager to do when he or she wants to preserve, you know, their existing position? They attack the data. >> Which is a fundamental point, which is culture. You can have all the data science and all the governance that you want, but if you don't work culture in parallel with all this, it's not going to stick. And I think the leading organizations are starting to really dig into this. We hear a lot about data literacy, we hear a lot about top-down support, but what does that really mean? It means senior executives are placing bets and demonstrably linking data, and the role of data as an asset, into their strategies, then messaging it out and being specific about the types of investments that will reinforce that business strategy. That's absolutely critical. And then literacy is absolutely fundamental as well, because it's not just the executives and the data scientists that have to get this; it's the person in ops that you're trying to reach. They need to understand, not only the tools, but, less about the tools, the techniques, so that the approaches being used are more transparent. And they're starting to also understand the issues of privacy and data usage rights. That's something we can't leave at the curb with all this innovation. >> It's also believing that there's an imperative. For all the talk about digital transformation, and you hear it everywhere, everybody's trying to get digital right, there's still a lot of complacency in the organization, in the lines of business, in operations: we're actually doing really well, you know, we're in financial services, or health care really hasn't been disrupted. And it's: oh, it's coming, it's coming. But there's still a lot of "I'll be retired by then" thinking. >> Actually, it's also the fact that, in the previous generation, if I had to go shopping, I would go into a shop, and if I wanted to buy an insurance product, I would call my insurance agent. But today, in the new world, it's just a tap on my screen. I hop from Amazon to some other app, and this is what's happening to all of our clients. Previously their customers bucketed them into different experience buckets. Not anymore; it's all right in front of them.
So if you don't get on with your digital transformation, a customer is not going to give you a discount by saying, oh, you're not Amazon, so I won't expect that of you. You're still on my phone, you're only two taps away, so you have to become really digital. >> I'm a little surprised that you said you see the next stage as being decision support rather than customer experience, because we hear from CDOs that customer experience is top of mind right now. >> It's a natural progression, and there are two differences, right? One is external facing, which is absolutely the customer; the other is internal facing, which is absolutely the decision making. That's how they're separating the internal versus the external. In most of the meetings we go to, customer insight is the first place where analytics starts, where data is being cleaned up. The questions being asked are: can I master my customer records, can I do a good master of my vendor list? That is where they start. But all of that leads to good decision making to support the customers. So it's that external-to-internal view. >> Back to the offense-versus-defense shift: it absolutely is on the offense side. It's with the customer, and that maps more directly to the business strategy. That's the area that's getting the money and the support, and people feel like they're making an impact with it there. When it's down in some admin area, below the water line, even though it's important and it flows up, it doesn't get the same visibility. >> Great conversation, gents. Thank you for coming on. We've got to leave it there. Thank you for watching; we'll be right back with our next guest. Dave Vellante and Paul Gillin from MIT CDOIQ. You're watching theCUBE.
Reynold Xin, Databricks - #Spark Summit - #theCUBE
>> Narrator: Live from San Francisco, it's theCUBE, covering Spark Summit 2017. Brought to you by Databricks. >> Welcome back, we're here at theCUBE at Spark Summit 2017. I'm David Goad here with George Gilbert. George. >> Good to be here. >> Thanks for hanging with us. Well, here's the other man of the hour. We just talked with Ali, the CEO at Databricks, and now we have the Chief Architect and co-founder at Databricks, Reynold Xin. Reynold, how are you? >> I'm good. How are you doing? >> Awesome. Enjoying yourself here at the show? >> Absolutely, it's fantastic. It's the largest Summit: a lot of interesting things, a lot of interesting people to meet. >> Well, I know you're a really humble guy, but I had to ask Ali what I should ask Reynold when he gets up here. Reynold is one of the biggest contributors to Spark. And you've been with it for a long time, right? >> Yes, I've been contributing to Spark for about five or six years, and that's probably the largest number of commits to the project. Lately I'm working more with other people to help design the roadmap for both Spark and Databricks. >> Well, let's get started talking about some of the new developments that maybe our audience at theCUBE hasn't heard in the keynote this morning. What are some of the most exciting new developments? >> So, in general, if we look at Spark, there are three directions I would say we're doubling down on. The first direction is deep learning. Deep learning is extremely hot and it's very capable, but as we alluded to earlier in a blog post, deep learning has reached sort of a mass-production point, in which it shows tremendous potential but the tools are very difficult to use. We are hoping to democratize deep learning and do what Spark did to big data for deep learning, with this new library called Deep Learning Pipelines. What it does is integrate different deep learning libraries directly in Spark, and it can actually expose models in SQL, so even the business analysts are capable of leveraging them. So that's one area, deep learning. The second area is streaming. Again, I think a lot of customers have aspirations to shorten the latency and increase the throughput in streaming. The structured streaming effort is going to be generally available, and last month alone on the Databricks platform our customers processed three trillion records using structured streaming. We also have a new effort to push the latency all the way down into the millisecond range, so you can do blazingly fast streaming analytics. And last but not least is the SQL data warehousing area. Data warehousing is a very mature area from the classic database point of view, but from a big data one it's still pretty new, and there are a lot of use cases popping up there. And Spark, with approaches like the CBO, the cost-based optimizer, and also, in the Databricks Runtime, with DBIO, is substantially improving the performance and the capabilities of data warehousing features.
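For reference, the cost-based optimizer mentioned here is driven by table and column statistics that you collect up front. A minimal PySpark sketch of enabling it follows; the table and column names are hypothetical:

```python
from pyspark.sql import SparkSession

# Enable the cost-based optimizer (a real Spark setting, available since 2.2).
spark = (SparkSession.builder
         .appName("cbo-sketch")
         .config("spark.sql.cbo.enabled", "true")
         .config("spark.sql.cbo.joinReorder.enabled", "true")
         .getOrCreate())

# The CBO is only as good as its statistics, so collect them first.
spark.sql("ANALYZE TABLE sales COMPUTE STATISTICS")
spark.sql("ANALYZE TABLE sales COMPUTE STATISTICS FOR COLUMNS customer_id, amount")

# With statistics in place, the optimizer can pick join orders by estimated cost.
result = spark.sql("""
    SELECT c.region, SUM(s.amount) AS total
    FROM sales s JOIN customers c ON s.customer_id = c.id
    GROUP BY c.region
""")
result.explain(True)  # prints the optimized, cost-informed plan
```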
>> We're going to dig into some of those technologies in just a second with George. But have you heard anything here so far, from anyone, that's changed your mind about what to focus on next? >> One thing I've heard from a few customers is actually visibility and debuggability of the big data jobs. Many of them are fairly technical engineers, and some of them are less sophisticated engineers, and they have written jobs, and sometimes the job runs slow. The performance engineer in me would think: how do I make the job run fast? A different way to solve that problem is: how can we expose the right information so the customers can understand and figure it out themselves? This is why my job is slow, and this is how I can tweak it to make it faster. Rather than giving people the fish, you give them the tools to fish. >> If you can call that debuggability. >> Yeah, debuggability. And visibility. >> Alright, awesome. George. >> So, let's go back and unpack some of those juicy areas that you identified. On deep learning, you were able to distribute, if I understand things right, the predictions; you could put models out on a cluster. But the really hard part, the compute-intensive stuff, was training across a cluster, and so Deeplearning4j and, I think, Intel's BigDL were written for Spark to do that. But with all the excitement over some of the new frameworks, are they now at the point where they are as good citizens on Spark as they are in their native environments? >> Yeah, this is a very interesting question. Obviously a lot of other frameworks are becoming more and more popular, such as TensorFlow, MXNet, Theano, Keras, and others. What the Deep Learning Pipelines library does is actually expose all these single-node deep learning tools, highly optimized for GPUs or CPUs, as an estimator, like a module, in the machine learning pipeline library in Spark. So now users can leverage Spark's capability to, for example, do hyperparameter tuning. When you're building a machine learning model, it's fairly rare that you just run something once and you're good with it. Usually you have to fiddle with a lot of the parameters; for example, you might run over a hundred experiments to figure out the best model you can get. This is where Spark really shines. When you combine Spark with some deep learning library, be it BigDL, MXNet, or TensorFlow, you can use Spark to distribute that training and then do cross-validation on it, so you can find the best model very quickly. And Spark takes care of all the job scheduling, all the fault tolerance properties, and how you read data in from different data sources. >> And without my dropping too much into the weeds, there was a version of that where Spark wouldn't take care of all the communications; it would maybe distribute the models and then do some of the averaging of what was done out on the cluster. Are you saying that all of that now can be managed by Spark? >> In that library, Spark will be able to take care of picking the best model out of it. And there are different ways you can design how you define the best. The best could be some average of different models; the best could be just picking one of them; the best could be, maybe there's a tree of models that you classify on. >> And that's a hyperparameter configuration choice? >> So that is actually built-in functionality in Spark's machine learning pipeline. And what we're doing now is letting you plug all those deep learning libraries directly into that, as part of the pipeline, to be used. >> Maybe just one more thing to add: another really cool functionality of the Deep Learning Pipelines is transfer learning. As you said, deep learning takes a very long time; it's very computationally demanding, and it takes a lot of resources and expertise to train. But with transfer learning, what we allow customers to do is take an existing deep learning model that was trained in a different domain, retrain it on a very small amount of data very quickly, and adapt it to a new domain. That's sort of the demo with the James Bond car. There is a general image classifier, we retrained it on probably just a few thousand images, and now we can detect whether a car is James Bond's car or not. >> Oh, and the implications there are huge, which is you don't have to have huge training data sets for adapting a model to a similar situation.
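A rough sketch of the transfer-learning pattern just described, assuming the sparkdl Deep Learning Pipelines API of that era; the paths, labels, and parameters are invented for illustration:

```python
from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression
from pyspark.sql.functions import lit
from sparkdl import DeepImageFeaturizer, readImages  # Deep Learning Pipelines library

# Load labeled images into DataFrames; the paths are hypothetical.
bond_cars = readImages("/data/bond_cars").withColumn("label", lit(1.0))
other_cars = readImages("/data/other_cars").withColumn("label", lit(0.0))
train = bond_cars.union(other_cars)

# Transfer learning: a pre-trained InceptionV3 supplies generic image features,
# and only a small logistic-regression head is trained on the new, small dataset.
featurizer = DeepImageFeaturizer(inputCol="image", outputCol="features",
                                 modelName="InceptionV3")
head = LogisticRegression(maxIter=20, labelCol="label", featuresCol="features")
model = Pipeline(stages=[featurizer, head]).fit(train)

# The same pipeline can be handed to pyspark.ml.tuning.CrossValidator so Spark
# distributes the hyperparameter search and picks the best model.
```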
>> In the time we have, I want to ask: there's always been this debate about whether Spark should manage state, whether it's a database, a key-value store. Tell us how the thinking about that has evolved, and how the integration interfaces for achieving that have evolved. >> One of the advantages of Spark, I would say, is that it's unbiased and works with a variety of storage systems, be it Cassandra, be it HBase, be it HDFS, be it S3. There is a metadata management functionality in Spark, which is the catalog of tables that customers can define, but the actual storage sits somewhere else. And I don't think that will change in the near future, because we do see that the storage systems have matured significantly in the last few years. I just wrote a blog post last week about the advantages of S3 over HDFS, for example; storage prices are being driven down by almost a factor of 10X when you go to the cloud. I just don't think it makes sense at this point to be building storage systems for analytics. That said, there's a lot of building on top of existing storage systems. There are actually a lot of opportunities for optimization in how you leverage the specific properties of the underlying storage system to get maximum performance: for example, how you do intelligent caching, and how you start thinking about building indexes against the data that's stored, for scan workloads. >> With Tungsten you take advantage of the latest hardware, and we're getting more memory-intensive systems, and now the Catalyst optimizer has a cost-based optimizer, or will. With large memory, can you change how you go about knowing what data you're managing in the underlying system, and therefore achieve a tremendous acceleration in performance? >> This is actually one area we've invested in with the DBIO module, as part of Databricks Runtime, and a lot of this is still in progress. For example, we're adding some form of indexing capability to the system, so we can quickly skip and prune out all the irrelevant data when the user is doing simple point lookups, or when the user is doing a scan-heavy workload with some predicates. That has to do with how we think about the underlying data structure. The storage system is still the same storage system, like S3, but we're adding indexing functionality on top of it as part of DBIO. >> And so what would be the application profiles? Is it just for the analytic queries, or can you do the point lookups and updates in that sort of scenario too? >> It's interesting you mention updates.
Updates are another thing we've gotten a lot of feature requests on. We're actively thinking about how we will support update workloads. Now, that said, I just want to emphasize that for both use cases, point lookups and updates, we're still talking about the context of an analytic environment. So we would be talking about, for example, bulk updates or low-throughput updates, rather than transactional updates in which every time you swipe a credit card some record gets updated. That probably belongs more on the transactional databases, like Oracle, or MySQL even. >> What about when you think about people who started out with Spark on-prem and realize they're going to put much more of their resources in the cloud, but with IIoT, industrial IoT-type applications, they're going to have Spark maybe in a gateway server on the edge? What do you think that configuration looks like? >> Really interesting; it's kind of two questions, maybe. The first is the hybrid on-prem and cloud solution. Again, one of the nice advantages of Spark is the decoupling of storage and compute. So when you want to move, for example, workloads from on-prem to the cloud, the thing you care most about is actually the data, because with compute it doesn't really matter that much where you run it, but data is the hard thing to move. We do have customers leveraging Databricks in the cloud but actually reading data directly from on-prem, relying on the caching solution we have to minimize the data transfer over time, and that's one route that I would say is pretty popular. Another one is, with Amazon, you can literally use something like Snowball: you give them hard drives, the trucks ship your data directly, and it lands in S3. With IoT, a common pattern we see is that a lot of the edge devices push the data directly into some firehose like Kinesis or Kafka, and I'm sure Google and Microsoft both have their own variants of that, and then you use Spark to directly subscribe to those topics and process them in real time with structured streaming. >> And so would Spark be down, let's say, at the site level, if it's not on the device itself? >> It's an interesting thought, and maybe one thing we should consider more in the future is how we push Spark to the edges. Right now it's more of a centralized model, in which the devices push data into Spark, which is centralized somewhere. I've seen one example, I don't remember the exact use case, but it had to do with some scientific experiment in the North Pole. And of course there you don't have a great uplink for transferring all the data back to some national lab, so rather they do smart parsing there and then ship the aggregated result back. There's another one, but it's less common.
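A minimal sketch of the subscribe-and-process pattern Reynold describes, using Spark structured streaming against a Kafka topic. The broker, topic, and schema are made up, and the spark-sql-kafka connector package is assumed to be on the classpath:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json, window
from pyspark.sql.types import DoubleType, StringType, StructType

spark = SparkSession.builder.appName("iot-ingest").getOrCreate()

# Subscribe to the topic the edge devices publish to (names are hypothetical).
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")
       .option("subscribe", "device-readings")
       .load())

# Assumed JSON payload shape for each device reading.
schema = (StructType()
          .add("device_id", StringType())
          .add("temperature", DoubleType()))

readings = (raw
    .select(from_json(col("value").cast("string"), schema).alias("r"),
            col("timestamp"))
    .select(col("r.device_id").alias("device_id"),
            col("r.temperature").alias("temperature"),
            col("timestamp")))

# Aggregate per device over one-minute event windows, continuously updated.
stats = (readings
         .groupBy(window(col("timestamp"), "1 minute"), col("device_id"))
         .avg("temperature"))

query = stats.writeStream.outputMode("update").format("console").start()
query.awaitTermination()
```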
>> Alright, just one minute now before the break, so I'm going to give you a chance to address the Spark community. What's the next big technical challenge you hope people will work on, for the benefit of everybody? >> In general, Spark came along with two focuses: one is performance, the other one's ease of use. And I still think big data tools are too difficult to use, and deep learning tools even harder; the barrier to entry is very high for all of these tools. I would say we might have already addressed performance to a degree that it's actually pretty usable; the systems are fast enough. Now we should work on actually making (mumbles) even easier to use. That's also what we focus a lot on at Databricks. >> Democratizing access, right? >> Absolutely. >> Alright, well, Reynold, I wish we could talk to you all day. This is great, but we are out of time now. I appreciate you coming by theCUBE and sharing your insights, and good luck with the rest of the show. >> Thank you very much, David and George. >> Thank you all for watching. We're here at theCUBE at Spark Summit 2017. Stay tuned, lots of other great guests coming up today. We'll see you in a few minutes.
Alex "Sandy" Pentland - MIT CDOIQ Symposium 2015 - theCUBE - #MITIQ
[Music] >> Narrator: Live from Cambridge, Massachusetts, extracting the signal from the noise, it's theCUBE, covering the MIT Chief Data Officer and Information Quality Symposium. Now your hosts, Dave Vellante and Paul Gillin. >> Hi everybody, welcome back to Cambridge, Massachusetts. We're at MIT; Paul Gillin and myself are here for two days, and we're really pleased to have Sandy Pentland on. He's the director of the MIT Media Lab's entrepreneurship program, just coming off a keynote: Mr. Alex "Sandy" Pentland. Sandy, thanks for coming on. How'd you get the name Sandy? Was it the hair color? >> You know, my dad was named Alex too, so I had to get the diminutive. Alexander turns into Zander, or Sasha, or Sandy. >> Excellent. So it stuck. We learned from your keynote today that, unlike what your mom said, hey, if every other kid jumps off the bridge, do you? The answer should be yes. Why is that? >> Well, your friends are presumably as rational as you and have the same sort of values as you, and if they're doing something that looks crazy, they must have a piece of information you don't. Like, maybe Godzilla is coming and it really is time to get off the bridge. So while it's used as a metaphor for doing irrational things, it actually shows that using your social context can be the most rational thing, because it's a way of getting information that you don't otherwise have. >> So you broke down your talk to chief data officers into new types of analysis, smarter organizations, smarter networks, and then a really interesting new architecture. If we could break those down: you talked about networks, not individual nodes, as what should really be the focus to understand behavior. Can you unpack that a little? >> Well, it's a little bit like the bridge metaphor. A lot of what we learn, a lot of our behavior, comes from watching other people; we're not even conscious of it. You know, if everybody else starts wearing a certain sort of shoe, or acting a certain way, or using a phrase in business, like all these new buzz phrases, you have to as well, because it's to fit in, it means something, it's part of being high-performance and part of your group. But that's not in data analytics today. Today what they look at is just your personal properties, not what you're exposed to and the group that you're part of. So they would look at the guy on the bridge and say he's not going to jump, because he doesn't have that information. But on the other hand, if all the other people like him are making a different decision, he probably is going to jump. >> And your research, where you dig into organizations, has found the relationship between productivity and this type of analysis to be pretty substantial. >> Very substantial, both inside the organization and outside the organization, dealing with customers. People focus on things like personality, history, various sorts of training, things like that. What we find is that, compared to the pattern of interaction with other people, who you talk to, when, and in what situations, those other factors are tiny. They're often a whole order of magnitude less important than: do you talk to all the people in your group? Do you talk outside of your group? Do you violate the org chart and talk to other people? If you do, you're almost certainly one of the high-productivity, high-innovation people. >> So what impact does this have, and what are the implications, for organizations, which historically have been highly top-down: hierarchies, reporting structures, all of these institutions that we evolved in the post-World War II era? Is this working against their productivity?
>> Well, what they did is set some simple rules that they could deal with and wrap their heads around. But what we find is that those simple rules are exactly the opposite of what you need for innovation, because really what they're doing is enforcing silos; they're enforcing atomization of the work. Everybody talks about how we need to be more fluid, we need to be more innovative, we need to be able to move faster, and what that requires is better communication habits. And what we find when we measure the communication habits is exactly that: better communication habits lead to more innovative organizations. What's really amazing is that almost no organization measures this. People don't know: does everybody talk to everybody in this group? Do they talk outside of the group? There's no graphic, there's no visualization. And when you give a group a visualization of their pattern of communication, they change it, and they become more innovative, they become more productive. >> I'm sure you're familiar with holacracy, this idea of doing away with organizational boundaries and titles, where essentially everybody talks to everyone. Is that, in your view, a better way to structure an organization? >> I think that's too extreme, but it's headed in the right direction. First of all, people try to do this without any data. So, "everybody's the same": well, everybody really isn't the same, and how would you know if you're behaving the same as other people? There's no data. So what I'm suggesting is something that's halfway between the two. You can have leaders, you can have organization in there, but you also have to have good flow of ideas, and what that means is you have to make talking outside your org chart a value, something you're rewarded for. It means that including everybody in the loop in your organization is something you ought to be rewarded for. And of course that requires data. So the sorts of things we do with people: we make displays, which could just be a piece of paper, that show the patterns of communication, and we give them to everybody. And people actually know what to do with it when you give it to them. They say, well, gee, this group of people is all talking to each other but they're not talking to that group; maybe they ought to talk to each other. It's that simple. But in the absence of data, you can't see it. >> So you instrumented people, essentially, with badges, and you could measure conversations at the water cooler. >> Yeah, their frequency, their duration. Not the content. >> Not the content, just the activity. >> Just: is it happening, and is it happening between groups? Do people from this group go to that other group's water cooler, stuff like that. And that actually is enough to really make a substantial difference in the corporation. >> And you gave an example where you were able to predict trending stories on Twitter better than the internal mechanisms at Twitter did. Did I understand that correctly? >> Right. So what we've done, by studying organizations like this and coming up with these rules of how people behave, the notion that people learn from each other and that it's the patterns of communication that matter, is encode that along with machine learning, and suddenly you get something that looks like machine learning but in many ways is more powerful and more reliable.
>> And so we have a spin-out called Endor, and what that does is let your average person who can use a spreadsheet do something that's really competitive with the best machine learning groups in the world. And that's pretty exciting, because everybody has these reams of data, but what they don't have is a bunch of PhDs who can study it for six months and come up with a machine learning algorithm. They have a bunch of people who are smart and know the business, but they don't know the machine learning. So Endor supplies something like a spreadsheet, to allow the normal person to do as well as the machine learning people. >> There's a lot of focus right now on anticipating and predicting customer behavior, and a lot of it has been focused on understanding individuals better. Is that wrongheaded? Should marketers be looking more at this group theory, and treating customers more as buckets of similar behaviors? >> It's not buckets, but treating people as individuals is a mistake, because while people do have individual preferences, most of those preferences are learned from other people. It's keeping up with the Joneses, it's fitting in, it's learning what the best practice is. So you can predict people better from the company they keep than you can from their demographics; virtually every single time, you can do better from the company they keep than from the standard sort of data. What that means is that when you do analysis, you need to look at the relationships between people. At one level it's sort of obvious: you can't analyze somebody without knowing something about their relationships, about the types of things they do, the places they go. Those are important, but they're usually not in the data. And what I find, doing this with a lot of big organizations, is that you look at their data analytics and it's all based on individuals, not on the context of those individuals. >> I want to ask you further about that, because when I think of the surveys that I fill out, they're always about my personal preferences, what I want to do. I can't remember ever filling out a survey that asked me what my peer group does. Are you saying those are the questions we should be asking? >> Yeah, exactly right. And of course you want to get data about that. You want to know: if you go to these locations all the time, if you go to that restaurant, that sort of entertainment, who else goes there? What are they buying? What's trending in your group? Because it's not the general population. >> And these aren't necessarily people I know, but they're people I identify with. Perhaps that's why I go to certain restaurants: not because my friends go there, but because people who I aspire to be like go there. >> Yeah, and the other way around: you go there and you say, well, gosh, these other people are like me, because they go here too, and I see what they're wearing, what they're buying. Or the simplest thing: you go to a restaurant, you see other people all ordering the moo shu. Maybe I should try the moo shu. I usually don't like it, but it seems to work here; I like this restaurant, and everybody else who comes here likes it, so I'll try it. It's that simple. >> It's important to point out that we're talking about predictive analytics capabilities here. There are probably people watching who might say, this Sandy's crazy: we don't want to depersonalize the customer experience, we want to personalize it. >> Sure. >> But when we're talking about predictive analytics, you're saying the community, the peer group, is a much better predictor than the individual. >> That's right.
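A toy sketch of predicting from the company you keep: score a person by the observed behavior of their frequent contacts rather than by their own attributes. All names and data are invented for illustration:

```python
from collections import Counter

# Who spends time with whom (observed interaction data, invented).
contacts = {
    "alice": ["bob", "carol", "dan"],
    "bob":   ["alice", "carol"],
    "erin":  ["frank", "grace"],
}

# Observed behavior of everyone except the person we want to predict.
preference = {"bob": "sushi", "carol": "sushi", "dan": "tacos",
              "frank": "bbq", "grace": "bbq"}

def predict_from_peers(person):
    """Majority vote over the person's contacts: the social-context prediction."""
    votes = Counter(preference[c] for c in contacts[person] if c in preference)
    return votes.most_common(1)[0][0] if votes else None

print(predict_from_peers("alice"))  # 'sushi': learned from her group, not from her
print(predict_from_peers("erin"))   # 'bbq'
```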
>> Okay, so I want to come back to the org chart. Are you saying that org charts shouldn't necessarily change, but the incentives should? >> Right. The appropriate thing to do is: you have an org chart, but the incentives across the entire organization are good communication within the box you're in and good communication outside of the box. And to put those incentives in place you need to have data. You need some way of estimating: does everybody talk to each other? Do they talk to the rest of the organization? And there are a variety of ways you can do that. We do it with little badges; we do it by analyzing phone call data. Email is not so good, because email is not really a social relationship; it's just this little formal thing you do, often. But by using things like the badges, like the phone calls, or surveys for that matter, you can give people feedback: are they communicating in the right way, are they communicating with other parts of the organization? And by visualizing that for people, they'll begin to do the right thing. >> You had this notion of network tuning: you don't want an insufficiently diverse network, but you don't want a network that's too dense; you want to find the sweet spot in the middle. How do you actually implement that tuning? >> Well, the first thing is you have to measure. You have to know how dense the social interaction, the communication pattern, is, because if you don't know that, there's nothing to tune. And then you want to ask about the telltale property of something being too dense, which is that the same ideas go around and around and around. So you look at the graph that you get from this data and you ask: Joe talked to Bob, Bob talked to Mary, Mary talked to Joe; is it full of cycles like that? If it's too full of cycles, that's a problem, because it's the same people talking to each other, the same ideas going around. There are some nice mathematical formulas for measuring it. They're sort of hard to put into English, but it has to do with, if you look at the flow of ideas, are you getting a sufficiently diverse set of ideas coming to you, or is it just the same people all talking to each other? Are you sort of cut off from the rest of the world?
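A minimal sketch of that kind of measurement, using the networkx library: graph density and reciprocity serve as rough proxies for ideas circulating among the same people, plus the fraction of ties that cross group boundaries. The sample data and thresholds are invented; the actual idea-flow measures are more involved:

```python
import networkx as nx

# Directed "who talks to whom" graph from badge or call data (invented sample).
edges = [("joe", "bob"), ("bob", "mary"), ("mary", "joe"),
         ("joe", "mary"), ("bob", "joe"), ("ann", "joe")]
group = {"joe": "eng", "bob": "eng", "mary": "eng", "ann": "sales"}

G = nx.DiGraph(edges)

density = nx.density(G)          # how saturated the communication is
reciprocity = nx.reciprocity(G)  # high values: ideas bouncing back and forth
cross = sum(group[u] != group[v] for u, v in G.edges()) / G.number_of_edges()

print(f"density={density:.2f} reciprocity={reciprocity:.2f} cross-group={cross:.2f}")
if reciprocity > 0.6 and cross < 0.2:  # invented thresholds
    print("network looks too inbred: same ideas circulating, few outside ties")
```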
>> In your book Social Physics you talk about rewards and incentives, and one of the things that struck me is you say that people are actually more motivated by rewards for others than for themselves. Correct me if I'm paraphrasing you wrong, but rewarding the group, or doing something good for somebody else, is actually a powerful incentive. Is that the case? >> Well, you said it almost right. If you want to change behavior, these social incentives are more powerful than financial incentives. Say you have everybody in a group, and people are rewarded by the behavior of the other people in the group. What will they do? Well, they'll talk to the other people about doing the right thing, because my reward depends on your behavior, so I'm going to talk to you about it, and your reward depends on mine, so you'll talk to me. So what we're doing is creating much more communication around the problem, and social pressure, because if you don't do it, you're hurting me. It may not be a big thing, but you're going to think twice about it, whereas some small financial reward usually isn't such a big thing for people. >> People talk a lot about persona marketing. When I first met John Furrier, he had this idea of affinity rank, which was his version of, you know, peer-group PageRank. Do you get a lot of questions about persona marketing, and what does your research show in terms of how we should be appealing to that persona? >> I get questions about that from time to time, and I don't know what he originally intended, but the way people often apply it is very static: you have a particular persona that's fixed for all parts of your life. Well, that's not true. You could be a baseball coach for your kid, and a banker during the day, and a member of a church, and those are three different personas. And what defines those personas? It's the group that you're interacting with; it's the people you learn with and try to fit in with. So your persona is a variable thing, and the key to it is: what are the groups you're interacting with? If I analyzed your groups of interactions, I'd see three different clusters. I'd see the baseball one, I'd see the banking one, I'd see the church group one, and then I would know that you have three personas, and I could tell which one you're in, typically, by seeing who you're spending time with right now. >> Is there a risk, in applying this idea of behavior influenced by groups, of falling over into profiling, essentially anticipating behaviors based upon characteristics that may not be indicative of how any individual might act? Take the alcoholics example, right: I don't get a job because people similar to me tend to be alcoholics, let's say. >> This is different, though. This is not people who are similar to you. If you hang out with alcoholics all the time, then the odds really are good that you're an alcoholic. It may not be true, and there is a risk of over-identifying or extrapolating, but it's different than "people like me." I mean, if you go to the dingy bars where beers are a buck and everybody gets wasted, and you do that repeatedly... >> You're talking about behaviors rather than characteristics. >> Behaviors rather than characteristics, right. You know, if you drink a lot, maybe you drink a lot. >> So we have a question from the crowd. It says: real time makes persona very difficult. This comes back to Furrier's premise with Twitter data, which is changing very rapidly. Are there social platforms that you see that can inform in real time, to help us get a better understanding of persona and group affinity? >> Well, there are data sources that do that. If I look at telephone data, or credit card data even, for that matter: it's geo-located, so I can ask what sort of people buy here, what sort of people are in this bar or restaurant, and I can look at their demographics and where they go. I showed an example of that in San Francisco, using data from San Francisco. So there is this data, which means that any app that's interested in it, that has sufficient breadth and sufficient adoption, can do these sorts of analyses. >> Can you give an example of how you're working with organizations now? I'm sure you can't name them, but can you give an example of how you're applying these principles practically, whether it's in law enforcement or in consumer marketing? How are you putting these to work?
Can you give an example of how you're working with organizations now? I'm sure you can't name them, but how are you applying these principles practically — whether in law enforcement or in consumer marketing? How are you putting these to work?

Well, there are a bunch of different things that go together with this view that it's the flow of ideas that's the important thing, not the demographics. Take behavior change: we're working with a small country to improve traffic safety by enrolling people in small groups — you sign up with your buddies — where the benefit I get for driving right depends on your safety. What that means is I'm going to talk to you about your driving if you're driving in a dangerous way, and that, we've seen in small experiments, is a lot more effective than giving you points on your driver's license or a discount on your insurance. It's the social relationships. So that's one example.

Another example: we're beginning a project to look at unemployment, and what we see is that people who have a hard time getting re-employed don't have diverse enough social networks, and — it sounds like common sense — they don't physically get out enough compared to the people who do get jobs. So what's the obvious thing? You encourage them to get out more; you make it easier for them to get out more.

When you talk about health care, what you can do is say: look, I don't know the particular things you're doing, but based on the behavior you show and the behavior of the people you hang out with, you may be at much higher risk of diabetes. And it's not any one particular behavior — which is the way medical advice is always pitched, "it's this behavior" — it's the combination of things. So you're not really aware that you're doing anything bad, but if all your buddies are at risk of it, then you probably are too, because you're probably engaging in a lot of the same sorts of behaviors. And medicine is a place where people are willing to give up some privacy, because the consequences are so important. So we're looking at people who are interested in personalized medicine and are willing to share their data about where they go and what they spend time doing, in order to get statistics back about the risk factors they pick up from the people around them and the behaviors they engage in.

Your message to the CDOs today — you were sort of joking, "you're measuring that, right?" and a lot of times they weren't — covered a lot of the non-intuitive things your research has found. So I want to talk about the data, access to the data, and how the CDO can effect change in their organization. A lot of the data lives in silos — think of social data: Facebook, LinkedIn, Twitter; and you mentioned credit card data. Is that a problem, or is data becoming more accessible through APIs? Or is it still just a battle to get that data architecture running? Well, it's a battle — in fact, a political and very passionate battle — and it revolves around who controls the data, and privacy is a big part of that. One of the messages is that to get really rich data sources you have to engage with the customer a lot. In our research we've set up entire cities where we've changed the rules, and we've found that people are more than willing to volunteer very detailed personal data under two conditions. One is they have to know that it's safe: you're not reselling it, you're handling it in a secure way, it's not going to get out somehow. And the other is that they get value for it and they can see the value — it's not spreading out somewhere; they're part of the discussion.
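The peer-based risk idea above — "if all your buddies are at risk, you probably are too" — reduces to a very simple computation on a contact graph. A toy sketch, with fabricated names and risk values:

```python
# Sketch: estimate a person's behavioral risk from their peers' observed
# risk, rather than from their own demographics. All data is made up.
from statistics import mean

# Hypothetical contact graph: who spends time with whom.
contacts = {
    "alice": ["bob", "carol", "dan"],
    "bob":   ["alice", "dan"],
}

# Hypothetical observed behavioral risk for some individuals, on a 0-1
# scale (e.g. derived from activity data).
observed_risk = {"bob": 0.7, "carol": 0.8, "dan": 0.6}

def peer_risk(person: str) -> float:
    """Estimate a person's risk as the mean observed risk of their peers."""
    peers = [observed_risk[p] for p in contacts.get(person, []) if p in observed_risk]
    return mean(peers) if peers else 0.0

# Alice has no observed risk of her own, but her peer group suggests one.
print(f"alice peer-based risk: {peer_risk('alice'):.2f}")  # 0.70
```

A real system would weight ties by time spent together and validate against outcomes; the point here is only that the signal comes from the group, not the individual.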
So, you know, if you want more personalized medicine, people are willing to share, because it's important to them or to their family. People would never ordinarily share very personal stuff about their kids, but if it results in the kid getting a better education and more opportunity, they're absolutely willing.

That leads to a great segue into Enigma. You've talked about Enigma as a potential security layer for the internet, but also a potential privacy solution. So talk about Enigma — what it is, where it's at, and how it could permeate. Yeah — we've been building architectures and working with this sort of problem, this conundrum: basically, data is in silos, people feel paranoid — probably correctly — about their data leaking, and companies either don't have access to data or don't know what to do with it. A lot of it has to do with safe sharing. Another aspect of this problem is cybersecurity: you're getting an increasing number of attacks — bad for companies, bad for people — and it's just going to get worse. And we actually know what the answers to these things are. The answers are: data is encrypted all the time, everywhere; you do the computation on encrypted data; you never transmit it or decrypt it in order to do things. We also know, in terms of control of the data, that it's possible to build fairly simple permission mechanisms, so the computer just won't share data in the wrong places — and if it does, sky rockets go up and the cops come. You can build systems like that today. But the part that's never allowed that to happen is that you need to keep track of a lot of things in a way that's not hackable — you need to know that somebody doesn't just short-circuit it or take it out the back.

What's interesting is that the mechanisms in Bitcoin give you exactly that power. Whatever you feel about Bitcoin — speculative bubble or whatever — the blockchain, which is part of it, is an open ledger that is unhackable, and it has a characteristic that's amazing: it's called trustless. What that means is you can work with a bunch of crooks and still know that the ledger you're keeping is correct, because it doesn't require trusting the people you work with. It's something where everybody has to agree to be able to get things done, and it works — it works in Bitcoin, at scale, over the whole world. So what we've done is adapt that technology to build a system called Enigma, which takes data in an encrypted form, computes on it in an encrypted form, and transmits it according to the person's permissions — and only that way — in an encrypted form. It provides a layer of security and privacy that we've never had before. There have been some projects that come close to this, but we're pretty excited about this one. What I think you're going to see is some of the big financial institutions trying to use it among themselves, some of the big logistics and big medical players trying to use it in hotspots where they have real problems, but the hope is that it spreads among the general population, so it becomes, quite literally, the privacy and security layer the internet doesn't have. Warren Buffett might be right that it might fail as a currency, but the technology has really inspired some new innovations. That's right.
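The "compute on data in encrypted form" idea can be illustrated with additive secret sharing, one of the classic multiparty-computation techniques in this space. What follows is a toy illustration of the general principle only — not Enigma's actual protocol or code:

```python
# Sketch: compute a sum over private values that no single node ever
# sees, using additive secret sharing. Values and parameters are made up.
import secrets

P = 2**61 - 1  # a public prime modulus (choice is illustrative)

def share(value: int, n: int = 3) -> list[int]:
    """Split value into n random shares that sum to value mod P."""
    shares = [secrets.randbelow(P) for _ in range(n - 1)]
    shares.append((value - sum(shares)) % P)
    return shares

def reconstruct(shares: list[int]) -> int:
    return sum(shares) % P

# Two parties' private salaries, each split across three compute nodes.
alice_shares = share(120_000)
bob_shares = share(95_000)

# Each node adds the two shares it holds; every individual share is
# indistinguishable from random, so no node learns a salary.
sum_shares = [(a + b) % P for a, b in zip(alice_shares, bob_shares)]

# Only combining all the result shares reveals the aggregate answer.
print("total payroll:", reconstruct(sum_shares))  # 215000
```

In a full system along the lines Pentland describes, a tamper-evident ledger would record which computations ran and under whose permissions; the arithmetic above is only the "never decrypt it to compute" half of the story.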
So it's essentially distributed — it's not a walled garden, it's a distributed black box, that's what you're describing — you never expose the data. That's right. You don't need a trusted third party that can get attacked. That's right: nobody has to stamp that this is correct, because the moment you do that, first of all, other people are controlling you, and second, it's a point of attack. So it gets rid of that trusted third party — the centralization — and makes it distributed. You can have, again, a bunch of bad actors in the system and it doesn't hurt; it's peer-to-peer, where you'd have to have 51% of the participants being bad before things really go bad.

How do you solve the problem of performing calculations on encrypted data? There are classic techniques — it's actually been known for over 20 years how to do that — but there were two pieces missing. One piece is it wasn't efficient: it scaled really poorly. What we did is come up with a way of solving that by making it essentially multi-scale, so it's a distributed solution that brings the cost down to something linear in the number of elements, which is a real change. The second is keeping track of all of the stuff in a way that's secure. It's fine to have an addition operation that's secure, but if that isn't embedded in a whole system that's secure, it doesn't do you any good. That's where the blockchain comes in: it gives you the accounting mechanism for knowing which computations are being done, who has access to them, what the keys are, things like that.

Google Glass was sort of incubated at the MIT Media Lab, well before — yeah, in my group — right, in your group — and it didn't take off, maybe because it's just not cool; it looks kind of goofy. But Enigma has a lot of potential for solving a huge problem. Are you going to open-source it? Yeah, it's an open-source system. We hope to get more people involved in it, and right now we're looking for some test beds to show how well it works and to make sure all the i's are dotted and the t's are crossed, and so forth. And where can people learn more about it? Go to enigma.media.mit.edu. All right, Sandy, we're way over our time — so obviously you were interesting. Thanks. Keep it right there, everybody; Paul and I will be right back with our next guest. We're live — this is theCUBE. Right back. [Music]