

Breaking Analysis: A Digital Skills Gap Signals Rebound in IT Services Spend


 

>> From theCUBE studios in Palo Alto and Boston, bringing you data-driven insights from theCUBE and ETR. This is Breaking Analysis with Dave Vellante.

>> Recent survey data from ETR shows that enterprise tech spending is tracking with projected US GDP growth at six to seven percent this year. Many markers continue to point the way to a strong recovery, including hiring trends and the loosening of frozen IT project budgets. However, skills shortages are blocking progress at some companies, which bodes well for an increased reliance on external IT services. Moreover, while there's much talk about the rotation out of work-from-home plays and stocks such as video conferencing, VDI and other remote-worker tech, we see organizations still trying to figure out the ideal balance between funding headquarters investments that have been neglected and getting hybrid work right. In particular, the talent gap combined with a digital mandate means companies face some tough decisions as to how to fund the future while serving existing customers and transforming culturally. Hello everyone, and welcome to this week's Wikibon CUBE Insights powered by ETR. In this Breaking Analysis we welcome back Erik Porter Bradley of ETR, who will share fresh data, perspectives and insights from the latest survey data. Erik, great to see you. Welcome.

>> Thank you very much, Dave. Always good to see you, and happy to be on the show again.

>> Okay, we're going to share some macro data and then we're going to dig into some highlights from ETR's most recent March COVID survey and also the latest April data. So Erik, the first chart that we want to show, it shows CIO and IT buyer responses to expected IT spend for each quarter of 2021 versus 2020, and you can see here a steady quarterly improvement. Erik, what are the key takeaways from your perspective?

>> Sure. Well, first of all, for everyone out there, this particular survey had a record-setting number of participants. We had 1,500 IT decision makers participate, and we had over half of the Fortune 500 and over a fifth of the Global 1000.
So it was a really good survey. This is the seventh iteration of the COVID Impact Survey specifically, and this is going to transition to a larger, overall macro survey going forward so we can continue it. And you're 100% right: what we've been tracking here since March of last year is how spending is being impacted because of COVID and where it is shifting, and what we're seeing now, finally, is that there is a real re-acceleration in spend. I know we've been a little bit more cautious than some of the other peers out there that just early on slapped an eight or a nine percent number on it, but what we're seeing right now is a midpoint of over six, about 6.7%, and that is accelerating. So we are still hopeful that that will continue, and really that spending is going to be in the second half of the year. As you can see on the left part of this chart, it was about 1.7% versus 3% for Q1 spending year over year, so that is starting to accelerate through the back half.

>> I think it's prudent to be cautious, because normally you'd say, okay, tech is going to grow a couple of points higher than GDP, but it's really so hard to predict this year. Okay, the next chart that we want to show you: we asked respondents to indicate what strategies they're employing in the short term as a result of coronavirus, and you can see a few things that I'll call out, and then I'll ask Erik to chime in. First, there's been no meaningful change, of course, no surprise, in tactics like remote work and halting travel. However, we're seeing very positive trends in other areas: hiring freezes and freezes on IT deployments trending downward, a downward trend in layoffs, and we also see an increase in the acceleration of new IT deployments and in hiring. Erik, what are your key takeaways?

>> Well, first of all, I think it's important to point out here that we're also capturing that people believe remote work productivity is still increasing. Now, the trajectory might be coming down a little bit, but that is really key, I think, to the backdrop of what's happening here. So people have a perception that productivity of remote work is better than hybrid work, and that's from the IT decision makers themselves. But what we're seeing here, most importantly, is that these organizations are citing plans to increase hiring, and that's something that I think is really important to point out. It's showing a real thawing. And to your point right in the beginning of the intro, we are seeing deployments stabilize versus prior survey levels, which means early on they had no plans to launch new tech deployments, then they said, nope, we're going to start, and now that's stalling. And I think it's exactly right what you said: there's an IT skills shortage. So people want to continue to do IT deployments because they have to support work from home and a hybrid return to the office, but they just don't have the skills to do so, and I think that's probably the most important takeaway from this chart, that stalling, and to really ask why it's stalling.

>> Yeah, so we're going to get into that for sure, and I think that's a really key point: accelerating IT deployments looks like it's hit a wall in the survey. But before we get deep into the skills, let's take a look at this next chart. We're asking people here how a return to the new normal, if you will, and back to offices is going to change spending on on-prem architectures and applications.
And so the first two bars are cloud-friendly: if you add them up, 63% of the respondents say that either they'll stay in the cloud for the most part, or they're going to lower their on-prem spend when they go back to the office. The next three bars are on-prem friendly: if you add those up, 29% of the respondents say their on-prem spend is going to bounce back to pre-COVID levels or actually increase, and of course 12% of that number, by the way, say they've never altered their on-prem spend. So Erik, no surprise, but this bodes well for cloud. But isn't it also a positive for on-prem? We've had this dual-funding premise, meaning cloud continues to grow, but neglected data center spend also gets a boost. What are your thoughts?

>> You know, really, it's interesting, people are spending on all fronts. You and I were talking in the prep, it's like we're in battle: I've got naval, I've got air, I've got land. I've got to spend on cloud and digital transformation, but I also have to spend for on-prem. The hybrid work is here and it needs to be supported, so this spending is going to increase. When you look at this chart, you're going to see, though, that roughly 36% of all respondents say that their spending is going to remain mostly on cloud. So that is still the clear direction. Digital transformation is still happening; COVID accelerated it greatly. You and I, as journalists and researchers, already know this is where the puck is going, but spend has always lagged a little bit behind, because it just takes some time to get there. Inversely, 27% said that their on-prem spending will decrease. So when you look at those two, I still think that the trend is the friend for cloud spending, even though, yes, they do have to continue spending on hybrid, some of it's been neglected, and there are refresh cycles coming up. So overall it just points to more and more spending right now. It really does seem to be a very strong backdrop for IT growth.

>> So I want to talk a little bit about the ETR taxonomy before we bring up the next chart. We get a lot of questions about this, and of course, when you do a massive survey like you're doing, you have to have consistency for time series, so you have to really think through what the buckets look like, if you will. So this next chart takes a look at the ETR taxonomy and breaks it down into simple-to-understand terms. The green is the portion of spending on a vendor's tech within a category that is accelerating, and the red is the portion that is decelerating. So Erik, what are the key messages in this data?

>> Well, first of all, Dave, thank you so much for pointing that out. We used to do just what we call a Net Score. It's a proprietary formula that we use to determine the overall velocity of spending. Some people found it confusing, so our data scientists decided to break this sector breakdown into what you said, which is really more of a mode analysis: in that sector, how many of the vendors are increasing versus decreasing. So again, I just appreciate you bringing that up and allowing us to explain the reasoning behind our analysis there.
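As a rough illustration of the Net Score and sector-breakdown ideas Erik describes (ETR's actual formula is proprietary, so the response categories, weighting, and sample data below are assumptions, not ETR's methodology), a minimal sketch might look like this:

```python
from collections import Counter

# Hypothetical survey responses: one spending intention per respondent, per vendor.
# The category names are assumed for illustration only.
responses = {
    "VendorA": ["adopting", "increasing", "flat", "increasing", "decreasing"],
    "VendorB": ["flat", "decreasing", "replacing", "flat", "increasing"],
}

def net_score(intentions):
    """Share of respondents accelerating spend minus the share decelerating."""
    c = Counter(intentions)
    n = len(intentions)
    accelerating = (c["adopting"] + c["increasing"]) / n
    decelerating = (c["decreasing"] + c["replacing"]) / n
    return round(100 * (accelerating - decelerating))

scores = {vendor: net_score(r) for vendor, r in responses.items()}
print(scores)  # {'VendorA': 40, 'VendorB': -20}

# The sector "mode analysis": how many vendors in the category are
# accelerating versus decelerating.
accelerating = sum(1 for s in scores.values() if s > 0)
decelerating = sum(1 for s in scores.values() if s < 0)
print(f"{accelerating} vendor(s) accelerating, {decelerating} decelerating")
```

The exact categories and weights ETR uses differ; the point is only that a single spending-velocity number can be decomposed into how much of a sector is accelerating versus decelerating, which is what the green and red portions of the chart convey.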
But what we're seeing here goes back to something you and I did last year when we did our predictions, and that was that IT services and consulting was going to have a true rebound in 2021, and that's what this is showing right here. So in this chart you're going to see that consulting and services are really continuing their recovery. 2020 had a lot of declines, and they have the biggest year-over-year acceleration sector-wise. The other thing to point out in this, which we'll get to again later, is that the inverse analysis is true for video conferencing. We will get to that, so I'm going to leave a little bit of ammunition behind for that one, but what we're seeing here is IT consulting and services being the real favorable, and video conferencing having a little bit more trouble.

>> Great, okay. And then let's take a look at that services piece. This next chart really is a drill-down into that space and emphasizes, Erik, what you were just talking about. We saw this in IBM's earnings, where still more than 60% of IBM's business comes from services, and the company beat earnings in part due to services outperforming expectations. I think it had a somewhat easier compare, and some of this pent-up demand that we've been talking about bodes well for IBM and other services companies. It's not just IBM, right, Erik?

>> No, it's not, but again, I'm going to point out that you and I did call out IBM in our predictions, the ones we did in late December, so it is nice to see. One of the reasons we don't have a more favorable rating on IBM at the moment is because they are in the process of spinning out this large unit, and so there's a little bit of corporate action there that keeps us off on the sideline. But I would also want to point out here Tata, Infosys and Cognizant, because they're seeing year-over-year acceleration in both IT consulting and outsourced IT services. We break those down separately, and those are the three names that are seeing acceleration in both of those. So again, Tata, Infosys and Cognizant are all looking pretty well positioned as well.

>> So we've been talking a little bit about this skills shortage, and this is what's, I think, so hard for forecasters. On the one hand there's a lot of pent-up demand; like Scott Gottlieb said, it's like Woodstock coming out of the COVID. But on the other hand, if you have a talent gap, you've got to rely on external services, so there's a learning curve, there's a ramp-up, it's an external company, and it takes time to put those together. So this data that we're going to show you next is really important in my view and ties to what we were saying at the top. It asks respondents to comment on their staffing plans. The light blue is "we're increasing staff," the gray is "no change," and the magenta, or whatever that sort of purplish color is, anyway, that color is decreasing. And the picture is very positive across the board: full-time staff, offshoring, contract employees, outsourced professional services, all trending upwards. And this, Erik, is more evidence of the services bounce-back.

>> Yeah, it certainly is, David. And what happened is, when we caught this trend, we decided to go one level deeper and say, all right, we're seeing this, but we need to know why, and that's what we always try to do here. Data will tell you what's happening; it doesn't always tell you why. And that's one of the things that ETR really tries to dig into through the insights, interviews, panels, and also going direct with these more custom survey questions. So in this instance, I think the real takeaway is that 30% of the respondents said that their outsourced and managed services are going to increase over the next three months. That's really powerful; that's a large portion of organizations in a very short time period.
So we're capturing that this acceleration is happening right now, it will be happening in real time, and I don't see it slowing down. You and I are speaking about how we have to increase cloud spend, we have to increase hybrid spend, there are refresh cycles coming up, and there's just a real skills shortage. So this is a long-term setup that bodes very well for IT services and consulting.

>> You know, Erik, when I came out of college, somebody told me, "Read, read, read, read as much as you can," and they said, "Read the Wall Street Journal every day." And so I did, and I would read the tech magazines, and back then it was all paper, and what happens is you begin to connect the dots. The reason I bring that up is because I've now taken a bath in the ETR data for the better part of two years and I'm beginning to be able to connect the dots. The data is not always predictive, but many, many times it is. And so this next data gets into the fun stuff, where we name names. A lot of times people don't like it, because the marketing people in organizations say, "Well, the data's wrong." Of course, that's the first thing they do, is attack the data. But you and I know we've made some really great calls: work from home, for sure; you're talking about the services bounce-back; we certainly saw the rise of CrowdStrike, Okta, Zscaler well before people were talking about that; same thing with video conferencing. So anyway, this is the fun stuff, and it looks at positive versus negative sentiment on companies. So first, how does ETR derive this data, how should we interpret it, and what are some of your takeaways?

>> Sure. First of all, how we derive the data: these are systematic survey responses that we do on a quarterly basis, and we standardize those responses to allow for time series analysis, so we can do trend analysis as well. We do find that our data, because it's talking about forward-looking spending intentions, is really more predictive, because we're talking about things that might be happening six months, three months in the future, not things that already happened, which is what a lot of other competitors and research peers are looking at. They're looking into the past; ETR really likes to look into the future, and our surveys are set up to do so. So thank you for that question, it's an enjoyable lead-in. But to get to the fun stuff, like you said, what we do here is we put ratings on the data sets. I do want to put the caveat out there that our spending intentions really only capture top-line revenue. It is not indicative of profit margin or any other line items, so this is only to be viewed as us rating the data set itself, not the company; that's not what we're in the game of doing. I think that's very important for the marketing teams and the vendors out there themselves when they take a look at this: we're just talking about what we can control, which is our data. We're going to talk about a few of the names here on this highlighted vendors list. One we're going to go back to, which you and I spoke about I guess about six months ago or maybe even earlier, is the observability space. You and I were noticing that it was getting very crowded: a lot of new entrants, and a lot of acquisition from more of the legacy or standard players in the space, and that is continuing. So I think in a minute we're going to move into that observability space, but what we're seeing there is that it's becoming incredibly crowded, and we're possibly seeing signs
of them cannibalizing each other. We're also going to move on a little bit into video conferencing, where we're capturing some spend deceleration, and then ultimately we're going to get into a little bit of a storage refresh cycle and talk about that. But yeah, these are the highlighted vendors for April. We usually do this once a quarter, and they do change based on the data, but they're not usually whipsawed around; the data doesn't move that quickly.

>> Yeah, so you can see some of the big names on the left-hand side, some of the SaaS companies that have momentum. Obviously ServiceNow has been doing very, very well; we've talked a lot about Snowflake, Okta, CrowdStrike, Zscaler, all very positive, as well as several others. I guess I'd add some things. Thinking about the next decade, it's cloud, which is not going to be like the same cloud as last decade: a lot of machine learning and deep learning and AI, and the cloud is extending to the edge and the data center. Data, obviously, is very important; data is decentralized and distributed, so data architectures are changing. There are a lot of opportunities to connect across clouds and actually create abstraction layers. And then something that we've been covering a lot is that processor performance is actually accelerating relative to Moore's law; instead of doubling every two years, it's probably quadrupling every two years, and that is a huge factor, especially as it relates to powering AI and AI inferencing at the edge. This is a whole new territory; custom silicon is really becoming in vogue, and so that's something that we're watching very, very closely.

>> Yeah, I completely agree on that, and I do think that the next version of cloud will be very different. Another thing to point out on that, too, is you can't do anything that you're talking about without collecting the data, and organizations are extremely serious about that now. It seems it doesn't matter what industry they're in, every company is a data company, and that also bodes well for the storage call. We do believe that there is going to be a huge increase in the need for storage, and yes, hopefully that'll become portable across multi-cloud and hybrid as well.

>> Now, as Erik said, the ETR data is really focused on that top-line spend. So if you look at the right side of that chart, you saw NetApp was very negative, right? But there's a company that's in transformation: they've lowered expectations and they've recently beat expectations, and that's why the stock has been doing better, but at the macro, from a spending standpoint, it's still challenged. So you have big-footprint companies like NetApp, and Oracle is another one. Oracle's stock is at an all-time high, but the spending relative to previous cycles, or relative to, for instance, Snowflake, is much, much smaller and not as high growth; but they're managing expectations, they're managing their transition, they're managing profitability. Zoom is another one: Zoom is looking negative, but Zoom's got to use its market cap now to transform and increase its TAM. And then Splunk is another one we're going to talk about. Splunk is in transition. It acquired SignalFx, and it just brought on this week Teresa Carlson, who was the head of AWS Public Sector; she's the president and head of sales. So they've got a go-to-market challenge, and they brought in Teresa Carlson to really solve that. But Splunk has been trending downward. We called that
several quarters ago, Erik. And so I want to bring up the data on Splunk. This is Splunk, Erik, in analytics, and it's not trending in the right direction. The green in the bars is accelerating spend, the red is decelerating spend, the top blue line is spending velocity, or Net Score, and the yellow line is market share, or pervasiveness in the data set. Your thoughts?

>> Yeah, first I want to go back to a great point, Dave, about our data versus a disconnect from an equity-analysis perspective. I used to be an equity analyst; that is not what we do here. And the main word you said is expectations, right? Stocks will trade on how they do compared to the expectations that are set, whether that's buy-side expectations, sell-side expectations, or management's guidance itself. We have no business in tracking any of that. What we are talking about is top-line acceleration or deceleration. So that was a great point to make, and I do think it's an important one for all of our listeners out there. Now, to move to Splunk: yes, I've been capturing a lot of negative commentary on Splunk even before the data turned, so this has been about a year-long analysis and review on this name. And I'm dating myself here, but I know you and I are both rock and roll fans, so I'm going to point out a Led Zeppelin song and movie and say that the song remains the same for Splunk. We are just seeing recent spending intentions take yet another step down, both from prior survey levels and from year-ago levels. This we're looking at in the analytics sector, and spending intentions are decelerating across every single customer group. If we went to one of our other slide analyses on the ETR+ platform and you drill down by customer sub-sample in analytics, it's dropping in every single vertical; it doesn't matter which one. It's really not looking good, unfortunately. And you had mentioned this is in analytics, and I do believe the next slide is in information security.

>> Yeah, let's bring that up.

>> And unfortunately, it's not doing much better. So this is specifically Fortune 500 accounts in information security. There are deep pockets in the Fortune 500, but from what we're hearing in all the insights, interviews and panels that I personally moderate for ETR, people are upset. They didn't like the strong tactics that Splunk has used on them in the past, they didn't like the ingestion-model pricing and the inflexibility, and when alternatives came along, people were willing to look at the alternatives. And that's what we're seeing in both analytics and big data, and also for their SIEM in security.

>> Yeah, so again, I point to Teresa Carlson. She's got a big job, but she's very capable. She's going to meet with a lot of customers, she's a go-to-market pro, she's going to have to listen hard, and I think you're going to see some changes there. Okay, so there's more, sorry, there's more bad news on Splunk. So bring this up. This is Net Score for Splunk in Elastic accounts, and this is for analytics: there are 106 Elastic accounts in the data set that also have Splunk, and it's trending downward for Splunk; that's why it's green for Elastic. And Erik, the important callout from ETR here is how Splunk's performance in Elastic accounts compares with its performance overall. The ELK stack, which obviously Elastic is a big part of, is causing pain for Splunk, as is Datadog. And you mentioned the pricing issue. Is it just pricing, in your assessment, or is it more fundamental?

>> You know, it's multi-level, based on the commentary we get from our ITDMs that take the survey. But yes, you did a great job with this analysis. What we're looking at is the spending within shared accounts: if I have Elastic already, how is my spending on Splunk? And what you're seeing here is that it's down to about a 12 Net Score, whereas Splunk overall has a 32 Net Score among all of its customers. So what you're seeing there is that there is definitely a drain happening, where Elastic is draining spend and usage from Splunk.
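Continuing the earlier illustrative sketch (again, not ETR's actual methodology, and the account data here is invented), the shared-account comparison Erik describes amounts to computing the same Net Score twice: once across all of a vendor's respondents, and once across only the respondents who also use the second vendor:

```python
# Hypothetical respondent records: which vendors each account uses, and the
# account's spending intention for Splunk. All data is illustrative only.
accounts = [
    {"id": 1, "vendors": {"Splunk", "Elastic"}, "splunk_intent": "decreasing"},
    {"id": 2, "vendors": {"Splunk"},            "splunk_intent": "increasing"},
    {"id": 3, "vendors": {"Splunk", "Elastic"}, "splunk_intent": "flat"},
    {"id": 4, "vendors": {"Splunk"},            "splunk_intent": "increasing"},
    {"id": 5, "vendors": {"Splunk", "Elastic"}, "splunk_intent": "replacing"},
]

def net_score(intents):
    n = len(intents)
    up = sum(i in ("adopting", "increasing") for i in intents)
    down = sum(i in ("decreasing", "replacing") for i in intents)
    return round(100 * (up - down) / n)

overall = net_score([a["splunk_intent"] for a in accounts])
shared = net_score([a["splunk_intent"] for a in accounts if "Elastic" in a["vendors"]])
print(f"Splunk Net Score overall: {overall}, within Elastic accounts: {shared}")
```

A lower score inside the shared accounts, like the 12 versus 32 Erik cites, is what a "drain" of spend toward the overlapping vendor looks like in this kind of cut.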
The reason we used Elastic here is because all of observability, the whole sector, seems to be decelerating. Splunk is decelerating the most, but Elastic is the only one that's actually showing resiliency, so that's why we decided to choose these two. But you pointed out, yes, it's also Datadog. Datadog is cloud-native, they're more DevOps-oriented, and they tend to be viewed as having a technological lead compared to Splunk, so that's a really good point. Dynatrace also is expanding their abilities, and Splunk has been making a lot of acquisitions to push their cloud services. They are also changing their pricing model, right? They're trying to make things a little bit more flexible, moving off ingestion and moving towards consumption. So they are trying, and with the new hires, I'm not going to bet against them, because the one thing that Splunk has going for it is its market share. In our survey they're still very well entrenched, so they do have a lot of accounts, they have their foothold. If they can find a way to make these changes, then they will be able to change themselves. But the one thing I've got to say across the whole sector is that competition is increasing, and it does appear, based on commentary and data, that they're starting to cannibalize themselves. It really seems pretty hard to get away from that. And there are startups in the observability space, too, that are going to be even more disruptive, I think.

>> I want to key on the pricing for a moment, and I've been pretty vocal about this. I think the old SaaS pricing model, where you essentially lock in for a year or two years or three years, pay up front, or maybe pay quarterly if you're lucky, that's a one-way street, and I think it's a flawed model. I like what Snowflake's doing, I like what Datadog's doing; look at what Stripe is doing, look at what Twilio is doing. You mentioned it: it's consumption-based pricing, and if you've got a great product, put it out there and damn the torpedoes. I think that is a game changer. I look at, for instance, HPE with GreenLake, I look at Dell with Apex; they're trying to mimic that model and apply it to infrastructure. It's much harder with infrastructure because you've got to deploy physical infrastructure, but that is a model that I think is going to change, and I think all of the traditional SaaS pricing is going to come under disruption over the better part of the decade.
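As a rough illustration of the contrast Dave is drawing between committed, pay-up-front SaaS contracts and consumption-based pricing (the prices, discount, and usage profile below are invented for the example, not any vendor's actual terms):

```python
# Committed subscription: pay up front for fixed capacity, whether it is used or not.
# Consumption: pay each month only for what was actually used.
monthly_usage = [40, 55, 60, 80, 120, 90, 70, 65, 60, 55, 50, 45]  # arbitrary units

unit_price = 10.0          # assumed list price per unit under consumption billing
committed_units = 100      # capacity sized for near-peak demand
committed_discount = 0.30  # assumed discount for signing the annual commitment

committed_cost = committed_units * unit_price * (1 - committed_discount) * 12
consumption_cost = sum(monthly_usage) * unit_price

print(f"Annual committed contract: ${committed_cost:,.0f}")   # $8,400
print(f"Pay-as-you-go consumption: ${consumption_cost:,.0f}")  # $7,900
```

With usage running below the committed capacity most months, the consumption model tracks actual demand; the up-front commitment only wins if usage stays consistently high, which is the one-way street being described.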
But anyway, let's move on. We've been covering the APM space, application performance management, pretty extensively, and this chart lines up some of the big players here, comparing Net Score, or spending momentum: the gray is the April '20 survey, the blue is January '21, and the yellow is April '21. And not only are Elastic and Datadog doing well relative to Splunk, Erik, but everything is down from last year. So this space, as you point out, is undergoing a transformation.

>> Yeah, the pressures are real, and it's sort of that perfect storm where it's not only the data that's telling us that, but also the direct feedback we get from the community. Pretty much all the interviews I do, and I've done a few panels specifically on this topic for anyone who wants to dive a little bit deeper, we've had some experts talk about this space, and there really is no denying that there is a deceleration in spend. And it's happening because that spend is getting spread out among different vendors. People are using a Datadog for certain aspects, they're using Elastic where they can because it's cheaper, they're using Splunk because they have to, but because it's so expensive they're cutting some of the things that they're putting into Splunk, which is dangerous, particularly on the security side; if I have to decide what to put in and what not, that's not really the right way to have security hygiene. So this space is just getting crowded, there are disruptive vendors coming from the emerging space as well, and what you're seeing here is that the only bit of positivity is Elastic, on a survey-over-survey basis, with a slight uptick. Everywhere else, year over year and survey over survey, it's showing declines. It's just hard to ignore.

>> And then you've got Dynatrace, who, based on the interviews you do in the VENN, the one-on-one or one-on-five private interviews that I've been invited to, gets very high scores for their roadmap. You've got New Relic, which has been struggling financially, but they've got a really good product and a purpose-built database just for this APM space. And then of course you've got Cisco with AppD, which is a strong business for them. And then, as you mentioned, you've got startups coming in: you've got ChaosSearch, which Ed Walsh is now running, leave the data in place in AWS, a really interesting model; Honeycomb, which is going to be really disruptive; Jeremy Burton's company Observe. So this space is becoming a jump ball.

>> Yeah, there's a great line that came out of one of them, and that was that the lines are blurring. It used to be that you knew exactly what AppDynamics was doing: it was APM only, or it was logging and monitoring only. And a lot of what I'm hearing from the ITDM experts is that the lines are blurring amongst all of these names; they all have functionality that kind of crosses over each other. The other interesting thing is that it used to be application versus infrastructure monitoring, but as you know, infrastructure is becoming code more and more, and as infrastructure becomes code, there's really no difference between application and infrastructure monitoring. So we're seeing a convergence and a blurring of the lines in this space, which really doesn't bode well. And a great point about New Relic: their tech gets good remarks, I just don't know if their enterprise-level service and sales is up to snuff right now. As one of my experts, a CTO of a very large public online hospitality company, essentially said, he would be shocked if, within 18 months, all of these players are still standalone; there needs to be some M&A or convergence in this space.

>> Okay, now we're going to call out some of the data that has really jumped out to ETR in the
latest survey, and some of the names that are getting the most queries from ETR clients, many of which are investor clients. So let's start by having a look at one of the most important and prominent work-from-home names: Zoom. Let's look at this, Erik. Is the ride over for Zoom?

>> Oh, I've been saying it for a little bit of time now, actually. I do believe it is. I will get into it, but again, to point out, Dave, the reason we're presenting Splunk, Elastic and Zoom today is that they are the most viewed on the ETR+ platform. Trailing behind that, only slightly, is F5. I decided not to bring F5 to the table today because we don't have a rating on the data set, so then I went one below that, and it's Pure. So the reason we're presenting these to you today is that these are the ones our clients and our community are most interested in, which is hopefully going to be of interest to your viewers as well. So to get to Zoom: yeah, I call Zoom the pandemic bull-market baby. This was really just one that had a meteoric ride. You look back at January 2020, the stock was at 60, and 10 months later it was at like 580. That's in 10 months. That's cooled down a little bit, into the mid-300s, and I believe that cooling down should continue. And the reason why is because we are seeing a huge deceleration in our spending intentions; they're hitting all-time lows. It's really just a very ugly data set. More important than the spending intentions, for the first time we're seeing customer growth in our survey flatten. In the past, we knew that the deceleration in spend was happening, but meanwhile their new customer growth was accelerating, so it was kind of hard to really make any call based on that. This is the first time we're seeing a flattening customer-growth trajectory, and that, in tandem with just dominance from Microsoft in every sector they're involved in, and I don't care if it's IP telephony, productivity apps or the core video conferencing, Microsoft is just dominating, means there's really just no way to ignore this anymore. The data and the commentary state that Zoom is facing some headwinds.

>> Well, plus you've pointed out to me that a lot of your private conversations with buyers say, hey, we're using the freebie version of Zoom, we're not paying them. And that, combined with Teams... I think, look, Zoom has to figure it out. They've got to figure out how to use their elevated market cap to transform and expand their TAM. But let's move on. Here's the data on Pure Storage, and we've highlighted a number of times that this company is showing elevated spending intentions. Pure announces earnings in May; IBM just announced storage, which was way down, actually. So Pure is still more positive, and I'll comment in a moment, but what does this data tell you, Erik?

>> Yeah, you know, we started seeing this data last survey, in January, and that was the first time we really went positive on the data set itself, and it's just really continuing. We're seeing the strongest year-over-year acceleration in the entire survey, which is a really good spot to be. Pure is also in a leading position among its sector peers. And the other thing that was pretty interesting from the data set is that, among all storage players, Pure has the highest positive public cloud correlation. So what we can do is see which respondents are accelerating their public cloud spend and then cross-reference that with their storage spend, and Pure is best positioned. As you and I both know, digital transformation and cloud spending are increasing, and you need to be aligned with that, and among all storage sector peers, Pure is best positioned in all of those: in spending intentions and adoptions, and also public cloud correlation. So yet again, just another really strong data set.
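The "public cloud correlation" Erik mentions can be approximated, purely as an illustration and not as ETR's actual model, by scoring each respondent's cloud-spend intention and storage-vendor intention on a simple scale and correlating the two across respondents (the scale and data below are invented):

```python
from statistics import correlation  # available in Python 3.10+

# Map spending intentions onto a simple numeric scale (assumed for illustration).
scale = {"replacing": -2, "decreasing": -1, "flat": 0, "increasing": 1, "adopting": 2}

# Hypothetical respondents: (public cloud intention, intention for a storage vendor).
respondents = [
    ("increasing", "increasing"),
    ("adopting",   "increasing"),
    ("increasing", "flat"),
    ("flat",       "decreasing"),
    ("decreasing", "decreasing"),
]

cloud = [scale[c] for c, _ in respondents]
storage = [scale[s] for _, s in respondents]
print(f"cloud/storage spend correlation: {correlation(cloud, storage):.2f}")  # ~0.88
```

A vendor whose spenders skew toward the accounts that are also accelerating public cloud spend would show a higher positive correlation in a cut like this, which is the positioning point being made about Pure.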
And I have an anecdote about why this might be happening, because when I saw the data I started asking in my interviews, what's going on here? And there was one particular person, a director of cloud operations for a very large public tech company. Now, they have hybrid, but their data center is in a colo, so they don't own and build their own physical building. He pointed out that during COVID his company wanted to increase storage, but he couldn't get into his colo center due to COVID restrictions. They weren't allowed: you have 250,000 square feet, right, but you're only allowed to have six people in there, so it's pretty hard to get to your rack and get work done. He said he would buy storage, but then the colo would say, hey, you've got to get it out of here, it's not even allowed to sit here, we don't want it in our facility. So he has all this pent-up demand. And in tandem with pent-up demand, we have a refresh cycle: the SSD depreciation cycle is ending, SSDs are moving on, and we're starting to see new technology in that space, NVMe and the like, coming in. So we have pent-up demand and we have new technology, and that's really leading to a refresh cycle. And this particular ITDM that I spoke to, and many of his peers, think this has a long tailwind, that storage could be a good sector for some time to come.

>> That's really interesting, thank you for that extra metadata. And I want to do a little deeper dive on storage, so here's a look at storage, the industry in context and some of the competitive landscape. It's been a tough market for the reasons that we've highlighted: cloud has been eating away at it, and that flash headroom; it used to be you'd buy storage to get more spindles and more performance, and you were sort of forced to buy more, and flash gave more headroom. But it's interesting what you're saying about the depreciation cycle, so that's good news. So ETR, just for people's benefit here, combines primary and secondary storage into a single category. You have companies like Pure and NetApp, which are really pure-play primary storage companies, largely, in the sector, along with Veeam, Cohesity and Rubrik, which are kind of secondary data, or data protection. So my quick thoughts here are that Pure is elevated and remains what I call the one-eyed man in the land of the blind, but there are positive tailwinds there, so that's good news. Rubrik is very elevated but down; its big competitor Cohesity is way off its highs. And I have to say, to me Veeam is the steady-Eddie, consistent player here; they just really continue to do well in the data protection business, and the highs are steady, the lows are steady. Dell is also notable. They've been struggling in storage; their ISG business, which comprises servers and storage, has been soft during COVID and even during this new product rollout. So the uptick in Dell this survey is notable, with this new mid-range they have in particular, because Dell is so large that a small uptick can be very good for Dell. HPE has a big announcement next month in storage, so that might improve based on a product cycle. Of course,
the Nimble brand continues to do well. IBM, as I said, just announced a very soft quarter, down double digits again, and they're in a product-cycle shift. And NetApp looks bad in the ETR data from a spending-momentum standpoint, but their management team is transforming the company into a cloud play, which, Erik, is why it was interesting that Pure has the greatest momentum in cloud accounts. That is sort of striking to me; I would have thought it would be NetApp, so that's something that we want to pay attention to. But I do like a lot of what NetApp is doing, and other than Pure, they're the only big, kind of pure play in primary storage. So, long-winded intro there, Erik, but anything you'd add?

>> No, actually, I appreciate it, even if it was long-winded. I'm going to be honest with you, storage is not my best sector as far as a researcher and analyst goes, but I actually think a lot of what you said is spot on. We do capture a lot of large organizations' spend; we don't capture much mid and small. So when you're talking about these large players like NetApp not looking so good, all I would state is that we are capturing really big organizations' spending intentions, so these are names that should be doing better, to be quite honest, in those accounts, and at least according to our data, we're not seeing it; it's a long-term depression, as you can see, and NetApp now has a negative spending velocity in this analysis. So I can go dig around a little bit more, but right now the names that I'm hearing are Pure and Cohesity. I'm hearing a little bit about Hitachi trying to reinvent themselves in the space, but I'll take a wait-and-see approach on that one. Pure and Cohesity are the ones I'm hearing a lot about from our community.

>> So storage is transforming to cloud as a service. You're seeing things like Apex and GreenLake from Dell and HPE, and container storage; not a lot of people are paying attention to it, but Pure bought a company called Portworx, which really specializes in container storage, and there are many startups there trying to really change the way this is done. David Flynn has a startup in that space; he's the guy who started Fusion-io. So there are a lot of transformations happening here. Okay, I know it's been a long segment. We have to summarize, so let me go through a summary and then I'll give you the last word, Erik. Tech spending appears to be tracking US GDP at six to seven percent. The talent shortage could be a blocker to accelerating IT deployments, and that's kind of good news, actually, for services companies. Digital transformation remains a priority, and that bodes well not only for services but for automation; UiPath went public this week, and we profiled that extensively; it went public last Wednesday. Organizations, as I said at the top, face some tough decisions on how to allocate resources: running the business, growing the business, transforming the business. And we're seeing a bifurcation of spending and some residual effects on vendors, and that remains a theme that we're watching. Erik, your final thoughts?

>> Yeah, I'm going to go back quickly to just the overall macro spending, because there's one thing I think is interesting to point out: we're seeing a real acceleration among mid and small. So it seems like early on in the COVID recovery, or COVID spending, it was the deep pockets that moved first, right? The Fortune 500 knew they had to support remote work; they
started spending first. Right now in the Fortune 500 we're only seeing about five percent spend, but when you get into mid and small organizations, that's creeping up to eight, nine. So I just think it's important to point out that they're playing catch-up right now. I would also point out that this is heavily skewed to North America spending; we're seeing laggards in EMEA, they just don't seem to be spending as much, they're in a very different place in their recovery, and I do think that it's important to point that out. Lastly, I also want to mention, and I know you do such a great job of following a lot of the disruptive vendors, like you just pointed out Pure doing container storage, that we also have another bi-annual survey we do called Emerging Technology, and that's for the private names. That's going to be launching in May, so for everyone out there who's interested in not only the disruptive vendors but also private equity players, keep an eye out for that. We do that twice a year, and that's growing in its respondents as well. And then lastly, one comment, because you mentioned the UiPath IPO. It was really hard for us to sit on the sidelines and not put some sort of rating on their data set, but ultimately the data was muted, unfortunately. When you're seeing this kind of hype into an IPO, like we saw with Snowflake, where the data was resoundingly strong, we had no choice but to listen to what the data said for Snowflake, despite the hype. We didn't see that for UiPath, and we wanted to. I'm not making a large call there, but I do think it's interesting to juxtapose the two: when Snowflake was heading to its IPO, the data was resoundingly positive, and for UiPath we just didn't see that.

>> Thank you for that. And Erik, thanks for coming on today. It's really a pleasure to have you, and I really appreciate the collaboration and look forward to doing more of these.

>> We enjoy the partnership greatly, Dave. We're very, very happy to have you in the ETR family and looking forward to doing a lot more with you in the future.

>> Ditto. Okay, that's it for today. Remember, these episodes are all available as podcasts wherever you listen; all you've got to do is search "Breaking Analysis podcast," and please subscribe to the series. Check out ETR's website at etr.plus. We also publish a full report every week on wikibon.com and siliconangle.com. You can email me at david.vellante@siliconangle.com, you can DM me on Twitter @dvellante, or comment on our LinkedIn posts. I could see you in Clubhouse. This is Dave Vellante for Erik Porter Bradley and theCUBE Insights powered by ETR. Have a great week, stay safe, be well, and we'll see you next time.

Published Date : Apr 25 2021



So this has been about a year-long analysis and review on this name, and I'm dating myself here, but I know you and I are both rock and roll fans, so I'm going to point out a Led Zeppelin song and movie, and say that the song remains the same for Splunk. We are just seeing recent spending intentions taking yet another step down, both from prior survey levels and from year-ago levels. This, we're looking at in the analytics sector, and spending intentions are decelerating across every single group, and we went to one of our other slide analyses on the ETR+ platform, and if you go by customer sub-sample, in analytics, it's dropping in every single vertical. It doesn't matter which one. It's really not looking good, unfortunately, and you had mentioned this is in analytics, and I do believe the next slide is in information security. >> Yeah, let's bring that up. >> And unfortunately it's not doing much better. So this is specifically Fortune 500 accounts and information security. There's deep pockets in the Fortune 500, but from what we're hearing in all the insights and interviews and panels that I personally moderate for ETR, people are upset. They didn't like the strong tactics that Splunk has used on them in the past, they didn't like the ingestion model pricing, the inflexibility, and when alternatives came along, people are willing to look at the alternatives, and that's what we're seeing in both analytics and big data and also for their SIEM and security. >> Yeah, so I think again, I pointed out Teresa Carlson. She's got a big job, but she's very capable. She's going to meet with a lot of customers, she's a go-to-market pro, she's going to have to listen hard, and I think you're going to see some changes there. Okay, so sorry, there's more bad news on Splunk. So (indistinct) bring this up, this is Net Score for Splunk and Elastic accounts. This is for analytics, so there's 106 Elastic accounts in the dataset that also have Splunk and it's trending downward for Splunk, that's why it's green for Elastic. And Erik, the important call out from ETR here is how Splunk's performance in Elastic accounts compares with its performance overall. The ELK stack, which Elastic is obviously a big part of, is causing pain for Splunk, as is Datadog, and you mentioned the pricing issue, well, is it pricing in your assessment or is it more fundamental? >> It's multi-level, based on the commentary we get from the ITDMs that take the survey. So yes, you did a great job with this analysis. What we're looking at is the spending within shared accounts. So if I have Splunk already, how am I spending? I'm sorry, if I have Elastic already, how am I spending on Splunk? And what you're seeing here is it's down to about a 12% Net Score, whereas Splunk overall has a 32% Net Score among all of its customers. So what you're seeing there is there is definitely a drain that's happening where Elastic is draining spend from Splunk and usage from them. The reason we used Elastic here is because all of observability, the whole sector, seems to be decelerating. Splunk is decelerating the most, but Elastic is the only one that's actually showing resiliency, so that's why we decided to choose these two, but you pointed out, yes, it's also Datadog. Datadog is Cloud native. They're more dev ops-oriented. They tend to be viewed as having a technological lead as compared to Splunk. So a really good point.
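To make that shared-account comparison concrete, here is a rough sketch of the calculation. ETR's actual Net Score formula is proprietary, so this stand-in simply takes the share of accounts increasing spend on a vendor minus the share decreasing; the survey rows, account IDs, and column names are all hypothetical.

```python
# Hypothetical illustration of a "net score"-style calculation.
# ETR's real formula is proprietary; here net score is simply the share
# of accounts increasing spend on a vendor minus the share decreasing.
import pandas as pd

# Toy survey responses: one row per (account, vendor) spending intention.
responses = pd.DataFrame([
    {"account": "a1", "vendor": "Splunk",  "intent": "increase"},
    {"account": "a1", "vendor": "Elastic", "intent": "increase"},
    {"account": "a2", "vendor": "Splunk",  "intent": "decrease"},
    {"account": "a2", "vendor": "Elastic", "intent": "increase"},
    {"account": "a3", "vendor": "Splunk",  "intent": "flat"},
    {"account": "a4", "vendor": "Splunk",  "intent": "increase"},
])

def net_score(df: pd.DataFrame, vendor: str) -> float:
    """Share of accounts increasing spend minus share decreasing, in points."""
    sub = df[df["vendor"] == vendor]
    up = (sub["intent"] == "increase").mean()
    down = (sub["intent"] == "decrease").mean()
    return 100 * (up - down)

# Net score for Splunk across every account that cited it.
overall = net_score(responses, "Splunk")

# Net score for Splunk only within accounts that also run Elastic,
# mirroring the shared-account comparison discussed above.
elastic_accounts = set(responses.loc[responses["vendor"] == "Elastic", "account"])
shared = responses[responses["account"].isin(elastic_accounts)]
within_elastic = net_score(shared, "Splunk")

print(f"Splunk net score overall: {overall:.0f}")
print(f"Splunk net score in Elastic accounts: {within_elastic:.0f}")
```

Run against the toy data above, it prints an overall score and a lower score within the shared accounts, which is the shape of the 32% versus 12% gap described in the conversation.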
Dynatrace also is expanding their abilities and Splunk has been making a lot of acquisitions to push their Cloud services, they are also changing their pricing model, right? They're trying to make things a little bit more flexible, moving off ingestion and moving towards consumption. So they are trying, and the new hires, I'm not going to bet against them because the one thing that Splunk has going for them is their market share in our survey, they're still very well entrenched. So they do have a lot of accounts, they have their foothold. So if they can find a way to make these changes, then they will be able to change themselves, but the one thing I got to say across the whole sector is competition is increasing, and it does appear based on commentary and data that they're starting to cannibalize themselves. It really seems pretty hard to get away from that, and you know there are startups in the observability space too that are going to be even more disruptive. >> I think I want to key on the pricing for a moment, and I've been pretty vocal about this. I think the old SAS pricing model where you essentially lock in for a year or two years or three years, pay up front, or maybe pay quarterly if you're lucky, that's a one-way street and I think it's a flawed model. I like what Snowflake's doing, I like what Datadog's doing, look at what Stripe is doing, look at what Twilio is doing, you mentioned it, it's consumption-based pricing, and if you've got a great product, put it out there and damn, the torpedoes, and I think that is a game changer. I look at, for instance, HPE with GreenLake, I look at Dell with Apex, they're trying to mimic that model and apply it to infrastructure, it's much harder with infrastructure 'cause you've got to deploy physical infrastructure, but that is a model that I think is going to change, and I think all of the traditional SAS pricing is going to come under disruption over the next better part of the decades, but anyway, let's move on. We've been covering the APM space pretty extensively, application performance management, and this chart lines up some of the big players here. Comparing Net score or spending momentum from the April 20th survey, the gray is, sorry, the gray is the April 20th survey, the blue is Jan 21 and the yellow is April 21, and not only are Elastic and Datadog doing well relative to Splunk, Erik, but everything is down from last year. So this space, as you point out, is undergoing a transformation. >> Yeah, the pressures are real and it's sort of that perfect storm where it's not only the data that's telling us that, but also the direct feedback we get from the community. Pretty much all the interviews I do, I've done a few panels specifically on this topic, for anyone who wants to dive a little bit deeper. We've had some experts talk about this space and there really is no denying that there is a deceleration in spend and it's happening because that spend is getting spread out among different vendors. People are using a Datadog for certain aspects, they are using Elastic where they can 'cause it's cheaper. They're using Splunk because they have to, but because it's so expensive, they're cutting some of the things that they're putting into Splunk, which is dangerous, particularly on the security side. If I have to decide what to put in and whatnot, that's not really the right way to have security hygiene. 
So this space is just getting crowded, there's disruptive vendors coming from the emerging space as well, and what you're seeing here is the only bit of positivity is Elastic on a survey-over-survey basis with a slight, slight uptick. Everywhere else, year-over-year and survey-over-survey, it's showing declines, it's just hard to ignore. >> And then you've got Dynatrace who, based on the interviews you do in the (indistinct), one-on-one, or one-on-five, the private interviews that I've been invited to, Dynatrace gets very high scores for their roadmap. You've got New Relic, which has been struggling financially, but they've got a really good product and a purpose-built database just for this APM space, and then of course, you've got Cisco with AppD, which is a strong business for them, and then as you mentioned, you've got startups coming in, you got ChaosSearch, which Ed Walsh is now running, leave the data in place in AWS and really interesting model, Honeycomb is getting really disruptive, Jeremy Burton's company, Observed. So this space is it's becoming jumped ball. >> Yeah, there's a great line that came out of one of them, and that was that the lines are blurring. It used to be that you knew exactly that AppDynamics, what they were doing, it was APM only, or it was logging and monitoring only, and a lot of what I'm hearing from the ITDM experts is that the lines are blurring amongst all of these names. They all have functionality that kind of crosses over each other. And the other interesting thing is it used to be application versus infrastructure monitoring, but as you know, infrastructure is becoming code more and more and more, and as infrastructure becomes code, there's really no difference between application and infrastructure monitoring. So we're seeing a convergence and a blurring of the lines in this space, which really doesn't bode well, and a great point about New Relic, their tech gets good remarks. I just don't know if their enterprise level service and sales is up to snuff right now. As one of my experts said, a CTO of a very large public online hospitality company essentially said that he would be shocked that within 18 months if all of these players are still standalone, that there needs to be some M and A or convergence in this space. >> Okay, now we're going to call out some of the data that really has jumped out to ETR in the latest survey, and some of the names that are getting the most queries from ETR clients, many of which are investor clients. So let's start by having a look at one of the most important and prominent work from home names, Zoom. Let's look at this. Erik is the ride over for Zoom? >> Ah, I've been saying it for a little bit of a time now actually. I do believe it is, and we'll get into it, but again, pointing out, great, Dave, the reason we're presenting today Splunk, Elastic and Zoom, they are the most viewed on the ETR+ platform. Trailing behind that only slightly is F5, I decided not to bring F5 to the table today 'cause we don't have a rating on the data set. So then I went one deep, one below that and it's pure. So the reason we're presenting these to you today is that these are the ones that our clients and our community are most interested in, which is hopefully going to gain interest to your viewers as well. So to get to Zoom, yeah, I call Zoom the pandemic bull market baby. This was really just one that had a meteoric ride. You look back, January in 2020, the stock was at $60 and 10 months later, it was like 580, that's in 10 months. 
That's cooled down a little bit into the mid-300s, and I believe that cooling down should continue, and the reason why is because we are seeing huge deceleration in our spending intentions. They're hitting all-time lows, it's really just a very ugly dataset. More importantly than the spending intentions, for the first time, we're seeing customer growth in our survey flatten. In the past, we knew that the deceleration of spend was happening, but meanwhile, their new customer growth was accelerating, so it was kind of hard to really make any call based on that. This is the first time we're seeing flattening customer growth trajectory, and that in tandem with just dominance from Microsoft in every sector they're involved in, I don't care if it's IP telephony, productivity apps or the core video conferencing, Microsoft is just dominating. So there's really just no way to ignore this anymore. The data and the commentary state that Zoom is facing some headwinds. >> Well, plus you've pointed out to me that a lot of your private conversations with buyers says that, "Hey, we're, we're using the freebie version of Zoom, and we're not paying them." And that combined with Teams, I mean, it's... I think, look, Zoom, they've got to figure out how to use their elevated market cap to transform and expand their TAM, but let's move on. Here's the data on Pure Storage and we've highlighted a number of times this company is showing elevated spending intentions. Pure announced it's earnings in May, IBM just announced storage, it was way down actually. So still, Pure, more positive, but I'll on that comment in a moment, but what does this data tell you, Erik? >> Yeah, right now we started seeing this data last survey in January, and that was the first time we really went positive on the data set itself, and it's just really continuing. So we're seeing the strongest year-over-year acceleration in the entire survey, which is a really good spot to be. Pure is also a leading position among its sector peers, and the other thing that was pretty interesting from the data set is among all storage players, Pure has the highest positive public Cloud correlation. So what we can do is we can see which respondents are accelerating their public Cloud spend and then cross-reference that with their storage spend and Pure is best positioned. So as you and I both know, digital transformation Cloud spending is increasing, you need to be aligned with that. And among all storage sector peers, Pure is best positioned in all of those, in spending intentions and adoptions and also public Cloud correlation. So yet again, to start another really strong dataset, and I have an anecdote about why this might be happening, because when I saw the data, I started asking in my interviews, what's going on here? And there was one particular person, he was a director of Cloud operations for a very large public tech company. Now, they have hybrid, but their data center is in colo, So they don't own and build their own physical building. He pointed out that during COVID, his company wanted to increase storage, but he couldn't get into his colo center due to COVID restrictions. They weren't allowed. You had 250,000 square feet, right, but you're only allowed to have six people in there. So it's pretty hard to get to your rack and get work done. He said he would buy storage, but then the colo would say, "Hey, you got to get it out of here. It's not even allowed to sit here. We don't want it in our facility." So he has all this pent up demand. 
In tandem with pent up demand, we have a refresh cycle. The SSD depreciation cycle is ending. SSDs are moving on and we're starting to see a new technology in that space, NVMe sorry, technology increasing in that space. So we have pent up demand and we have new technology and that's really leading to a refresh cycle, and this particular ITDM that I spoke to and many of his peers think this has a long tailwind that storage could be a good sector for some time to come. >> That's really interesting, thank you for that extra metadata. And I want to do a little deeper dive on storage. So here's a look at storage in the industry in context and some of the competitive. I mean, it's been a tough market for the reasons that we've highlighted, Cloud has been eating away that flash headroom. It used to be you'd buy storage to get more spindles and more performance and we're sort of forced to buy more, flash, gave more headroom, but it's interesting what you're saying about the depreciation cycle. So that's good news. So ETR combines, just for people's benefit here, combines primary and secondary storage into a single category. So you have companies like Pure and NetApp, which are really pure play primary storage companies, largely in the sector, along with Veeam, Cohesity and Rubrik, which are kind of secondary data or data protection. So my quick thoughts here that Pure is elevated and remains what I call the one-eyed man in the land of the blind, but that's positive tailwinds there, so that's good news. Rubrik is very elevated but down, it's big competitor, Cohesity is way off its highs, and I have to say to me, Veeam is like the Steady Eddy consistent player here. They just really continue to do well in the data protection business, and the highs are steady, the lows are steady. Dell is also notable, they've been struggling in storage. Their ISG business, which comprises servers and storage, it's been softer in COVID, and during even this new product rollout, so it's notable with this new mid range they have in particular, the uptick in Dell, this survey, because Dell is so large, a small uptick can be very good for Dell. HPE has a big announcement next month in storage, so that might improve based on a product cycle. Of course, the Nimble brand continues to do well, IBM, as I said, just announced a very soft quarter, down double digits again, and they're in a product cycle shift. And NetApp, it looks bad in the ETR data from a spending momentum standpoint, but their management team is transforming the company into a Cloud play, which Erik is why it was interesting that Pure has the greatest momentum in Cloud accounts, so that is sort of striking to me. I would have thought it would be NetApp, so that's something that we want to pay attention to, but I do like a lot of what NetApp is doing, and other than Pure, they're the only big kind of pure play in primary storage. So long-winded, intro there, Erik, but anything you'd add? >> No, actually I appreciate it as long-winded. I'm going to be honest with you, storage is not my best sector as far as a researcher and analyst goes, but I actually think that a lot of what you said is spot on. 
We do capture a lot of large organizations' spend, we don't capture much mid and small, so I think when you're talking about these large, large players like NetApp not looking so good, all I would state is that we are capturing really big organizations' spending intentions, so these are names that should be doing better, to be quite honest, in those accounts, and at least according to our data, we're not seeing it. It's a long-term depression; as you can see, NetApp now has a negative spending velocity in this analysis. So, I can go dig around a little bit more, but right now the names that I'm hearing are Pure and Cohesity. I'm hearing a little bit about Hitachi trying to reinvent themselves in the space, but I'll take a wait-and-see approach on that one, but Pure and Cohesity are the ones I'm hearing a lot about from our community. >> So storage is transforming to Cloud as a service. You've seen things like Apex and GreenLake from Dell and HPE, and container storage. A little, so not really a lot of people are paying attention to it, but Pure bought a company called Portworx which really specializes in container storage, and there are many startups there trying to really change the way it's done. David Flynn has a startup in that space, he's the guy who started Fusion-io. So a lot of transformations happening here. Okay, I know it's been a long segment, we have to summarize, and let me go through a summary and then I'll give you the last word, Erik. So tech spending appears to be tracking US GDP at 6 to 7%. This talent shortage could be a blocker to accelerating IT deployments, so that's kind of good news actually for services companies. Digital transformation, it remains a priority, and that bodes well, not only for services, but automation. UiPath went public this week, we profiled that extensively; it went public last Wednesday. Organizations that sit at the top face some tough decisions on how to allocate resources. They're running the business, growing the business, transforming the business, and we're seeing a bifurcation of spending and some residual effects on vendors, and that remains a theme that we're watching. Erik, your final thoughts.
We do that twice a year and that's growing in its respondents as well. And then lastly, one comment, because you mentioned the UiPath IPO, it was really hard for us to sit on the sidelines and not put some sort of rating on their dataset, but ultimately, the data was muted, unfortunately, and when you're seeing this kind of hype into an IPO like we saw with Snowflake, the data was resoundingly strong. We had no choice, but to listen to what the data said for Snowflake, despite the hype. We didn't see that for UiPath and we wanted to, and I'm not making a large call there, but I do think it's interesting to juxtapose the two, that when snowflake was heading to its IPO, the data was resoundingly positive, and for UiPath, we just didn't see that. >> Thank you for that, and Erik, thanks for coming on today. It's really a pleasure to have you, and so really appreciate the collaboration and look forward to doing more of these. >> Yeah, we enjoy the partnership greatly, Dave. We're very happy to have you on the ETR family and looking forward to doing a lot, lot more with you in the future. >> Ditto. Okay, that's it for today. Remember, these episodes are all available as podcasts wherever you listen. All you have to do is search "Breaking Analysis" podcast, and please subscribe to the series. Check out ETR website it's etr.plus. We also publish a full report every week on wikibon.com and siliconangle.com. You can email me, david.vellante@siliconangle.com, you can DM me on Twitter @dvellante or comment on our LinkedIn posts. I could see you in Clubhouse. This is Dave Vellante for Erik Porter Bradley for the CUBE Insights powered by ETR. Have a great week, stay safe, be well and we'll see you next time. (bright music)

Published Date : Apr 23 2021

Piotr Mierzejewski, IBM | Dataworks Summit EU 2018


 

>> Announcer: From Berlin, Germany, it's theCUBE covering Dataworks Summit Europe 2018 brought to you by Hortonworks. (upbeat music) >> Well hello, I'm James Kobielus and welcome to theCUBE. We are here at Dataworks Summit 2018, in Berlin, Germany. It's a great event, Hortonworks is the host, they made some great announcements. They've had partners doing the keynotes and the sessions, breakouts, and IBM is one of their big partners. Speaking of IBM, from IBM we have a program manager, Piotr, I'll get this right, Piotr Mierzejewski, your focus is on data science machine learning and data science experience which is one of the IBM Products for working data scientists to build and to train models in team data science enterprise operational environments, so Piotr, welcome to theCUBE. I don't think we've had you before. >> Thank you. >> You're a program manager. I'd like you to discuss what you do for IBM, I'd like you to discuss Data Science Experience. I know that Hortonworks is a reseller of Data Science Experience, so I'd like you to discuss the partnership going forward and how you and Hortonworks are serving your customers, data scientists and others in those teams who are building and training and deploying machine learning and deep learning, AI, into operational applications. So Piotr, I give it to you now. >> Thank you. Thank you for inviting me here, very excited. This is a very loaded question, and I would like to begin, before I get actually to why the partnership makes sense, I would like to begin with two things. First, there is no machine learning about data. And second, machine learning is not easy. Especially, especially-- >> James: I never said it was! (Piotr laughs) >> Well there is this kind of perception, like you can have a data scientist working on their Mac, working on some machine learning algorithms and they can create a recommendation engine, let's say in a two, three days' time. This is because of the explosion of open-source in that space. You have thousands of libraries, from Python, from R, from Scala, you have access to Spark. All these various open-source offerings that are enabling data scientists to actually do this wonderful work. However, when you start talking about bringing machine learning to the enterprise, this is not an easy thing to do. You have to think about governance, resiliency, the data access, actual model deployments, which are not trivial. When you have to expose this in a uniform fashion to actually various business units. Now all this has to actually work in a private cloud, public clouds environment, on a variety of hardware, a variety of different operating systems. Now that is not trivial. (laughs) Now when you deploy a model, as the data scientist is going to deploy the model, he needs to be able to actually explain how the model was created. He has to be able to explain what the data was used. He needs to ensure-- >> Explicable AI, or explicable machine learning, yeah, that's a hot focus of our concern, of enterprises everywhere, especially in a world where governance and tracking and lineage GDPR and so forth, so hot. >> Yes, you've mentioned all the right things. Now, so given those two things, there's no ML web data, and ML is not easy, why the partnership between Hortonworks and IBM makes sense, well, you're looking at the number one industry leading big data plot from Hortonworks. 
Then, you look at DSX Local, which, I'm proud to say, I've been there since the first line of code, and I'm feeling very passionate about the product, is the merger between the two, ability to integrate them tightly together gives your data scientists secure access to data, ability to leverage the Spark that runs inside a Hortonworks cluster, ability to actually work in a platform like DSX that doesn't limit you to just one kind of technology but allows you to work with multiple technologies, ability to actually work on not only-- >> When you say technologies here, you're referring to frameworks like TensorFlow, and-- >> Precisely. Very good, now that part I'm going to get into very shortly, (laughs) so please don't steal my thunder. >> James: Okay. >> Now, what I was saying is that not only are DSX and Hortonworks integrated to the point that you can actually manage your Hadoop clusters, Hadoop environments within DSX, you can actually work on your Python models and your analytics within DSX and then push it remotely to be executed where your data is. Now, why is this important? If you work with data that's megabytes, gigabytes, maybe you can pull it in, but truly, when you move to the terabytes and the petabytes of data, what happens is that you actually have to push the analytics to where your data resides, and leverage, for example, YARN, a resource manager, to distribute your workloads and actually train your models on your HDP cluster. That's one of the huge value propositions. Now, mind you, this is all done in a secure fashion, with the ability to actually install DSX on the edge nodes of the HDP clusters. >> James: Hmm... >> As of HDP 2.6.4, DSX has been certified to actually work with HDP. Now, this partnership, we embarked on this partnership about 10 months ago. Now, it often happens that there are announcements, but not much materializes after such an announcement. This is not true in the case of DSX and HDP. Just recently we have had a release of DSX 1.2, which I'm super excited about. Now, let's talk about those open-source toolings in the various platforms. Now, you don't want to force your data scientists to actually work with just one environment. Some of them might prefer to work on Spark, some of them like their RStudio, they're statisticians, they like R, others like Python, with Zeppelin or, say, a Jupyter notebook. Now, how about TensorFlow? What are you going to do when, you know, you have to do the deep learning workloads, when you want to use neural nets? Well, DSX does support the ability to actually bring in GPU nodes and do the TensorFlow training. As a sidecar approach, you can append the node, you can scale the platform horizontally and vertically, and train your deep learning workloads, and actually remove the sidecar out. So you can put it toward the cluster and remove it at will. Now, DSX not only satisfies the needs of your programmer data scientists, who actually code in Python and Scala or R, but actually allows your business analysts to work and create models in a visual fashion.
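As an aside, the push-the-analytics-to-the-data pattern Piotr describes can be sketched with plain Spark on YARN. This is not a DSX-specific API, just a generic illustration; the HDFS path, column names, and model choice are hypothetical.

```python
# Hypothetical sketch of pushing model training to where the data lives.
# Plain Spark-on-YARN, not a DSX-specific API; paths and columns are made up.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

# "yarn" as the master hands scheduling to the Hadoop cluster's resource
# manager, so the training executors run next to the data instead of
# pulling terabytes back to a notebook.
spark = (SparkSession.builder
         .appName("train-where-the-data-lives")
         .master("yarn")
         .getOrCreate())

df = spark.read.parquet("hdfs:///data/events/")           # hypothetical path
features = VectorAssembler(inputCols=["f1", "f2", "f3"],   # hypothetical columns
                           outputCol="features").transform(df)

model = LogisticRegression(labelCol="label", featuresCol="features").fit(features)
model.write().overwrite().save("hdfs:///models/churn_lr")  # hypothetical location

spark.stop()
```

The design point is simply that the executors are scheduled by YARN across the cluster, which is the value proposition Piotr is making about terabyte- and petabyte-scale training.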
As of DSX 1.2, we have actually embedded, integrated, the SPSS Modeler, redesigned, rebranded. This is an amazing technology from IBM that's been around for a while, very well established, but now with the new interface, embedded inside the DSX platform, it allows your business analysts to actually train and create the model in a visual fashion and, what is beautiful-- >> Business analysts, not traditional data scientists. >> Not traditional data scientists. >> That sounds equivalent to how IBM, a few years back, was able to bring more of a visual experience to SPSS proper to enable the business analysts of the world to build and do data-mining and so forth with structured data. Go ahead, I don't want to steal your thunder here. >> No, no, precisely. (laughs) >> But I see it's the same phenomenon, you bring the same capability to greatly expand the range of data professionals who can do, in this case, do machine learning hopefully as well as professional, dedicated data scientists. >> Certainly, now what we have to also understand is that data science is actually a team sport. It involves various stakeholders from the organization. From the executive, who actually gives you the business use case, to your data engineers, who actually understand where your data is and can grant the access-- >> James: They manage the Hadoop clusters, many of them, yeah. >> Precisely. So they manage the Hadoop clusters, they actually manage your relational databases, because we have to realize that not all the data is in the data lakes yet, you have legacy systems, which DSX allows you to actually connect to and integrate to get data from. It also allows you to actually consume data from streaming sources, so if you actually have a Kafka message bus and are actually streaming data from your applications or IoT devices, you can actually integrate all those various data sources and federate them within DSX to use for training machine learning models. Now, this is all around predictive analytics. But what if I tell you that right now with DSX you can actually do prescriptive analytics as well? With 1.2, again, I'm going to keep coming back to DSX 1.2, with the most recent release we have actually added Decision Optimization, an industry-leading solution from IBM-- >> Prescriptive analytics, gotcha-- >> Yes, for prescriptive analysis. So now if you have warehouses, or you have a fleet of trucks, or you want to optimize the flow in, let's say, a utility company, whether it be for power or, let's say, for water, you can actually create and train prescriptive models within DSX and deploy them in the same fashion as you would deploy and manage your SPSS streams as well as the machine learning models from Spark, from Python, so with XGBoost, TensorFlow, Keras, all those various aspects. >> James: Mmmhmm. >> Now, what's going to get really exciting in the next two months, DSX will actually bring in natural language processing and text analysis and sentiment analysis via WEX. So Watson Explorer, it's another offering from IBM... >> James: It's called, what is the name of it? >> Watson Explorer. >> Oh Watson Explorer, yes. >> Watson Explorer, yes. >> So now you're going to have this collaborative platform, extendable! An extendable collaborative platform that can actually install and run in your data centers without the need to access the internet. That's actually critical. Yes, we can deploy on AWS. Yes, we can deploy on Azure.
On Google Cloud, definitely we can deploy in Softlayer and we're very good at that, however in the majority of cases we find that the customers have challenges for bringing the data out to the cloud environments. Hence, with DSX, we designed it to actually deploy and run and scale everywhere. Now, how we have done it, we've embraced open source. This was a huge shift within IBM to realize that yes we do have 350,000 employees, yes we could develop container technologies, but why? Why not embrace what is actually industry standards with the Docker and equivalent as they became industry standards? Bring in RStudio, the Jupyter, the Zeppelin Notebooks, bring in the ability for a data scientist to choose the environments they want to work with and actually extend them and make the deployments of web services, applications, the models, and those are actually full releases, I'm not only talking about the model, I'm talking about the scripts that can go with that ability to actually pull the data in and allow the models to be re-trained, evaluated and actually re-deployed without taking them down. Now that's what actually becomes, that's what is the true differentiator when it comes to DSX, and all done in either your public or private cloud environments. >> So that's coming in the next version of DSX? >> Outside of DSX-- >> James: We're almost out of time, so-- >> Oh, I'm so sorry! >> No, no, no. It's my job as the host to let you know that. >> Of course. (laughs) >> So if you could summarize where DSX is going in 30 seconds or less as a product, the next version is, what is it? >> It's going to be the 1.2.1. >> James: Okay. >> 1.2.1 and we're expecting to release at the end of June. What's going to be unique in the 1.2.1 is infusing the text and sentiment analysis, so natural language processing with predictive and prescriptive analysis for both developers and your business analysts. >> James: Yes. >> So essentially a platform not only for your data scientist but pretty much every single persona inside the organization >> Including your marketing professionals who are baking sentiment analysis into what they do. Thank you very much. This has been Piotr Mierzejewski of IBM. He's a Program Manager for DSX and for ML, AI, and data science solutions and of course a strong partnership is with Hortonworks. We're here at Dataworks Summit in Berlin. We've had two excellent days of conversations with industry experts including Piotr. We want to thank everyone, we want to thank the host of this event, Hortonworks for having us here. We want to thank all of our guests, all these experts, for sharing their time out of their busy schedules. We want to thank everybody at this event for all the fascinating conversations, the breakouts have been great, the whole buzz here is exciting. GDPR's coming down and everybody's gearing up and getting ready for that, but everybody's also focused on innovative and disruptive uses of AI and machine learning and business, and using tools like DSX. I'm James Kobielus for the entire CUBE team, SiliconANGLE Media, wishing you all, wherever you are, whenever you watch this, have a good day and thank you for watching theCUBE. (upbeat music)

Published Date : Apr 19 2018


Alan Gates, Hortonworks | Dataworks Summit 2018


 

(techno music) >> (announcer) From Berlin, Germany it's theCUBE covering DataWorks Summit Europe 2018. Brought to you by Hortonworks. >> Well hello, welcome to theCUBE. We're here on day two of DataWorks Summit 2018 in Berlin, Germany. I'm James Kobielus. I'm lead analyst for Big Data Analytics in the Wikibon team of SiliconANGLE Media. And who we have here today, we have Alan Gates whose one of the founders of Hortonworks and Hortonworks of course is the host of DataWorks Summit and he's going to be, well, hello Alan. Welcome to theCUBE. >> Hello, thank you. >> Yeah, so Alan, so you and I go way back. Essentially, what we'd like you to do first of all is just explain a little bit of the genesis of Hortonworks. Where it came from, your role as a founder from the beginning, how that's evolved over time but really how the company has evolved specifically with the folks on the community, the Hadoop community, the Open Source community. You have a deepening open source stack with you build upon with Atlas and Ranger and so forth. Gives us a sense for all of that Alan. >> Sure. So as I think it's well-known, we started as the team at Yahoo that really was driving a lot of the development of Hadoop. We were one of the major players in the Hadoop community. Worked on that for, I was in that team for four years. I think the team itself was going for about five. And it became clear that there was an opportunity to build a business around this. Some others had already started to do so. We wanted to participate in that. We worked with Yahoo to spin out Hortonworks and actually they were a great partner in that. Helped us get than spun out. And the leadership team of the Hadoop team at Yahoo became the founders of Hortonworks and brought along a number of the other engineering, a bunch of the other engineers to help get started. And really at the beginning, we were. It was Hadoop, Pig, Hive, you know, a few of the very, Hbase, the kind of, the beginning projects. So pretty small toolkit. And we were, our early customers were very engineering heavy people, or companies who knew how to take those tools and build something directly on those tools right? >> Well, you started off with the Hadoop community as a whole started off with a focus on the data engineers of the world >> Yes. >> And I think it's shifted, and confirm for me, over time that you focus increasing with your solutions on the data scientists who are doing the development of the applications, and the data stewards from what I can see at this show. >> I think it's really just a part of the adoption curve right? When you're early on that curve, you have people who are very into the technology, understand how it works, and want to dive in there. So those tend to be, as you said, the data engineering types in this space. As that curve grows out, you get, it comes wider and wider. There's still plenty of data engineers that are our customers, that are working with us but as you said, the data analysts, the BI people, data scientists, data stewards, all those people are now starting to adopt it as well. And they need different tools than the data engineers do. They don't want to sit down and write Java code or you know, some of the data scientists might want to work in Python in a notebook like Zeppelin or Jupyter but some, may want to use SQL or even Tablo or something on top of SQL to do the presentation. Of course, data stewards want tools more like Atlas to help manage all their stuff. 
So that does drive us to one, put more things into the toolkit so you see the addition of projects like Apache Atlas and Ranger for security and all that. Another area of growth, I would say is also the kind of data that we're focused on. So early on, we were focused on data at rest. You know, we're going to store all this stuff in HDFS and as the kind of data scene has evolved, there's a lot more focus now on a couple things. One is data, what we call data-in-motion for our HDF product where you've got in a stream manager like Kafka or something like that >> (James) Right >> So there's processing that kind of data. But now we also see a lot of data in various places. It's not just oh, okay I have a Hadoop cluster on premise at my company. I might have some here, some on premise somewhere else and I might have it in several clouds as well. >> K, your focus has shifted like the industry in general towards streaming data in multi-clouds where your, it's more stateful interactions and so forth? I think you've made investments in Apache NiFi so >> (Alan) yes. >> Give us a sense for your NiFi versus Kafka and so forth inside of your product strategy or your >> Sure. So NiFi is really focused on that data at the edge, right? So you're bringing data in from sensors, connected cars, airplane engines, all those sorts of things that are out there generating data and you need, you need to figure out what parts of the data to move upstream, what parts not to. What processing can I do here so that I don't have to move upstream? When I have a error event or a warning event, can I turn up the amount of data I'm sending in, right? Say this airplane engine is suddenly heating up maybe a little more than it's supposed to. Maybe I should ship more of the logs upstream when the plane lands and connects that I would if, otherwise. That's the kind o' thing that Apache NiFi focuses on. I'm not saying it runs in all those places by my point is, it's that kind o' edge processing. Kafka is still going to be running in a data center somewhere. It's still a pretty heavy weight technology in terms of memory and disk space and all that so it's not going to be run on some sensor somewhere. But it is that data-in-motion right? I've got millions of events streaming through a set of Kafka topics watching all that sensor data that's coming in from NiFi and reacting to it, maybe putting some of it in the data warehouse for later analysis, all those sorts of things. So that's kind o' the differentiation there between Kafka and NiFi. >> Right, right, right. So, going forward, do you see more of your customers working internet of things projects, is that, we don't often, at least in the industry of popular mind, associate Hortonworks with edge computing and so forth. Is that? >> I think that we will have more and more customers in that space. I mean, our goal is to help our customers with their data wherever it is. >> (James) Yeah. >> When it's on the edge, when it's in the data center, when it's moving in between, when it's in the cloud. All those places, that's where we want to help our customers store and process their data. Right? So, I wouldn't want to say that we're going to focus on just the edge or the internet of things but that certainly has to be part of our strategy 'cause it's has to be part of what our customers are doing. >> When I think about the Hortonworks community, now we have to broaden our understanding because you have a tight partnership with IBM which obviously is well-established, huge and global. 
Give us a sense for as you guys have teamed more closely with IBM, how your community has changed or broadened or shifted in its focus or has it? >> I don't know that it's shifted the focus. I mean IBM was already part of the Hadoop community. They were already contributing. Obviously, they've contributed very heavily on projects like Spark and some of those. They continue some of that contribution. So I wouldn't say that it's shifted it, it's just we are working more closely together as we both contribute to those communities, working more closely together to present solutions to our mutual customer base. But I wouldn't say it's really shifted the focus for us. >> Right, right. Now at this show, we're in Europe right now, but it doesn't matter that we're in Europe. GDPR is coming down fast and furious now. Data Steward Studio, we had the demonstration today, it was announced yesterday. And it looks like a really good tool for the main, the requirements for compliance which is discover and inventory your data which is really set up a consent portal, what I like to refer to. So the data subject can then go and make a request to have my data forgotten and so forth. Give us a sense going forward, for how or if Hortonworks, IBM, and others in your community are going to work towards greater standardization in the functional capabilities of the tools and platforms for enabling GDPR compliance. 'Cause it seems to me that you're going to need, the industry's going to need to have some reference architecture for these kind o' capabilities so that going forward, either your ecosystem of partners can build add on tools in some common, like the framework that was laid out today looks like a good basis. Is there anything that you're doing in terms of pushing towards more Open Source standardization in that area? >> Yes, there is. So actually one of my responsibilities is the technical management of our relationship with ODPI which >> (James) yes. >> Mandy Chessell referenced yesterday in her keynote and that is where we're working with IBM, with ING, with other companies to build exactly those standards. Right? Because we do want to build it around Apache Atlas. We feel like that's a good tool for the basis of that but we know one, that some people are going to want to bring their own tools to it. They're not necessarily going to want to use that one platform so we want to do it in an open way that they can still plug in their metadata repositories and communicate with others and we want to build the standards on top of that of how do you properly implement these features that GDPR requires like right to be forgotten, like you know, what are the protocols around PIII data? How do you prevent a breach? How do you respond to a breach? >> Will that all be under the umbrella of ODPI, that initiative of the partnership or will it be a separate group or? >> Well, so certainly Apache Atlas is part of Apache and remains so. What ODPI is really focused up is that next layer up of how do we engage, not the programmers 'cause programmers can gage really well at the Apache level but the next level up. We want to engage the data professionals, the people whose job it is, the compliance officers. The people who don't sit and write code and frankly if you connect them to the engineers, there's just going to be an impedance mismatch in that conversation. >> You got policy wonks and you got tech wonks so. They understand each other at the wonk level. >> That's a good way to put it. 
And so that's where ODPI really comes in: it's that group of compliance people who speak a completely different language. But we still need to get them all talking to each other, as you said, so that there are specifications around how do we do this, and what is compliance? >> Well Alan, thank you very much. We're at the end of our time for this segment. This has been great. It's been great to catch up with you. Hortonworks has been evolving very rapidly, and it seems to me that, going forward, you're well-positioned now for the new GDPR age to take your overall solution portfolio, your partnerships, and your capabilities to the next level, really within an Open Source framework. In many ways, though, you're not entirely 100% purely Open Source; nobody is. You're still very much focused on open frameworks for building very scalable solutions for enterprise deployment. Well, this has been Jim Kobielus with Alan Gates of Hortonworks, here on theCUBE at DataWorks Summit 2018 in Berlin. We'll be back fairly quickly with another guest, and thank you very much for watching our segment. (techno music)

Published Date : Apr 19 2018



Data Science for All: It's a Whole New Game


 

>> There's a movement that's sweeping across businesses everywhere, here in this country and around the world. And it's all about data. Today businesses are being inundated with data, to the tune of over two and a half million gigabytes that'll be generated in the next 60 seconds alone. What do you do with all that data? To extract insights you typically turn to a data scientist. But not necessarily anymore. At least not exclusively. Today the ability to extract value from data is becoming a shared mission. A team effort that spans the organization, extending far more widely than ever before. Today, data science is being democratized. >> Data Science for All: It's a Whole New Game. >> Welcome everyone, I'm Katie Linendoll. I'm a technology expert and writer, and I love reporting on all things tech. My fascination with tech started very young. I began coding when I was 12, received my networking certs by 18, and earned a degree in IT and new media from Rochester Institute of Technology. So as you can tell, technology has always been a true passion of mine. Having grown up in the digital age, I love having a career that keeps me at the forefront of science and technology innovations. I spend equal time in the field being hands on as I do on my laptop conducting in-depth research. Whether I'm diving underwater with NASA astronauts, witnessing the new ways in which mobile technology can help rebuild the Philippines' economy in the wake of super typhoons, or sharing a first look at the newest iPhones on The Today Show, yesterday, I'm always on the hunt for the latest and greatest tech stories. And that's what brought me here. I'll be your host for the next hour as we explore the new phenomenon that is taking businesses around the world by storm, as data science continues to become democratized and extend beyond the domain of the data scientist, and why there's also a mandate for all of us to become data literate now that data science for all drives our AI culture. We're going to be able to take to the streets and go behind the scenes as we uncover the factors that are fueling this phenomenon and giving rise to a movement that is reshaping how businesses leverage data, and putting organizations on the road to AI. So coming up, I'll be doing interviews with data scientists. We'll see real world demos and take a look at how IBM is changing the game with an open data science platform. We'll also be joined by legendary statistician Nate Silver, founder and editor-in-chief of FiveThirtyEight, who will shed light on how a data driven mindset is changing everything from business to our culture. We also have a few people who are joining us in our studio, so thank you guys for joining us. Come on, I can do better than that, right? Live studio audience, the fun stuff. And for all of you during the program, I want to remind you to join the conversation on social media using the hashtag DSforAll, it's data science for all. Share your thoughts on what data science and AI mean to you and your business. And, let's dive into a whole new game of data science. Now I'd like to welcome my co-host, General Manager of IBM Analytics, Rob Thomas. >> Hello, Katie. >> Come on guys. >> Yeah, seriously. >> No one's allowed to be quiet during this show, okay? >> Right. >> Or, I'll start calling people out. So Rob, thank you so much. I think you know this conversation, we're calling it a data explosion happening right now. And it's nothing new. And when you and I chatted about it, you said you've been talking about this for years.
You have to ask, is this old news at this point? >> Yeah, I mean, well first of all, the data explosion is not coming, it's here. And everybody's in the middle of it right now. What is different is the economics have changed, and the scale and complexity of the data that organizations are having to deal with have changed. And to this day, 80% of the data in the world still sits behind corporate firewalls. So, that's becoming a problem. It's becoming unmanageable. IT struggles to manage it. The business can't get everything they need. Consumers can't consume it when they want. So we have a challenge here. >> It's challenging in this world of unmanageable, crazy complexity. If I'm sitting here as an IT manager of my business, I'm probably thinking to myself, this is incredibly frustrating. How in the world am I going to get control of all this data? And it's probably not just me thinking it. Many individuals here as well. >> Yeah, indeed. Everybody's thinking about how am I going to put data to work in my organization in a way I haven't done before. Look, you've got to have the right expertise, the right tools. The other thing that's happening in the market right now is clients are dealing with multi-cloud environments. So data behind the firewall, in private cloud, multiple public clouds. And they have to find a way: how am I going to pull meaning out of this data? And that brings us to data science and AI. That's how you get there. >> I understand the data science part, but I think we're all starting to hear more about AI. And it's incredible that this buzzword is happening. How do businesses adapt to this AI growth and boom and trend that's happening in this world right now? >> Well, let me define it this way. Data science is a discipline. And machine learning is one technique. And then AI both puts machine learning into practice and applies it to the business. So this is really about getting your business where it needs to go. And to get to an AI future, you have to lay a data foundation today. I love the phrase, "there's no AI without IA." That means you're not going to get to AI unless you have the right information architecture to start with. >> Can you elaborate, though, in terms of how businesses can really adopt AI and get started? >> Look, I think there's four things you have to do if you're serious about AI. One is you need a strategy for data acquisition. Two is you need a modern data architecture. Three is you need pervasive automation. And four is you've got to expand job roles in the organization. >> Data acquisition. That's the first pillar you just discussed. Can we start there and explain why it's so critical in this process? >> Yeah, so let's think about how data acquisition has evolved through the years. 15 years ago, data acquisition was about how do I get data in and out of my ERP system? And that was pretty much solved. Then the mobile revolution happened. And suddenly you've got structured and unstructured data, more than you've ever dealt with. And now you get to where we are today. You're talking terabytes, petabytes of data. >> [Katie] Yottabytes, I heard that word the other day. >> I heard that too. >> Didn't even know what it meant. >> You know how many zeros that is? >> I thought we were in Star Wars. >> Yeah, I think it's a lot of zeroes. >> Yodabytes, it's new. >> So, it's becoming more and more complex in terms of how you acquire data. So that's the new data landscape that every client is dealing with.
And if you don't have a strategy for how you acquire that and manage it, you're not going to get to that AI future. >> So a natural segue: if you are one of these businesses, how do you build for the data landscape? >> Yeah, so the question I always hear from customers is we need to evolve our data architecture to be ready for AI. And the way I think about that is it's really about moving from static data repositories to more of a fluid data layer. >> And we continue with the architecture. New data architecture is an interesting buzzword to hear, but it's also one of the four pillars. So if you could dive in there. >> Yeah, I mean it's a new twist on what I would call some core data science concepts. For example, you have to leverage tools with a modern, centralized data warehouse. But your data warehouse can't be stagnant, limited to just what's right there. So you need a way to federate data across different environments. You need to be able to bring your analytics to the data because it's most efficient that way. And ultimately, it's about building an optimized data platform that is designed for data science and AI, which means it has to be a lot more flexible than what clients have had in the past. >> All right. So we've laid out what you need for driving automation. But where does the machine learning kick in? >> Machine learning is what gives you the ability to automate tasks. And the way I think about machine learning, it's about predicting and automating. And this will really change the roles of data professionals and IT professionals. For example, a data scientist cannot possibly know every algorithm or every model that they could use. So we can automate the process of algorithm selection. Another example is things like automated data matching, or metadata creation. Some of these things may not be exciting, but they're hugely practical. And so when you think about the real use cases that are driving return on investment today, it's things like that. It's automating the mundane tasks. >> Let's go ahead and come back to something that you mentioned earlier, because it's fascinating to be talking about this AI journey, but also significant is the new job roles. What are those other participants in the analytics pipeline? >> Yeah, I think we're just at the start of this idea of new job roles. We have data scientists. We have data engineers. Now you see machine learning engineers. Application developers. What's really happening is that data scientists are no longer allowed to work in their own silo. And so the new job roles are about how does everybody have data first in their mind? And then they're using tools to automate data science, to automate building machine learning into applications. So roles are going to change dramatically in organizations. >> I think that's confusing, though, because we have several organizations asking: is that a highly specialized role, just for data science? Or is it applicable to everybody across the board? >> Yeah, and that's the big question, right? 'Cause everybody's thinking how will this apply? Do I want this to be just a small set of people in the organization that will do this? But our view is data science has to be for everybody. It's about bringing data science to everybody as a shared mission across the organization. Everybody in the company has to be data literate and participate in this journey. >> So overall, it's a group effort, it has to be a common goal, and we all need to be data literate across the board. >> Absolutely. >> Done deal.
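As an illustration of the automated algorithm selection described above, a very rough sketch in open-source Python might look like the following. This assumes scikit-learn and uses synthetic data; the candidate models and scoring are arbitrary choices for the example, not how IBM's tooling actually implements it.

```python
# Minimal sketch of automated algorithm selection: try several candidate
# models with cross-validation and keep the one that scores best.
# Illustrative only; candidates and data are arbitrary.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

# Score every candidate the same way and let the code pick the winner.
scores = {name: cross_val_score(model, X, y, cv=5).mean()
          for name, model in candidates.items()}
best_name = max(scores, key=scores.get)
print(scores)
print("selected model:", best_name)
```

The same pattern, score every candidate consistently and keep the winner, is the core idea behind automating tasks like model selection.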
But at the end of the day, it's kind of not an easy task. >> It's not. It's not easy, but it's maybe not as big of a shift as you would think, because you have to put data in the hands of people that can do something with it. So, it's very basic. Give access to data. Data's often locked up in a lot of organizations today. Give people the right tools. Embrace the idea of choice or diversity in terms of those tools. That gets you started on this path. >> It's interesting to hear you say essentially you need to train everyone, though, across the board when it comes to data literacy. And I think people that are coming into the workforce don't necessarily have a background or a degree in data science. So how do you manage? >> Yeah, so in many cases that's true. I will tell you some universities are doing amazing work here. One example, University of California Berkeley. They offer a course for all majors. So no matter what you're majoring in, you have a course on foundations of data science. How do you bring data science to every role? So it's starting to happen. We at IBM provide data science courses through CognitiveClass.ai. It's for everybody. It's free. And look, if you want to get your hands on code and just dive right in, you go to datascience.ibm.com. The key point is this, though. It's more about attitude than it is aptitude. I think anybody can figure this out. But it's about the attitude to say we're putting data first and we're going to figure out how to make this real in our organization. >> I also have to give a shout out to my alma mater because I have heard that there is an offering of an MS in data analytics. And they are always on the forefront of new technologies and new majors and on trend. And I've heard that job placement for people graduating with the MS is high. >> I'm sure it's very high. >> So go Tigers. All right, tangential. Let me get back to something else you touched on earlier, because you mentioned that a number of customers ask you how in the world do I get started with AI? It's an overwhelming question. Where do you even begin? What do you tell them? >> Yeah, well things are moving really fast. But the good thing is most organizations I see, they're already on the path, even if they don't know it. They might have a BI practice in place. They've got data warehouses. They've got data lakes. Let me give you an example. AMC Networks. They produce a lot of the shows that I'm sure you watch, Katie. >> [Katie] Yes, Breaking Bad, Walking Dead, any fans? >> [Rob] Yeah, we've got a few. >> [Katie] Well you taught me something I didn't even know. Because it's amazing how we have all these different industries, and yet media in itself is impacted too. And this is a good example. >> Absolutely. So, AMC Networks, think about it. They've got ads to place. They want to track viewer behavior. What do people like? What do they dislike? So they have to optimize every aspect of their business, from marketing campaigns to promotions to scheduling to ads. And their goal was to transform data into business insights and really take the burden off of their IT team, which was heavily burdened by obviously a huge increase in data. So their VP of BI took the approach of using machine learning to process large volumes of data. They used a platform that was designed for AI and data processing. It's the IBM analytics system, a data warehouse with data science tools built in and in-memory data processing. And just like that, they were ready for AI.
And they're already seeing that impact in their business. >> Do you think a movement of that nature kind of presses other media conglomerates and organizations to say we need to be doing this too? >> I think it's inevitable for everybody: you're either going to be leading, or you'll be playing catch up. And so, as we talk to clients we think about how do you start down this path now, even if you have to iterate over time? Because otherwise you're going to wake up and you're going to be behind. >> One thing worth noting is we've talked about bringing analytics to the data. It's analytics first to the data, not the other way around. >> Right. So, look. We as a practice, we say you want to bring analytics to where the data sits, because it's a lot more efficient that way. It gets you better outcomes in terms of how you train models, and it's more efficient. And we think that leads to better outcomes. Other organizations will say, "Hey, move the data around." And everything becomes a big data movement exercise. But once an organization has started down this path and they're starting to get predictions, they want to do it where it's really easy. And that means analytics applied right where the data sits. >> And it's worth talking about the role of the data scientist in all of this. It's been called the hot job of the decade. And the Harvard Business Review even dubbed it the sexiest job of the 21st century. >> Yes. >> I want to see this on the cover of Vogue. Like I want to see the first data scientist, female preferred, on the cover of Vogue. That would be amazing. >> Perhaps you can. >> People agree. So what changes for them? Is this challenging in terms of, we talk about data science for all, but is it data science for everyone? And how does it change everything? >> Well, I think of it this way. AI gives software super powers. It really does. It changes the nature of software. And at the center of that is data scientists. So, a data scientist has a set of powers that they've never had before in any organization. And that's why it's a hot profession. Now, on one hand, this has been around for a while. We've had actuaries. We've had statisticians that have really transformed industries. But there are a few things that are new now. We have new tools. New languages. Broader recognition of this need. And while it's important to recognize this critical skill set, you can't just limit it to a few people. This is about scaling it across the organization and truly making it accessible to all.
And there's not a lot available today that does that. >> Well let's keep going on that. Because I hear you talking about the data scientist's role and how it's critical to success, but with the new tools, data science and analytics skills can extend beyond the domain of just the data scientist. >> That's right. So look, we're unifying coders and clickers into a single platform, which we call IBM Data Science Experience. And as the demand for data science expertise grows, so does the need for these kinds of tools to bring them into the same environment. And my view is if you have the right platform, it enables the organization to collaborate. And suddenly you've changed the nature of data science from an individual sport to a team sport. >> So as somebody whose background is in IT, the question is really, is this an additional piece of what IT needs to do in 2017 and beyond? Or is it just another line item in the budget? >> So I'm afraid that some people might view it that way, as just another line item. But I would challenge that and say data science is going to reinvent IT. It's going to change the nature of IT. And every organization needs to think about what are the skills that are critical? How do we engage a broader team to do this? Because once they get there, this is the chance to reinvent how they're performing IT. >> [Katie] Challenging or not? >> Look, it's all a big challenge. Think about everything IT organizations have been through. Some of them were late to things like mobile, but then they caught up. Some were late to cloud, but then they caught up. I would just urge people, don't be late to data science. Use this as your chance to reinvent IT. Start with this notion of clickers and coders. This is a seminal moment, much like mobile and cloud were. So don't be late. >> And I think it's critical because it could be so costly to wait. And Rob and I were even chatting earlier about how data analytics is just moving into all different kinds of industries. And I can tell you I've even personally been affected by how important the analysis is, working in pediatric cancer for the last seven years. I personally bring virtual reality headsets to pediatric cancer hospitals across the country. And it's great. And it's working phenomenally. And the kids are amazed. And the staff is amazed. But phase two of this project is putting little sensors in the hardware that gather breathing and heart rate metrics to show that we have data. Proof that we can hand over to the hospitals to continue making this program a success. So just in-- >> That's a great example. >> An interesting example. >> Saving lives? >> Yes. >> That's also applying a lot of what we talked about. >> Exciting stuff in the world of data science. >> Yes. Look, I'd just add this is an existential moment for every organization, because what you do in this area is probably going to define how competitive you are going forward. And think about if you don't do something. What if one of your competitors goes and creates an application that's more engaging with clients? So my recommendation is start small. Experiment. Learn. Iterate on projects. Define the business outcomes. Then scale up. It's very doable. But you've got to take the first step. >> First step is always critical. And now we're going to get to the fun hands-on part of our story, because in just a moment we're going to take a closer look at what data science can deliver and where organizations are trying to get to. All right.
Thank you Rob, and now we've been joined by Siva Anne, who is going to help us navigate this demo. First, welcome Siva. Give him a big round of applause. Yeah. All right, Rob, break down what we're going to be looking at. You take over this demo. >> All right. So this is going to be pretty interesting. So Siva is going to take us through. He's going to play the role of a financial adviser who wants to help better serve clients through recommendations. And I'm going to really illustrate three things. One is how do you federate data from multiple data sources, inside the firewall, outside the firewall? How do you apply machine learning to predict and to automate? And then how do you move analytics closer to your data? So, what you're seeing here is a custom application for an investment firm. So, Siva, our financial adviser, welcome. So you can see at the top, we've got market data. We pulled that from an external source. And then we've got Siva's calendar in the middle. He's got clients on the right side. So page down, what else do you see down there, Siva? >> [Siva] I can see the recent market news. And in here I can see that JP Morgan is calling for a US dollar rebound in the second half of the year. And I have an upcoming meeting with Leo Rakes. I can get-- >> [Rob] So let's go in there. Why don't you click on Leo Rakes. So, you're sitting at your desk, you're deciding how you're going to spend the day. You know you have a meeting with Leo. So you click on it. You immediately see, all right, so what do we know about him? We've got data governance implemented. So we know his age, we know his degree. We can see he's not that aggressive of a trader. Only six trades in the last few years. But then where it gets interesting is you go to the bottom. You start to see predicted industry affinity. Where did that come from? How do we have that? >> [Siva] So these green lines and red arrows here indicate the trending affinity of Leo Rakes for particular industry stocks. What we've done here is we've built machine learning models using the customer's demographic data, his stock portfolios, and browsing behavior to build a model which can predict his affinity for a particular industry. >> [Rob] Interesting. So, I like to think of this, we call it celebrity experiences. So how do you treat every customer like they're a celebrity? So to some extent, we're reading his mind, because without asking him, we know that he's going to have an affinity for auto stocks. So we go down. Now we look at his portfolio. You can see okay, he's got some different holdings. He's got Amazon, Google, Apple, and then he's got RACE, which is the ticker for Ferrari. You can see that's done incredibly well. And so, as a financial adviser, you look at this and you say, all right, we know he loves auto stocks. Ferrari's done very well. Let's create a hedge. Like what kind of security would interest him as a hedge against his position in Ferrari? Could we go figure that out? >> [Siva] Yes. Given I know that he's got an affinity for auto stocks, and I also see that Ferrari has had some tremendous gains, I want to lock in these gains by hedging. And I want to do that by picking an auto stock which has a negative correlation with Ferrari. >> [Rob] So this is where we get to the idea of in-database analytics. 'Cause you start clicking that and immediately we're getting instant answers of what's happening. So what did we find here? We're going to compare Ferrari and Honda. >> [Siva] I'm going to compare Ferrari with Honda.
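As an illustration of the negative-correlation check Siva is about to run, a minimal sketch in Python with pandas might look like the following. The price series are fabricated for the example; in the demo these correlations run in-database, at far larger scale.

```python
# Rough sketch of the hedging check: is a candidate auto stock negatively
# correlated with Ferrari (RACE)? Prices below are fabricated for illustration.
import pandas as pd

prices = pd.DataFrame({
    "RACE": [100, 104, 103, 108, 112, 110, 115, 118],          # Ferrari (made up)
    "HMC":  [30, 29.5, 29.8, 29.1, 28.7, 28.9, 28.2, 27.9],    # Honda (made up)
})

# Correlate daily returns rather than raw prices.
returns = prices.pct_change().dropna()
corr = returns["RACE"].corr(returns["HMC"])
print(f"RACE vs HMC return correlation: {corr:.2f}")

if corr < 0:
    print("Negative correlation: a plausible hedge candidate.")
else:
    print("Positive correlation: not a hedge.")
```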
And what I see here instantly is that Honda has a negative correlation with Ferrari, which makes it a perfect mix for his stock portfolio, given he has an affinity for auto stocks and it correlates negatively with Ferrari. >> [Rob] These are very powerful tools in the hands of a financial adviser. You think about it. As a financial adviser, you wouldn't think about federating data, machine learning, pretty powerful. >> [Siva] Yes. So what we have seen here is that using the common SQL engine, we've been able to federate queries across multiple data sources: Db2 Warehouse on Cloud, IBM's Integrated Analytics System, and a Hortonworks-powered Hadoop platform for the news feeds. We've been able to use machine learning to derive innovative insights about his stock affinities, and drive the machine learning into the appliance, closer to where the data resides, to deliver high performance analytics. >> [Rob] At scale? >> [Siva] We're able to run millions of these correlations across stocks, currencies, other factors. And even score hundreds of customers for their affinities on a daily basis. >> That's great. Siva, thank you for playing the role of financial adviser. So I just want to recap briefly. 'Cause this is really powerful technology that's really simple. So we federated, we aggregated multiple data sources from all over the web and internal systems and public cloud systems. Machine learning models were built that predicted Leo's affinity for a certain industry. In this case, automotive. And then you see when you deploy analytics next to your data, even a financial adviser, just with the click of a button, is getting instant answers so they can go be more productive in their next meeting. This whole idea of celebrity experiences for your customer, that's available for everybody, if you take advantage of these types of capabilities. Katie, I'll hand it back to you. >> Good stuff. Thank you Rob. Thank you Siva. Powerful demonstration of what we've been talking about all afternoon. And thank you again to Siva for helping us navigate. Should we give him one more round of applause? We're going to be back in just a moment to look at how we operationalize all of this data. But first, here's a message from me. If you're a part of a line of business, your main fear is disruption. You know data is the new gold that can create huge amounts of value. So does your competition. And they may be beating you to it. You're convinced there are new business models and revenue sources hidden in all the data. You just need to figure out how to leverage it. But with the scarcity of data scientists, you really can't rely solely on them. You may need more people throughout the organization that have the ability to extract value from data. And as a data science leader or data scientist, you have a lot of the same concerns. You spend way too much time looking for, prepping, and interpreting data and waiting for models to train. You know you need to operationalize the work you do to provide business value faster. What you want is an easier way to do data prep, and to rapidly build models that can be easily deployed, monitored, and automatically updated. So whether you're a data scientist, data science leader, or in a line of business, what's the solution? What'll it take to transform the way you work? That's what we're going to explore next. All right, now it's time to delve deeper into the nuts and bolts. The nitty gritty of operationalizing data science and creating a data driven culture. How do you actually do that?
Well that's what these experts are here to share with us. I'm joined by Nir Kaldero, who's head of data science at Galvanize, which is an education and training organization. Tricia Wang, who is co-founder of Sudden Compass, a consultancy that helps companies understand people with data. And last, but certainly not least, Michael Li, founder and CEO of Data Incubator, which is a data science training company. All right guys. Shall we get right to it? >> All right. >> So, data explosion happening right now. And we are seeing it across the board. I just shared an example of how it's impacting my philanthropic work in pediatric cancer. But you guys each have so many unique roles in your business life. How are you seeing it just blow up in your fields? Nir, your thoughts? >> Yeah, for example, at Galvanize we train many Fortune 500 companies. And just looking at the demand from companies that want us to help them go through this digital transformation is mind-blowing. That's a data point by itself. >> Okay. Well what we're seeing is that data science, as a theme, is actually for everyone now. But what's happening is that it's actually meeting non-technical people. And what we're seeing is that when non-technical people are implementing these tools, or coming at these tools without a baseline of data literacy, they're oftentimes using them in ways that distance themselves from the customer, because they're implementing data science tools without a clear purpose, without a clear problem. And so what we do at Sudden Compass is we work with companies to help them embrace and understand the complexity of their customers, because oftentimes they are misusing data science to try and flatten their understanding of the customer, as if you can just do more traditional marketing, where you're putting people into boxes. And I think the whole ROI of data is that you can now understand people's relationships at a much more complex level, at a greater scale than before. But we have to do this with basic data literacy. And this has to involve technical and non-technical people. >> Well you can have all the data in the world, and I think it speaks to, if you're not making the proper moves with it, forget it. It means nothing at the same time. >> No, absolutely. I mean, I think that when you look at the huge explosion in data, there comes with it a huge explosion in data experts. Right, we call them data scientists, data analysts. And sometimes they're people who are very, very talented, like the people here. But sometimes you have people who are maybe re-branding themselves, right? Trying to move up their title one notch to try to attract that higher salary. And I think that that's one of the things that customers are coming to us for, right? They're saying, hey look, there are a lot of people that call themselves data scientists, but we can't really distinguish. So, we have sort of run a fellowship where you help companies hire from a really talented group of folks, who are also truly data scientists and who know all those kinds of really important data science tools. And we also help companies internally, Fortune 500 companies who are looking to grow that data science practice that they have. And we help clients like McKinsey, BCG, Bain train up their customers, also their clients, also their workers to be more data talented, and to build up that data science capability. >> And Nir, this is something you work with a lot. A lot of Fortune 500 companies. And when we were speaking earlier, you were saying many of these companies can be in a panic. >> Yeah. >> Explain that. >> Yeah, so you know, not all Fortune 500 companies are fully data driven. And we know that the winners in this fourth industrial revolution, which I like to call the machine intelligence revolution, will be companies who navigate and transform their organizations to unlock the power of data science and machine learning. And the companies that are not like that, or don't utilize data science and predictive power well, will pretty much get shredded. So they are in a panic. >> Tricia, companies have to deal with data behind the firewall and in the new multi-cloud world. How do organizations start to become data driven right to the core? >> I think the most urgent question companies should be asking to become data driven is how do I bring the complex reality that our customers are experiencing on the ground into a corporate office, into the data models? That question is critical because that's how you actually prevent any big data disasters. And that's how you leverage big data. Because when your data models are really far from your human models, that's when you're going to do things that are really far off, and it's going to not feel right. That's when Tesco had their terrible big data disaster that they're still recovering from. And so that's why I think it's really important to understand that when you implement big data, you have to further embrace thick data: the qualitative, the emotional stuff that is difficult to quantify. But then comes the difficult art and science that I think is the next level of data science, which is getting non-technical and technical people together to ask how do we find those unknown nuggets of insights that are difficult to quantify? Then, how do we do the next step of figuring out how to mathematically scale those insights into a data model, so that it actually is reflective of human understanding? And then we can start making decisions at scale. But you have to have that first. >> That's absolutely right. And I think that when we think about what it means to be a data scientist, right? I always think about it in these sort of three pillars. You have the math side. You have to have that kind of stats, hardcore machine learning background. You have the programming side. You don't work with small amounts of data. You work with large amounts of data. You've got to be able to type the code to make those computers run. But then the last part is that human element. You have to understand the domain expertise. You have to understand what it is that I'm actually analyzing. What's the business proposition? And how are the clients, how are the users actually interacting with the system? That human element that you were talking about. And I think having somebody who understands all of those, and not just in isolation, but is able to marry that understanding across those different topics, that's what makes a data scientist. >> But I find that we don't have people with those skill sets. And right now the way I see teams being set up inside companies is that they're creating these isolated data unicorns. These data scientists that have graduated from your programs, which are great. But they don't involve the people who are the domain experts. They don't involve the designers, the consumer insight people, the salespeople, the people who spend time with the customers day in and day out. Somehow they're left out of the room.
They're consulted, but they're not a stakeholder. >> Can I actually >> Yeah, yeah please. >> Can I actually give a quick example? So for example, we at Galvanize train the executives and the managers, and then the technical people, the data scientists and the analysts. But in order to actually see all of the ROI behind the data, you also have to have a creative, fluid conversation between non-technical and technical people. And this is a major trend now. And there's a major gap. And we need to increase awareness and kind of create a new environment where technical people also talk seamlessly with non-technical ones. >> [Tricia] We call-- >> That's one of the things that we see a lot. Is one of the trends in-- >> A major trend. >> data science training is it's not just for the data science technical experts. It's not just for one type of person. So a lot of the training we do is for data engineers, people who are more on the software engineering side, learning more about the stats and math. And then people who are sort of traditionally on the stats side learning more about the engineering. And then managers and people who are data analysts learning about both. >> Michael, I think you said something that was of interest too, because I think we can look at IBM Watson as an example, and working in healthcare. The human component. Because oftentimes we talk about machine learning and AI and data, and you get worried that you still need that human component, especially in the world of healthcare. And I think that's a very strong point when it comes to the data analysis side. Is there any particular example you can speak to of that? >> So I think that there was this really excellent paper a while ago talking about all the neural net stuff trained on textual data, so looking at sort of different corpuses. And they found that these models were highly, highly sexist. They would read these corpuses, and it's not because neural nets themselves are sexist. It's because they're reading the things that we write. And it turns out that we write kind of sexist things. And they would sort of find all these patterns in there that were sort of latent, that had a lot of sort of things that maybe we would cringe at if we saw them. And I think that's one of the really important aspects of the human element, right? It's being able to come in and sort of say like, okay, I know what the biases of the system are, I know what the biases of the tools are. I need to figure out how to use that to make the tools, and the world, a better place. And another area where this comes up all the time is lending, right? So the federal government has said, and we have a lot of clients in the financial services space, so they're constantly under these kinds of rules, that they can't make discriminatory lending practices based on a whole set of protected categories. Race, sex, gender, things like that. But it's very easy when you train a model on credit scores to pick that up, and then to have a model that's inadvertently sexist or racist. And that's where you need the human element to come back in and say okay, look, the classic example would be zip code; you're using zip code as a variable. But when you look at it, zip code is actually highly correlated with race. And you can't do that.
So you may inadvertently, by sort of following the math and being a little naive about the problem, introduce something really horrible into a model, and that's where you need a human element to sort of step in and say, okay, hold on. Slow things down. This isn't the right way to go. >> And the people who have -- >> I feel like, I can feel her ready to respond. >> Yes, I'm ready. >> She's like let me have at it. >> And here it is. The people who are really great at providing that human intelligence are social scientists. We are trained to look for bias and to understand bias in data, whether it's quantitative or qualitative. And I really think that we're going to have fewer of these kinds of problems if we had more integrated teams. If it was a mandate from leadership to say no data science team should be without a social scientist, ethnographer, or qualitative researcher of some kind, to be able to help see these biases. >> The talent piece is actually the most crucial-- >> Yeah. >> one here. If you look at how to enable machine intelligence in an organization, there are the pillars that I have in my head, which are the culture, the talent, and the technology infrastructure. And I believe, and I saw in working very closely with Fortune 100 and 200 companies, that the talent piece is actually the most important, the most crucial, and the hardest to get. >> [Tricia] I totally agree. >> It's absolutely true. Yeah, no, I mean I think that's sort of how we came up with our business model. Companies were basically saying hey, I can't hire data scientists. And so we have a fellowship where we get 2,000 applicants each quarter. We take the top 2% and then we sort of train them up. And we work with hiring companies who then want to hire from that population. And so we're sort of helping them solve that problem. And the other half of it is really around training. 'Cause with a lot of industries, especially if you're sort of in a more regulated industry, there are a lot of nuances to what you're doing. And the fastest way to develop that data science or AI talent may not necessarily be to hire folks who are coming out of a PhD program. It may be to take folks internally who have a lot of that domain knowledge that you have and get them trained up on those data science techniques. So we've had large insurance companies come to us and say hey look, we hire three or four folks from you a quarter. That doesn't move the needle for us. What we really need is to take the thousand actuaries and statisticians that we have and get all of them trained up to become data scientists and become data literate in this new open source world. >> [Katie] Go ahead. >> All right, ladies first. >> Go ahead. >> Are you sure? >> No please, fight first. >> Go ahead. >> Go ahead Nir. >> So this is actually a trend that we have been seeing in the past year or so, that companies kind of start to look at how to upskill and look for talent within the organization, so they can actually move them to become more literate and navigate them from analyst to data scientist, and from data scientist to machine learning engineer. So this is actually a trend that has been happening already for a year or so. >> Yeah, but I also find that after they've gone through that training and gotten people skilled up in data science, the next problem that I get is executives coming to say we've invested in all of this. We're still not moving the needle. We've already invested in the right tools. We've gotten the right skills.
We have enough scale of people who have these skills. Why are we not moving the needle? And what I explain to them is look, you're still making decisions in the same way. And you're still not involving enough of the non-technical people, especially from marketing, which is, now the CMOs are much more responsible for driving growth in their companies. But oftentimes it's so hard to change the old way of marketing, which is still very segmentation-based, you know, demographic-variable based, and we're trying to move people to say no, you have to understand the complexity of customers and not put them in boxes. >> And I think underlying a lot of this discussion is this question of culture, right? >> Yes. >> Absolutely. >> How do you build a data driven culture? And I think that culture question, one of the ways it comes up quite often, especially in large Fortune 500 enterprises, is that they're not very comfortable with, for example, open source architecture, open source tools. And there is some sort of residual bias that that's somehow dangerous, a security vulnerability. And I think that that's part of the cultural challenge that they often have in terms of how do I build a more data driven organization? Well, a lot of the talent really wants to use these kinds of tools. And I mean, just to give you an example, we are partnering with one of the major cloud providers to sort of help make open source tools more user friendly on their platform. So trying to help them attract the best technologists to use their platform, because they want, and they understand, the value of having that kind of open source technology work seamlessly on their platforms. So I think that just sort of goes to show you how important open source is in this movement, and how much large companies and Fortune 500 companies, and a lot of the ones we work with, have to embrace that. >> Yeah, and I'm seeing it in our work. Even when we're working with Fortune 500 companies, they've already gone through the first phase of data science work, which, as I explain it, was all about the tools and getting the right tools and architecture in place. And then companies started moving into getting the right skill set in place, getting the right talent. And what you're talking about with culture is really where I think we're talking about the third phase of data science, which is looking at communication of these technical frameworks so that we can get non-technical people really comfortable in the same room with data scientists. That is going to be the phase, that's really where I see the pain point. And that's why at Sudden Compass, we're really dedicated to working with each other to figure out how do we solve this problem now. >> And I think that communication between the technical stakeholders and management and leadership, that's a very critical piece of this. You can't have a successful data science organization without that. >> Absolutely. >> And I think that actually some of the most popular trainings we've had recently are for managers and executives who are looking to say, how do I become more data savvy? How do I figure out what is this data science thing and how do I communicate with my data scientists? >> You guys made this way too easy. I was just going to get some popcorn and watch it play out. >> Nir, last 30 seconds. I want to leave you with an opportunity to add anything you want to this conversation.
>> I think one thing to conclude is to say that for companies that are not data driven, it's about time to hit refresh and figure out how to transition the organization to become data driven, to become agile and nimble, so they can actually seize the opportunities from this important industrial revolution. Otherwise, unfortunately, they will have a hard time surviving. >> [Katie] All agreed? >> [Tricia] Absolutely, you're right. >> Michael, Trish, Nir, thank you so much. Fascinating discussion. And thank you guys again for joining us. We will be right back with another great demo. Right after this. >> Thank you Katie. >> Once again, thank you for an excellent discussion. Weren't they great, guys? And thank you to everyone who's tuning in on the live webcast. As you can hear, we have an amazing studio audience here. And we're going to keep things moving. I'm now joined by Daniel Hernandez and Siva Anne. And we're going to turn our attention to how you can deliver on what they're talking about, using the Data Science Experience to do data science faster. >> Thank you Katie. Siva and I are going to spend the next 10 minutes showing you how you can deliver on what they were saying, using the IBM Data Science Experience to do data science faster. We'll demonstrate, through new features we introduced this week, how teams can work together more effectively across the entire analytics life cycle. How you can take advantage of any and all data, no matter where it is and what it is. How you can use your favorite tools from open source. And finally, how you can build models anywhere and deploy them close to where your data is. Remember the financial adviser app Rob showed you? To build an app like that, we needed a team of data scientists, developers, data engineers, and IT staff to collaborate. We do this in the Data Science Experience through a concept we call projects. When I create a new project, I can now use the new Github integration feature. We're doing for data science what we've been doing for developers for years. Distributed teams can work together on analytics projects and take advantage of Github's version management and change management features. This is a huge deal. Let's explore the project we created for the financial adviser app. As you can see, our data engineer Joane, our developer Rob, and others are collaborating on this project. Joane got things started by bringing together the trusted data sources we need to build the app. Taking a closer look at the data, we see that our customer and profile data is stored on our recently announced IBM Integrated Analytics System, which runs safely behind our firewall. We also needed macroeconomic data, which she was able to find from the Federal Reserve, and she stored it in our Db2 Warehouse on Cloud. And finally, she selected stock news data from NASDAQ.com and landed that in a Hadoop cluster, which happens to be powered by Hortonworks. We added a new feature to the Data Science Experience so that when it's installed with Hortonworks, it automatically uses the native security and governance controls within the cluster, so your data is always secure and safe. Now we want to show you the news data we stored in the Hortonworks cluster. This is the main administrative console. It's powered by an open source project called Ambari. And here's the news data. It's in parquet files stored in HDFS, which happens to be a distributed file system.
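As an illustration of what that news data looks like to code, a minimal PySpark sketch of reading parquet from HDFS and querying it with SQL might look like the following. The path and column names are hypothetical, and plain Spark SQL is standing in here for the Big SQL and BigIntegrate tooling described next; this is not IBM's implementation.

```python
# Minimal sketch: read parquet news data from HDFS and query it with SQL.
# The HDFS path and column names are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("news-exploration").getOrCreate()

news = spark.read.parquet("hdfs:///data/nasdaq/news/")  # hypothetical path
news.createOrReplaceTempView("stock_news")

# Count recent headlines per ticker, assuming 'ticker' and 'published_date'
# columns exist in the files.
spark.sql("""
    SELECT ticker, COUNT(*) AS headlines
    FROM stock_news
    WHERE published_date >= '2017-01-01'
    GROUP BY ticker
    ORDER BY headlines DESC
    LIMIT 10
""").show()
```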
To get the data from NASDAQ into our cluster, we used IBM's BigIntegrate and BigQuality to create automatic data pipelines that acquire, cleanse, and ingest that news data. Once the data's available, we use IBM's Big SQL to query that data using SQL statements that are much like the ones we would use for any relational data, including the data that we have in the Integrated Analytics System and Db2 Warehouse on Cloud. This, and the federation capabilities that Big SQL offers, dramatically simplify data acquisition. Now we want to show you how we support a brand new tool that we're excited about. Since we launched last summer, the Data Science Experience has supported Jupyter and R for data analysis and visualization. In this week's update, we deeply integrated another great open source project called Apache Zeppelin. It's known for having great visualization support and advanced collaboration features, and it is growing in popularity amongst the data science community. This is an example of Apache Zeppelin and the notebook we created through it to explore some of our data. Notice how wonderful and easy the data visualizations are. Now we want to walk you through the Jupyter notebook we created to explore our customers' preference for stocks. We use notebooks to understand and explore data, and to identify the features that have some predictive power. Ultimately, we're trying to assess what is driving customer stock preference. Here we did the analysis to identify the attributes of customers that are likely to purchase auto stocks. We used this understanding to build our machine learning model. For building machine learning models, we've always had tools integrated into the Data Science Experience. But sometimes you need to use tools you've already invested in, like our very own SPSS, as well as SAS. Through a new import feature, you can easily import models created with those tools. This helps you avoid vendor lock-in, and simplifies the development, training, deployment, and management of all your models. To build the models we used in the app, we could have coded, but we prefer a visual experience. We used our customer profile data in the Integrated Analytics System, used the Auto Data Preparation to cleanse our data, chose the binary classification algorithms, and let the Data Science Experience evaluate between logistic regression and a gradient boosted tree. It's doing the heavy work for us. As you can see here, the Data Science Experience generated performance metrics that show us that the gradient boosted tree is the best performing algorithm for the data we gave it. Once we save this model, it's automatically deployed and available for developers to use. Any application developer can take this endpoint and consume it like they would any other API inside of the apps they build. We've made training and creating machine learning models super simple. But what about the operations? A lot of companies are struggling to ensure their model performance remains high over time. In our financial adviser app, we know that customer data changes constantly, so we need to always monitor model performance and ensure that our models are retrained as necessary. This is a dashboard that shows the performance of our models and lets our teams monitor and retrain those models so that they're always performing to our standards. So far we've been showing you the Data Science Experience available behind the firewall that we're using to build and train models.
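As an illustration of the evaluation described above, comparing logistic regression against a gradient boosted tree on a binary target looks roughly like the following in plain scikit-learn. Synthetic data stands in for the real customer profile table, and the metric shown is an assumption for the example, not necessarily what the Data Science Experience reports.

```python
# Sketch of the model comparison in the demo: logistic regression vs a
# gradient boosted tree on a binary "buys auto stocks" style target.
# Synthetic data stands in for the real customer profile table.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=30, n_informative=8,
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                    random_state=42)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "gradient_boosted_tree": GradientBoostingClassifier(random_state=42),
}

# Fit both models and report a common metric so they can be compared directly.
for name, model in models.items():
    model.fit(X_train, y_train)
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(f"{name}: AUC = {auc:.3f}")
```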
Through a new publish feature, you can build models and deploy them anywhere: in another environment, private, public, or anywhere else, with just a few clicks. So here we're publishing our model to the Watson machine learning service, which happens to be in the IBM cloud and is also deeply integrated with our Data Science Experience. After publishing and switching to the Watson machine learning service, you can see that the stock affinity model that we just published is there and ready for use. So this is incredibly important. I just want to say it again. The Data Science Experience allows you to train models behind your own firewall, take advantage of your proprietary and sensitive data, and then deploy those models wherever you want, with ease. So, to summarize what we just showed you. First, IBM's Data Science Experience supports all teams. You saw how our data engineer populated our project with trusted data sets, our data scientists developed, trained, and tested a machine learning model, our developers used APIs to integrate machine learning into their apps, and how IT can use our Integrated Model Management dashboard to monitor and manage model performance. Second, we support all data. On premises, in the cloud, structured, unstructured, inside of your firewall, and outside of it. We help you bring analytics and governance to where your data is. Third, we support all tools. The data science tools that you depend on are readily available and deeply integrated. This includes capabilities from great partners like Hortonworks, and powerful tools like our very own IBM SPSS. And fourth, and finally, we support all deployments. You can build your models anywhere and deploy them right next to where your data is, whether that's in the public cloud, private cloud, or even on the world's most reliable transaction platform, IBM Z. So see for yourself. Go to the Data Science Experience website and take us for a spin. And if you happen to be ready right now, our recently created Data Science Elite Team can help you get started and run experiments alongside you at no charge. Thank you very much. >> Thank you very much Daniel. It seems like a great time to get started. And thanks to Siva for taking us through it. Rob and I will be back in just a moment to add some perspective, right after this. All right, once again joined by Rob Thomas. And Rob, obviously we got a lot of information here. >> Yes, we've covered a lot of ground. >> This is intense. You've got to break it down for me, 'cause I think we should zoom out and see the big picture. What can better data science deliver to a business? Why is this so important? I mean we've heard it through and through. >> Yeah, well, I heard it a couple times. But it starts with businesses having to embrace a data driven culture. And it is a change. And we need to make data accessible, with the right tools, in a collaborative culture, because we've got diverse skill sets in every organization. But data driven companies succeed when data science tools are in the hands of everyone. And I think that's a new thought. I think most companies think just get your data scientists some tools, you'll be fine. This is about tools in the hands of everyone. I think the panel did a great job of describing how we get to data science for all.
Building a data culture, making it a part of your everyday operations, and the highlights of what Daniel just showed us, that's some pretty cool features for how organizations can get to this, which is you can see IBM's Data Science Experience, how that supports all teams. You saw data analysts, data scientists, application developer, IT staff, all working together. Second, you saw how we support all tools. And your choice of tools. So the most popular data science libraries integrated into one platform. And we saw some new capabilities that help companies avoid lock-in, where you can import existing models created from specialist tools like SPSS or others. And then deploy them and manage them inside of Data Science Experience. That's pretty interesting. And lastly, you see we continue to build on this best of open tools. Partnering with companies like H2O, Hortonworks, and others. Third, you can see how you use all data no matter where it lives. That's a key challenge every organization's going to face. Private, public, federating all data sources. We announced new integration with the Hortonworks data platform where we deploy machine learning models where your data resides. That's been a key theme. Analytics where the data is. And lastly, supporting all types of deployments. Deploy them in your Hadoop cluster. Deploy them in your Integrated Analytic System. Or deploy them in z, just to name a few. A lot of different options here. But look, don't believe anything I say. Go try it for yourself. Data Science Experience, anybody can use it. Go to datascience.ibm.com and look, if you want to start right now, we just created a team that we call Data Science Elite. These are the best data scientists in the world that will come sit down with you and co-create solutions, models, and prove out a proof of concept. >> Good stuff. Thank you Rob. So you might be asking what does an organization look like that embraces data science for all? And how could it transform your role? I'm going to head back to the office and check it out. Let's start with the perspective of the line of business. What's changed? Well, now you're starting to explore new business models. You've uncovered opportunities for new revenue sources and all that hidden data. And being disrupted is no longer keeping you up at night. As a data science leader, you're beginning to collaborate with a line of business to better understand and translate the objectives into the models that are being built. Your data scientists are also starting to collaborate with the less technical team members and analysts who are working closest to the business problem. And as a data scientist, you stop feeling like you're falling behind. Open source tools are keeping you current. You're also starting to operationalize the work that you do. And you get to do more of what you love. Explore data, build models, put your models into production, and create business impact. All in all, it's not a bad scenario. Thanks. All right. We are back and coming up next, oh this is a special time right now. Cause we got a great guest speaker. New York Magazine called him the spreadsheet psychic and number crunching prodigy who went from correctly forecasting baseball games to correctly forecasting presidential elections. He even invented a proprietary algorithm called PECOTA for predicting future performance by baseball players and teams. And his New York Times bestselling book, The Signal and the Noise was named by Amazon.com as the number one best non-fiction book of 2012. 
He's currently the Editor in Chief of the award winning website, FiveThirtyEight and appears on ESPN as an on air commentator. Big round of applause. My pleasure to welcome Nate Silver. >> Thank you. We met backstage. >> Yes. >> It feels weird to re-shake your hand, but you know, for the audience. >> I had to give the intense firm grip. >> Definitely. >> The ninja grip. So you and I have crossed paths kind of digitally in the past, which is really interesting, is I started my career at ESPN. And I started as a production assistant, then later back on air for sports technology. And I go to you to talk about sports because-- >> Yeah. >> Wow, has ESPN upped their game in terms of understanding the importance of data and analytics. And what it brings. Not just to MLB, but across the board. >> No, it's really infused into the way they present the broadcast. You'll have win probability on the bottom line. And they'll incorporate FiveThirtyEight metrics into how they cover college football for example. So, ESPN ... Sports is maybe the perfect, if you're a data scientist, like the perfect kind of test case. And the reason being that sports consists of problems that have rules. And have structure. And when problems have rules and structure, then it's a lot easier to work with. So it's a great way to kind of improve your skills as a data scientist. Of course, there are also important real world problems that are more open ended, and those present different types of challenges. But it's such a natural fit. The teams. Think about the teams playing the World Series tonight. The Dodgers and the Astros are both like very data driven, especially Houston. Golden State Warriors, the NBA Champions, extremely data driven. New England Patriots, relative to an NFL team, it's shifted a little bit, the NFL bar is lower. But the Patriots are certainly very analytical in how they make decisions. So, you can't talk about sports without talking about analytics. >> And I was going to save the baseball question for later. Cause we are moments away from game seven. >> Yeah. >> Is everyone else watching game seven? It's been an incredible series. Probably one of the best of all time. >> Yeah, I mean-- >> You have a prediction here? >> You can mention that too. So I don't have a prediction. FiveThirtyEight has the Dodgers with a 60% chance of winning. >> [Katie] LA Fans. >> So you have two teams that are about equal. But the Dodgers pitching staff is in better shape at the moment. The end of a seven game series. And they're at home. >> But the statistics behind the two teams are pretty incredible. >> Yeah. It's like the first World Series in I think 56 years or something where you have two 100 win teams facing one another. There has been a lot of parity in baseball for a lot of years. Not that many offensive overall juggernauts. But this year, and last year with the Cubs and the Indians too really. But this year, you have really spectacular teams in the World Series. It kind of is a showcase of modern baseball. Lots of home runs. Lots of strikeouts. >> [Katie] Lots of extra innings. >> Lots of extra innings. Good defense. Lots of pitching changes. So if you love the modern baseball game, it's been about the best example that you've had. If you like a little bit more contact, and fewer strikeouts, maybe not so much. But it's been a spectacular and very exciting World Series. >> It's amazing to talk about. MLB is huge with analysis. I mean, hands down. But across the board, if you can provide a few examples. 
Because there's so many teams in front offices putting such an, just a heavy intensity on the analysis side. And where the teams are going. And if you could provide any specific examples of teams that have really blown your mind. Especially over the last year or two. Because every year it gets more exciting if you will. >> I mean, so a big thing in baseball is defensive shifts. So if you watch tonight, you'll probably see a couple of plays where if you're used to watching baseball, a guy makes really solid contact. And there's a fielder there that you don't think should be there. But that's really very data driven where you analyze where this guy hits the ball. That part's not so hard. But also there's game theory involved. Because you have to adjust for the fact that he knows where you're positioning the defenders. He's trying therefore to make adjustments to his own swing and so that's been a major innovation in how baseball is played. You know, how bullpens are used too. Where teams have realized that actually having a guy, across all sports pretty much, realizing the importance of rest. And of fatigue. And that you can be the best pitcher in the world, but guess what? After four or five innings, you're probably not as good as a guy who has a fresh arm necessarily. So I mean, it really is like, these are not subtle things anymore. It's not just oh, on base percentage is valuable. It really affects kind of every strategic decision in baseball. The NBA, if you watch an NBA game tonight, see how many three point shots are taken. That's in part because of data. And teams realizing hey, three points is worth more than two, once you're more than about five feet from the basket, the shooting percentage gets really flat. And so it's revolutionary, right? Like teams that will shoot almost half their shots from the three point range nowadays. Larry Bird, who wound up being one of the greatest three point shooters of all time, took only eight three pointers his first year in the NBA. It's quite noticeable if you watch baseball or basketball in particular. >> Not to focus too much on sports. One final question. In terms of Major League Soccer, and now in the NFL, we're having the analysis and having wearables where it can now showcase, if they wanted to on screen, heart rate and breathing and how much exertion. How much data is too much data? And when does it ruin the sport? >> So, I don't think, I mean, again, it goes sport by sport a little bit. I think in basketball you actually have a more exciting game. I think the game is more open now. You have more three pointers. You have guys getting higher assist totals. But you know, I don't know. I'm not one of those people who thinks look, if you love baseball or basketball, and you go in to work for the Astros, the Yankees or the Knicks, they probably need some help, right? You really have to be passionate about that sport. Because it's all based on what questions am I asking? As I'm a fan or I guess an employee of the team. Or a player watching the game. And there isn't really any substitute I don't think for the insight and intuition that a curious human has to kind of ask the right questions. So we can talk at great length about what tools do you then apply when you have those questions, but that still comes from people. I don't think machine learning could help with what questions do I want to ask of the data. It might help you get the answers. 
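The three-point arithmetic Nate sketches above works out roughly like this; the shooting percentages are illustrative round numbers, not real league figures.

```python
# Illustrative arithmetic only; the percentages are made-up round numbers,
# not real NBA shooting figures.
long_two_pct = 0.42          # hypothetical long two-point percentage
three_pct = 0.36             # hypothetical three-point percentage

ev_long_two = 2 * long_two_pct   # 0.84 expected points per attempt
ev_three = 3 * three_pct         # 1.08 expected points per attempt
print(f"long two: {ev_long_two:.2f} pts/attempt, three: {ev_three:.2f} pts/attempt")
```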
>> If you have a mid-fielder in a soccer game though, not exerting, only 80%, and you're seeing that on a screen as a fan, and you're saying could that person get fired at the end of the day? One day, with the data? >> So we found that actually some in soccer in particular, some of the better players are actually more still. So Leo Messi, maybe the best player in the world, doesn't move as much as other soccer players do. And the reason being that A) he kind of knows how to position himself in the first place. B) he realizes that you make a run, and you're out of position. That's quite fatiguing. And particularly soccer, like basketball, is a sport where it's incredibly fatiguing. And so, sometimes the guys who conserve their energy, that kind of old school mentality, you have to hustle at every moment. That is not helpful to the team if you're hustling on an irrelevant play. And therefore, on a critical play, can't get back on defense, for example. >> Sports, but also data is moving exponentially as we're just speaking about today. Tech, healthcare, every different industry. Is there any particular that's a favorite of yours to cover? And I imagine they're all different as well. >> I mean, I do like sports. We cover a lot of politics too. Which is different. I mean in politics I think people aren't intuitively as data driven as they might be in sports for example. It's impressive to follow the breakthroughs in artificial intelligence. It started out just as kind of playing games and playing chess and poker and Go and things like that. But you really have seen a lot of breakthroughs in the last couple of years. But yeah, it's kind of infused into everything really. >> You're known for your work in politics though. Especially presidential campaigns. >> Yeah. >> This year, in particular. Was it insanely challenging? What was the most notable thing that came out of any of your predictions? >> I mean, in some ways, looking at the polling was the easiest lens to look at it. So I think there's kind of a myth that last year's result was a big shock and it wasn't really. If you did the modeling in the right way, then you realized that number one, polls have a margin of error. And so when a candidate has a three point lead, that's not particularly safe. Number two, the outcome between different states is correlated. Meaning that it's not that much of a surprise that Clinton lost Wisconsin and Michigan and Pennsylvania and Ohio. You know I'm from Michigan. Have friends from all those states. Kind of the same types of people in those states. Those outcomes are all correlated. So what people thought was a big upset for the polls I think was an example of how data science done carefully and correctly where you understand probabilities, understand correlations. Our model gave Trump a 30% chance of winning. Others models gave him a 1% chance. And so that was interesting in that it showed that number one, that modeling strategies and skill do matter quite a lot. When you have someone saying 30% versus 1%. I mean, that's a very very big spread. And number two, that these aren't like solved problems necessarily. Although again, the problem with elections is that you only have one election every four years. So I can be very confident that I have a better model. Even one year of data doesn't really prove very much. Even five or 10 years doesn't really prove very much. 
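Nate's point about correlated state outcomes can be made concrete with a toy simulation: the same polling lead produces a much larger upset probability when the errors across states share a common component. The numbers below are illustrative only and have nothing to do with FiveThirtyEight's actual model.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sims, lead, shared_sd, state_sd = 100_000, 3.0, 3.0, 2.5

shared = rng.normal(0, shared_sd, size=(n_sims, 1))   # error common to all states
state = rng.normal(0, state_sd, size=(n_sims, 3))     # state-specific error
margins = lead + shared + state                       # simulated margins; favorite leads by `lead`

upset_correlated = (margins < 0).all(axis=1).mean()   # underdog flips all three states
print(f"Sweep probability with correlated errors:  {upset_correlated:.1%}")

independent = rng.normal(lead, np.hypot(shared_sd, state_sd), size=(n_sims, 3))
upset_independent = (independent < 0).all(axis=1).mean()
print(f"Sweep probability with independent errors: {upset_independent:.1%}")
```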
And so, being aware of the limitations to some extent intrinsically in elections when you only get one kind of new training example every four years, there's not really any way around that. There are ways to be more robust to sparse data environments. But if you're identifying different types of business problems to solve, figuring out what's a solvable problem where I can add value with data science is a really key part of what you're doing. >> You're such a leader in this space. In data and analysis. It would be interesting to kind of peek behind the curtain, understand how you operate but also how large is your team? How you're putting together information. How quickly you're putting it out. Cause I think in this right now world where everybody wants things instantly-- >> Yeah. >> There's also, you want to be first too in the world of journalism. But you don't want to be inaccurate because that's your credibility. >> We talked about this before, right? I think on average, speed is a little bit overrated in journalism. >> [Katie] I think it's a big problem in journalism. >> Yeah. >> Especially in the tech world. You have to be first. You have to be first. And it's just pumping out, pumping out. And there's got to be more time spent on stories if I can speak subjectively. >> Yeah, for sure. But at the same time, we are reacting to the news. And so we have people that come in, we hire most of our people actually from journalism. >> [Katie] How many people do you have on your team? >> About 35. But, if you get someone who comes in from an academic track for example, they might be surprised at how fast journalism is. That even though we might be slower than the average website, the fact that there's a tragic event in New York, are there things we have to say about that? A candidate drops out of the presidential race, are there things we have to say about that? In periods ranging from minutes to days as opposed to kind of weeks to months to years in the academic world. The corporate world moves faster. What is a little different about journalism is that you are expected to have more precision where people notice when you make a mistake. In corporations, you have maybe less transparency. If you make 10 investments and seven of them turn out well, then you'll get a lot of profit from that, right? In journalism, it's a little different. If you make kind of 10 predictions or say 10 things, and seven of them are very accurate and three of them aren't, you'll still get criticized a lot for the three. Just because that's kind of the way that journalism is. And so the kind of combination of needing, not having that much tolerance for mistakes, but also needing to be fast. That is tricky. And I criticize other journalists sometimes including for not being data driven enough, but the best excuse any journalist has, this is happening really fast and it's my job to kind of figure out in real time what's going on and provide useful information to the readers. And that's really difficult. Especially in a world where literally, I'll probably get off the stage and check my phone and who knows what President Trump will have tweeted or what things will have happened. But it really is a kind of 24/7. >> Well because it's 24/7 with FiveThirtyEight, one of the most well known sites for data, are you feeling micromanagey on your people? Because you do have to hit this balance. You can't have something come out four or five days later. >> Yeah, I'm not -- >> Are you overseeing everything? 
>> I'm not by nature a micromanager. And so you try to hire well. You try and let people make mistakes. And the flip side of this is that if a news organization that never had any mistakes, never had any corrections, that's raw, right? You have to have some tolerance for error because you are trying to decide things in real time. And figure things out. I think transparency's a big part of that. Say here's what we think, and here's why we think it. If we have a model to say it's not just the final number, here's a lot of detail about how that's calculated. In some case we release the code and the raw data. Sometimes we don't because there's a proprietary advantage. But quite often we're saying we want you to trust us and it's so important that you trust us, here's the model. Go play around with it yourself. Here's the data. And that's also I think an important value. >> That speaks to open source. And your perspective on that in general. >> Yeah, I mean, look, I'm a big fan of open source. I worry that I think sometimes the trends are a little bit away from open source. But by the way, one thing that happens when you share your data or you share your thinking at least in lieu of the data, and you can definitely do both is that readers will catch embarrassing mistakes that you made. By the way, even having open sourceness within your team, I mean we have editors and copy editors who often save you from really embarrassing mistakes. And by the way, it's not necessarily people who have a training in data science. I would guess that of our 35 people, maybe only five to 10 have a kind of formal background in what you would call data science. >> [Katie] I think that speaks to the theme here. >> Yeah. >> [Katie] That everybody's kind of got to be data literate. >> But yeah, it is like you have a good intuition. You have a good BS detector basically. And you have a good intuition for hey, this looks a little bit out of line to me. And sometimes that can be based on domain knowledge, right? We have one of our copy editors, she's a big college football fan. And we had an algorithm we released that tries to predict what the human being selection committee will do, and she was like, why is LSU rated so high? Cause I know that LSU sucks this year. And we looked at it, and she was right. There was a bug where it had forgotten to account for their last game where they lost to Troy or something and so -- >> That also speaks to the human element as well. >> It does. In general as a rule, if you're designing a kind of regression based model, it's different in machine learning where you have more, when you kind of build in the tolerance for error. But if you're trying to do something more precise, then so much of it is just debugging. It's saying that looks wrong to me. And I'm going to investigate that. And sometimes it's not wrong. Sometimes your model actually has an insight that you didn't have yourself. But fairly often, it is. And I think kind of what you learn is like, hey if there's something that bothers me, I want to go investigate that now and debug that now. Because the last thing you want is where all of a sudden, the answer you're putting out there in the world hinges on a mistake that you made. Cause you never know if you have so to speak, 1,000 lines of code and they all perform something differently. You never know when you get in a weird edge case where this one decision you made winds up being the difference between your having a good forecast and a bad one. 
In a defensible position and a indefensible one. So we definitely are quite diligent and careful. But it's also kind of knowing like, hey, where is an approximation good enough and where do I need more precision? Cause you could also drive yourself crazy in the other direction where you know, it doesn't matter if the answer is 91.2 versus 90. And so you can kind of go 91.2, three, four and it's like kind of A) false precision and B) not a good use of your time. So that's where I do still spend a lot of time is thinking about which problems are "solvable" or approachable with data and which ones aren't. And when they're not by the way, you're still allowed to report on them. We are a news organization so we do traditional reporting as well. And then kind of figuring out when do you need precision versus when is being pointed in the right direction good enough? >> I would love to get inside your brain and see how you operate on just like an everyday walking to Walgreens movement. It's like oh, if I cross the street in .2-- >> It's not, I mean-- >> Is it like maddening in there? >> No, not really. I mean, I'm like-- >> This is an honest question. >> If I'm looking for airfares, I'm a little more careful. But no, part of it's like you don't want to waste time on unimportant decisions, right? I will sometimes, if I can't decide what to eat at a restaurant, I'll flip a coin. If the chicken and the pasta both sound really good-- >> That's not high tech Nate. We want better. >> But that's the point, right? It's like both the chicken and the pasta are going to be really darn good, right? So I'm not going to waste my time trying to figure it out. I'm just going to have an arbitrary way to decide. >> Serious and business, how organizations in the last three to five years have just evolved with this data boom. How are you seeing it as from a consultant point of view? Do you think it's an exciting time? Do you think it's a you must act now time? >> I mean, we do know that you definitely see a lot of talent among the younger generation now. That so FiveThirtyEight has been at ESPN for four years now. And man, the quality of the interns we get has improved so much in four years. The quality of the kind of young hires that we make straight out of college has improved so much in four years. So you definitely do see a younger generation for which this is just part of their bloodstream and part of their DNA. And also, particular fields that we're interested in. So we're interested in people who have both a data and a journalism background. We're interested in people who have a visualization and a coding background. A lot of what we do is very much interactive graphics and so forth. And so we do see those skill sets coming into play a lot more. And so the kind of shortage of talent that had I think frankly been a problem for a long time, I'm optimistic based on the young people in our office, it's a little anecdotal but you can tell that there are so many more programs that are kind of teaching students the right set of skills that maybe weren't taught as much a few years ago. >> But when you're seeing these big organizations, ESPN as perfect example, moving more towards data and analytics than ever before. >> Yeah. >> You would say that's obviously true. >> Oh for sure. >> If you're not moving that direction, you're going to fall behind quickly. >> Yeah and the thing is, if you read my book or I guess people have a copy of the book. 
In some ways it's saying hey, there are lot of ways to screw up when you're using data. And we've built bad models. We've had models that were bad and got good results. Good models that got bad results and everything else. But the point is that the reason to be out in front of the problem is so you give yourself more runway to make errors and mistakes. And to learn kind of what works and what doesn't and which people to put on the problem. I sometimes do worry that a company says oh we need data. And everyone kind of agrees on that now. We need data science. Then they have some big test case. And they have a failure. And they maybe have a failure because they didn't know really how to use it well enough. But learning from that and iterating on that. And so by the time that you're on the third generation of kind of a problem that you're trying to solve, and you're watching everyone else make the mistake that you made five years ago, I mean, that's really powerful. But that doesn't mean that getting invested in it now, getting invested both in technology and the human capital side is important. >> Final question for you as we run out of time. 2018 beyond, what is your biggest project in terms of data gathering that you're working on? >> There's a midterm election coming up. That's a big thing for us. We're also doing a lot of work with NBA data. So for four years now, the NBA has been collecting player tracking data. So they have 3D cameras in every arena. So they can actually kind of quantify for example how fast a fast break is, for example. Or literally where a player is and where the ball is. For every NBA game now for the past four or five years. And there hasn't really been an overall metric of player value that's taken advantage of that. The teams do it. But in the NBA, the teams are a little bit ahead of journalists and analysts. So we're trying to have a really truly next generation stat. It's a lot of data. Sometimes I now more oversee things than I once did myself. And so you're parsing through many, many, many lines of code. But yeah, so we hope to have that out at some point in the next few months. >> Anything you've personally been passionate about that you've wanted to work on and kind of solve? >> I mean, the NBA thing, I am a pretty big basketball fan. >> You can do better than that. Come on, I want something real personal that you're like I got to crunch the numbers. >> You know, we tried to figure out where the best burrito in America was a few years ago. >> I'm going to end it there. >> Okay. >> Nate, thank you so much for joining us. It's been an absolute pleasure. Thank you. >> Cool, thank you. >> I thought we were going to chat World Series, you know. Burritos, important. I want to thank everybody here in our audience. Let's give him a big round of applause. >> [Nate] Thank you everyone. >> Perfect way to end the day. And for a replay of today's program, just head on over to ibm.com/dsforall. I'm Katie Linendoll. And this has been Data Science for All: It's a Whole New Game. Test one, two. One, two, three. Hi guys, I just want to quickly let you know as you're exiting. A few heads up. Downstairs right now there's going to be a meet and greet with Nate. And we're going to be doing that with clients and customers who are interested. So I would recommend before the game starts, and you lose Nate, head on downstairs. And also the gallery is open until eight p.m. with demos and activations. And tomorrow, make sure to come back too. Because we have exciting stuff. 
I'll be joining you as your host. And we're kicking off at nine a.m. So bye everybody, thank you so much. >> [Announcer] Ladies and gentlemen, thank you for attending this evening's webcast. If you are not attending all cloud and cognitive summit tomorrow, we ask that you recycle your name badge at the registration desk. Thank you. Also, please note there are two exits on the back of the room on either side of the room. Have a good evening. Ladies and gentlemen, the meet and greet will be on stage. Thank you.

Published Date : Nov 1 2017



Yaron Haviv, iguazio | BigData NYC 2017


 

>> Announcer: Live from midtown Manhattan, it's theCUBE, covering BigData New York City 2017, brought to you by SiliconANGLE Media and its ecosystem sponsors. >> Okay, welcome back everyone, we're live in New York City, this is theCUBE's coverage of BigData NYC, this is our own event for five years now we've been running it, been at Hadoop World since 2010, it's our eighth year covering the Hadoop World which has evolved into Strata Conference, Strata Hadoop, now called Strata Data, and of course it's bigger than just Strata, it's about big data in NYC, a lot of big players here inside theCUBE, thought leaders, entrepreneurs, and great guests. I'm John Furrier, the cohost this week with Jim Kobielus, who's the lead analyst on our BigData and our Wikibon team. Our next guest is Yaron Haviv, who's with iguazio, he's the founder and CTO, hot startup here at the show, making a lot of waves on their new platform. Welcome to theCUBE, good to see you again, congratulations. >> Yes, thanks, thanks very much. We're happy to be here again. >> You're known in the theCUBE community as the guy on Twitter who's always pinging me and Dave and team, saying, "Hey, you know, you guys got to "get that right." You really are one of the smartest guys on the network in our community, you're super-smart, your team has got great tech chops, and in the middle of all that is the hottest market which is cloud native, cloud native as it relates to the integration of how apps are being built, and essentially new ways of engineering around these solutions, not just repackaging old stuff, it's really about putting things in a true cloud environment, with an application development, with data at the center of it, you got a whole complex platform you've introduced. So really, really want to dig into this. So before we get into some of my pointed questions I know Jim's got a ton of questions, is give us an update on what's going on so you guys got some news here at the show, let's get to that first. >> So since the last time we spoke, we had tons of news. We're making revenues, we have customers, we've just recently GA'ed, we recently got significant investment from major investors, we raised about $33 million recently from companies like Verizon Ventures, Bosch, you know for IoT, Chicago Mercantile Exchange, which is Dow Jones and other properties, Dell EMC. So pretty broad. >> John: So customers, pretty much. >> Yeah, so that's the interesting thing. Usually you know investors are sort of strategic investors or partners or potential buyers, but here it's essentially our customers that it's so strategic to the business, we want to... >> Let's go with GA of the projects, just get into what's shipping, what's available, what's the general availability, what are you now offering? >> So iguazio is trying to, you know, you alluded to cloud native and all that. Usually when you go to events like Strata and BigData it's nothing to do with cloud native, a lot of hard labor, not really continuous development and integration, it's like continuous hard work, it's continuous hard work. And essentially what we did, we created a data platform which is extremely fast and integrated, you know has all the different forms of states, streaming and events and documents and tables and all that, into a very unique architecture, won't dive into that today. And on top of it we've integrated cloud services like Kubernetes and serverless functionality and others, so we can essentially create a hybrid cloud. 
So some of our customers, they even deploy portions on an opex basis in the cloud, and some portions at the edge or in the enterprise as deployed software, or even a prepackaged appliance. So we're the only ones that provide a full hybrid experience. >> John: Is this a SaaS product? >> So it's a software stack, and it could be delivered in three different options. One, if you don't want to mess with the hardware, you can just rent it, and it's deployed in an Equinix facility, we have very strong partnerships with them globally. If you want to have something on-prem, you can get a software reference architecture, you go and deploy it. If you're a telco or an IoT player that wants it in a manufacturing facility, we have a very small 2U box, four servers, four GPUs, all the analytics tech you could think of. You just put it in the factory instead of like two racks of Hadoop. >> So you're not general purpose, you're just whatever the customer wants to deploy the stack, their flexibility is on them. >> Yeah. Now it is an appliance >> You have a hosting solution? >> It is an appliance even when you deploy it on-prem, it's a bunch of Docker containers inside that you don't even touch, you don't SSH to the machine. You have APIs and you have UIs, and just like the cloud experience when you go to Amazon, you don't open the kimono, you know, you just use it. So our experience, that's what we're telling customers. No root access problems, no security problems. It's a hardened system. Give us servers, we'll deploy it, and you go through consoles and UIs, >> You don't host anything for anyone? >> We host for some customers, including >> So you do whatever the customer was interested in doing? >> Yes. (laughs) >> So you're flexible, okay. >> We just want to make money. >> You're pretty good, sticking to the product. So on the GA, so here essentially the big data world you mentioned that there's data layers, like data piece. So I got to ask you the question, so pretend I'm an idiot for a second, right. >> Yaron: Okay. >> Okay, yeah. >> No, you're a smart guy. >> What problem are you solving. So we'll just go to the simple. I love what you're doing, I assume you guys are super-smart, which I can say you are, but what's the problem you're solving, what's in it for me? >> Okay, so there are two problems. One is the challenge everyone wants to transform. You know there is this digital transformation mantra. And it means essentially two things. One is, I want to automate my operation environment so I can cut costs and be more competitive. The other one is I want to improve my customer engagement. You know, I want to do mobile apps which are smarter, you know get more direct content to the user, get more targeted functionality, et cetera. These are the two key challenges for every business, any industry, okay? So they go and they deploy Hadoop and Hive and all that stuff, and it takes them two years to productize it. And then they get to the data science bit. And by the time they finished they understand that this Hadoop thing can only do one thing. It's queries, and reporting and BI, and data warehousing. How do you do actionable insights from that stuff, okay? 'Cause actionable insights means I get information from the mobile app, and then I translate it into some action. I have to enrich the vectors, the machine learning, all those details. And then I need to respond. Hadoop doesn't know how to do it. 
So the first generation is people that pulled a lot of stuff into a data lake, and started querying it and generating reports. And the boss said >> Low cost data lake basically, is what you're saying. >> Yes, and the boss said, "Okay, what are we going to do with this report? "Is it generating any revenue for the business?" No. The only revenue generation if you take this data >> You're fired, exactly. >> No, not all fired, but now >> John: Look at the budget >> Now they're starting to buy our stuff. So now the point is okay, how can I put all this data, and at the same time generate actions, and also deal with the production aspects of, I want to develop in a beta phase, I want to promote it into production. That's cloud native architectures, okay? Hadoop is not cloud. How do I take a Spark, Zeppelin, you know, a notebook, and turn it into production? There's no way to do that. >> By the way, depending on which cloud you go to, they have a different mechanism and elements for each cloud. >> Yeah, so the cloud providers do address that because they are selling the package, >> Expands all the clouds, yeah. >> Yeah, so cloud providers are starting to have their own offerings which are all proprietary around this is how you would, you know, forget about HDFS, we'll have S3, and we'll have Redshift for you, and we'll have Athena, and again you're starting to consume that as a service. Still doesn't address the continuous analytics challenge that people have. And if you're looking at what we've done with Grab, which is amazing, they started with using Amazon services, S3, Redshift, you know, Kinesis, all that stuff, and it took them about two hours to generate the insights. Now the problem is they want to do driver incentives in real time. So they want to incent the driver to go and make more rides or other things, so they have to analyze the event of the location of the driver, the event of the location of the customers, and just throw messages back based on analytics. So that's real time analytics, and that's not something that you can do >> They got to build that from scratch right away. I mean they can't do that with the existing. >> No, and Uber invested tons of energy around that and they don't get the same functionality. Another unique feature that we talk about in our PR >> This is for the use case you're talking about, this is the Grab, which is the car >> Grab is the number one ride-sharing company in Asia, which is bigger than Uber in Asia, and they're using our platform. By the way, even Uber doesn't really use Hadoop, they use MemSQL for that stuff, so it's not really using open source and all that. But the point is for example, with Uber, when you have a, when they monetize the rides, they do it just based on demand, okay. And with Grab, now what they do, because of the capability that we can intersect tons of data in real time, they can also look at the weather, was there a terror attack or something like that. They don't want to raise the price >> A lot of other data points, could be traffic >> They don't want to raise the price if there was a problem, you know, and all the customers get aggravated. This is actually intersecting data in real time, and no one today can do that in real time beyond what we can do. >> A lot of people have semantic problems with real time, they don't even know what they mean by real time. >> Yaron: Yes. >> The data could be a week old, but they can get it to them in real time. 
>> But every decision, if you think, if you generalize around the problem, okay, and we have slides on that that I explain to customers. Every time I run analytics, I need to look at four types of data. The context, the event, okay, what happened, okay. The second type of data is the previous state. Like I have a car, was it up or down or what's the previous state of that element? The third element is the time aggregation, like, what happened in the last hour, the average temperature, the average, you know, ticker price for the stock, et cetera, okay? And the fourth thing is enriched data, like I have a car ID, but what's the make, what's the model, who's driving it right now. That's secondary data. So every time I run a machine learning task or any decision I have to collect all those four types of data into one vector, it's called a feature vector, and take a decision on that. You take Kafka, it's only the event part, okay, you take MemSQL, it's only the state part, you take Hadoop, it's only like historical stuff. How do you assemble and stitch a feature vector. >> Well you talked about complex machine learning pipeline, so clearly, you're talking about a hybrid >> It's a prediction. And actions based on just dumb things, like the car broke and I need to send it to a garage, I don't need machine learning for that. >> So within your environment then, do you enable the machine learning models to execute across the different data platforms, of which this hybrid environment is composed, and then do you aggregate the results of those model runs into some larger model that drives the real time decision? >> In our solution, everything is a document, so even a picture is a document, a lot of things. So you can essentially throw in a picture, run TensorFlow, embed more features into the document, and then query those features on another platform. So that's really what makes this continuous analytics extremely flexible, so that's what we give customers. The first thing is simplicity. They can now build applications, you know, we have a tier one automotive customer now, the CIO coming, meeting us. So you know, when I have a project like that, it's one year, I need to have hired dozens of people, it's hugely complex, you know. Tell us what's the use case, and we'll build a prototype. One week, we gave them a prototype, and he was amazed how in one week we created an application that analyzed all the streams of data from the cars, did enrichment, did machine learning, and provided predictions. >> Well we're going to have to come in and test you on this, because I'm skeptical, but here's why. >> Everyone is. >> We'll get to that, I mean I'm probably not skeptical but I kind of am because the history is pretty clear. If you look at some of the big ideas out there, like OpenStack. I mean that thing just morphed into a beast. Hadoop was a cost of ownership nightmare as you mentioned early on. So people have been conceptually correct on what they were trying to do, but trying to get it done was always hard, and then it took a long time to kind of figure out the operational model. So how are you different, if I'm going to play the skeptic here? You know, I've heard this before. How are you different than say OpenStack or Hadoop clusters, 'cause that was a nightmare, cost of ownership, I couldn't get the type of value I needed, lost my budget. Why aren't you the same? >> Okay, that's interesting. 
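Yaron's four types of data, stitched into a single feature vector, might look something like this in code. It is an illustrative sketch only: the stores and field names are hypothetical stand-ins for whatever key-value, time-series, and lookup tables a real deployment would use.

```python
from statistics import mean

# Hypothetical stores standing in for a key-value state table, a time-series
# window, and an enrichment lookup table.
state_store = {"car-17": {"status": "up", "last_service_km": 81_000}}
telemetry_window = {"car-17": [78, 81, 85, 90]}                 # engine temps, last hour
car_catalog = {"car-17": {"make": "Acme", "model": "Roadster", "driver": "d42"}}

def build_feature_vector(event: dict) -> dict:
    car_id = event["car_id"]
    return {
        "event_temp": event["engine_temp"],              # 1. the event itself (context)
        "prev_status": state_store[car_id]["status"],    # 2. previous state
        "avg_temp_1h": mean(telemetry_window[car_id]),   # 3. time aggregation
        **car_catalog[car_id],                           # 4. enriched / secondary data
    }

# This single dict is what a model or rule engine would score in real time.
print(build_feature_vector({"car_id": "car-17", "engine_temp": 93}))
```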
I don't know if you know, but I ran a lot of development for OpenStack when I was at Mellanox, and Hadoop, so I patched a lot of those >> So do you agree with what I said? That that was a problem? >> They are extremely complex, yes. And I think one of the things is that first, OpenStack tried to bite off too much, and it's sort of a huge tent, everyone tries to push his agenda. OpenStack is still an infrastructure layer, okay. And also Hadoop is sort of something in between an infrastructure and an application layer, but it was designed 10 years ago, where the problem that Hadoop tried to solve is how do you do web ranking, okay, on tons of batch data. And then the ecosystem evolved into real time, and streaming and machine learning. >> A data warehousing alternative or whatever. >> So it doesn't fit the original model of batch processing, 'cause if an event comes from the car or an IoT device, and you have to do something with it, you need a table with an index. You can't just go and build a huge Parquet file. >> You know, you're talking about complexity >> John: That's why he's different. >> Go ahead. >> So what we've done with our team, after knowing OpenStack and all those >> John: All the scar tissue. >> And all the scar tissues, and my role was also working with all the cloud service providers, so I know their internal architecture, and I worked on SAP HANA and Exadata and all those things, so we learned from the bad experiences, said let's forget about the lower layers, which is what OpenStack is trying to provide, provide you infrastructure as a service. Let's focus on the application, and build from the application all the way to the flash, and the CPU instruction set, and the adapters and the networking, okay. That's what's different. So what we provide is an application and service experience. We don't provide infrastructure. If you go buy VMware and Nutanix, all those offerings, you get infrastructure. Now you go and build, with the dozens of dev ops guys, all the stack above. You go to Amazon, you get services. Just they're not the most optimized in terms of the implementation because they also have dozens of independent projects that each one takes a VM and starts writing some >> But they're still a good service, but you got to put it together. >> Yeah right. But also the way they implement, because in order for them to scale, they have a common layer based on VMs, and then they're starting to build up applications, so it's inefficient. And also a lot of it is built on 10-year-old baseline architecture. We've designed it for a very modern architecture, it's all parallel CPUs with 30 cores, you know, flash and NVMe. And so we've avoided a lot of the hardware challenges, and serialization, and just provide an abstraction layer pretty much like a cloud on top. 
I'm one of the authors of the serverless white paper, which tries to normalize the definitions of all the vendors and come with a proposal for interoperable standard. So I spent a lot of energy on that, 'cause we don't want to lock customers to an API. What's unique, by the way, about our solution, we don't have a single proprietary API. We just emulate all the other guys' stuff. We have all the Amazon APIs for data services, like Kinesis, Dynamo, S3, et cetera. We have the open source APIs, like Kafka. So also on the serverless, my agenda is trying to promote that if I'm writing to Azure or AWS or iguazio, I don't need to change my app. I can use any developer tools. So that's my effort there. And we recently, a few weeks ago, we launched our open source project, which is a sort of second generation of something we had before called Nuclio. It's designed for real time >> John: How do you spell that? >> N-U-C-L-I-O. I even have the logo >> He's got a nice slick here. >> It's really fast because it's >> John: Nuclio, so that's open source that you guys just sponsor and it's all code out in the open? >> All the code is in the open, pretty cool, has a lot of innovative ideas on how to do stream processing and best, 'cause the original serverless functionality was designed around web hooks and HTTP, and even many of the open source projects are really designed around HTTP serving. >> I have a question. I'm doing research for Wikibon on the area of serverless, in fact we've recently published a report on serverless, and in terms of hybrid cloud environments, I'm not seeing yet any hybrid serverless clouds that involve public, you know, serverless like AWS Lambda, and private on-prem deployment of serverless. Do you have any customers who are doing that or interested in hybridizing serverless across public and private? >> Of course, and we have some patents I don't want to go into, but the general idea is, what we've done in Nuclio is also the decoupling of the data from the computation, which means that things can sort of be disjoined. You can run a function in Raspberry Pi, and the data will be in a different place, and those things can sort of move, okay. >> So the persistence has to happen outside the serverless environment, like in the application itself? >> Outside of the function, the function acts as the persistent layer through APIs, okay. And how this data persistence is materialized, that server separate thing. So you can actually write the same function that will run against Kafka or Kinesis or Private MQ, or HTTP without modifying the function, and ad hoc, through what we call function bindings, you define what's going to be the thing driving the data, or storing the data. So that can actually write the same function that does ETL drop from table one to table two. You don't need to put the table information in the function, which is not the thing that Lambda does. And it's about a hundred times faster than Lambda, we do 400,000 events per second in Nuclio. So if you write your serverless code in Nuclio, it's faster than writing it yourself, because of all those low-level optimizations. >> Yaron, thanks for coming on theCUBE. We want to do a deeper dive, love to have you out in Palo Alto next time you're in town. Let us know when you're in Silicon Valley for sure, we'll make sure we get you on camera for multiple sessions. >> And more information re:Invent. >> Go to re:Invent. We're looking forward to seeing you there. 
Love the continuous analytics message. I think continuous integration is going through a massive renaissance right now, you're starting to see new approaches, and I think the things that you're doing are exactly along the lines of what the world wants, which is alternatives, innovation, and thanks for sharing on theCUBE. >> Great. >> That's very great. >> This is theCUBE coverage of the hot startups here at BigData NYC, live coverage from New York. I'm John Furrier with Jim Kobielus, back after this short break.
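For readers curious about the function-binding idea Yaron outlines above, where the same function body is served by Kafka, Kinesis, HTTP, or a table without code changes, a rough sketch follows. The handler uses the common serverless (context, event) convention, but the bindings mapping and the store objects are assumptions made for illustration, not Nuclio's documented interface.

```python
# Rough sketch of a trigger-agnostic function in the spirit of the
# function-binding discussion above. The "bindings" mapping and the
# stand-in Context class are illustrative assumptions.

def handler(context, event):
    """Copy a record from a source table to a target table (a tiny ETL step).

    The function never names Kafka, Kinesis, or HTTP; whatever fired the
    event and wherever the tables live is declared in deployment-time
    bindings, so the same body can be reused across triggers.
    """
    record = event["body"]                      # payload delivered by any trigger
    source = context.bindings["source_table"]   # resolved from configuration
    target = context.bindings["target_table"]

    row = source.get(record["key"])             # read from the bound source
    if row is not None:
        target[record["key"]] = row             # write to the bound target
    return {"status": "ok", "key": record["key"]}


# Minimal stand-ins so the sketch runs locally without any platform.
class Context:
    def __init__(self, bindings):
        self.bindings = bindings

source = {"k1": {"value": 10}}
target = {}
ctx = Context({"source_table": source, "target_table": target})
print(handler(ctx, {"body": {"key": "k1"}}))
print(target)
```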

Published Date : Sep 27 2017

SUMMARY :

Yaron Haviv of iguazio joins John Furrier and Jim Kobielus at BigData NYC to explain the company's continuous analytics approach: every decision needs a feature vector stitched from the incoming event, the previous state, time-window aggregations, and enrichment data, which single-purpose systems like Kafka, MemSQL, or Hadoop cannot assemble on their own. He contrasts iguazio's application-level design with the complexity of OpenStack and Hadoop, describes emulating Amazon and open source data APIs rather than locking customers into a proprietary interface, and introduces Nuclio, the company's open source serverless project built around function bindings and high-throughput stream processing.


Arun Murthy, Hortonworks | DataWorks Summit 2017


 

>> Announcer: Live from San Jose, in the heart of Silicon Valley, it's theCUBE covering DataWorks Summit 2017. Brought to you by Hortonworks. >> Good morning, welcome to theCUBE. We are live at day 2 of the DataWorks Summit, and have had a great day so far, yesterday and today, I'm Lisa Martin with my co-host George Gilbert. George and I are very excited to be joined by a multiple CUBE alumni, the co-founder and VP of Engineering at Hortonworks Arun Murthy. Hey, Arun. >> Thanks for having me, it's good to be back. >> Great to have you back, so yesterday, great energy at the event. You could see and hear behind us, great energy this morning. One of the things that was really interesting yesterday, besides the IBM announcement, and we'll dig into that, was that we had your CEO on, as well as Rob Thomas from IBM, and Rob said, you know, one of the interesting things over the last five years was that there have been only 10 companies that have beat the S&P 500, have outperformed, in each of the last five years, and those companies have made big bets on data science and machine learning. And as we heard yesterday, these four meta-trains IoT, cloud streaming, analytics, and now the fourth big leg, data science. Talk to us about what Hortonworks is doing, you've been here from the beginning, as a co-founder I've mentioned, you've been with Hadoop since it was a little baby. How is Hortonworks evolving to become one of those big users making big bets on helping your customers, and yourselves, leverage machine loading to really drive the business forward? >> Absolutely, a great question. So, you know, if you look at some of the history of Hadoop, it started off with this notion of a data lake, and then, I'm talking about the enterprise side of Hadoop, right? I've been working for Hadoop for about 12 years now, you know, the last six of it has been as a vendor selling Hadoop to enterprises. They started off with this notion of data lake, and as people have adopted that vision of a data lake, you know, you bring all the data in, and now you're starting to get governance and security, and all of that. Obviously the, one of the best ways to get value over the data is the notion of, you know, can you, sort of, predict what is going to happen in your world of it, with your customers, and, you know, whatever it is with the data that you already have. So that notion of, you know, Rob, our CEO, talks about how we're trying to move from a post-transactional world to a pre-transactional world, and doing the analytics and data sciences will be, obviously, with me. We could talk about, and there's so many applications of it, something as similar as, you know, we did a demo last year of, you know, of how we're working with a freight company, and we're starting to show them, you know, predict which drivers and which routes are going to have issues, as they're trying to move, alright? Four years ago we did the same demo, and we would say, okay this driver has, you know, we would show that this driver had an issue on this route, but now, within the world, we can actually predict and let you know to take preventive measures up front. Similarly internally, you know, you can take things from, you know, mission-learning, and log analytics, and so on, we have a internal problem, you know, where we have to test two different versions of HDP itself, and as you can imagine, it's a really, really hard problem. 
We have to support 10 operating systems, seven databases, like, if you multiply that matrix, it's, you know, tens of thousands of options. So, if you do all that testing, we now use machine learning internally, to look through the logs, and kind of predict where the failures were, and help our own, sort of, software engineers understand where the problems were, right? An extension of that has been, you know, the work we've done in Smartsense, which is a service we offer our enterprise customers. We collect logs from their Hadoop clusters, and then we can actually help them understand where they can either tune their applications, or even tune their hardware, right? They might have a, you know, we have this example I really like where at a really large enterprise Financial Services client, they had literally, you know, hundreds and, you know, thousands of machines on HDP, and we, using Smartsense, we actually found that there were 25 machines which had bad NIC configuration, and we proved to them that by fixing those, we got 30% throughput back on their cluster. At that scale, it's a lot of money, it's a lot of capex, it's a lot of opex. So, as a company, we try it ourselves, as much as we, kind of, try to help our customers adopt it, does that make sense? >> Yeah, let's drill down on that even a little more, 'cause it's pretty easy to understand what's the standard telemetry you would want out of hardware, but as you, sort of, move up the stack the metrics, I guess, become more custom. So how do you learn, not just from one customer, but from many customers, especially when you can't standardize what you're supposed to pull out of them? >> Yeah so, we're sort of really big believers in, sort of, dogfooding our own stuff, right? So, we talk about the notion of data lake, we actually run a Smartsense data lake where we actually get data across, you know, the hundreds of our customers, and we can actually do predictive machine learning on that data in our own data lake. Right? And to your point about how we go up the stack, this is, kind of, where we feel like we have a natural advantage because we work on all the layers, whether it's the SQL engine, or the storage engine, or, you know, above and beyond the hardware. So, as we build these models, we understand that we need more, or different, telemetry right? And we put that back into the product so the next version of HDP will have the metrics that we wanted. And, now we've been doing this for a couple of years, which means we've done three, four, five turns of the crank, obviously something we always get better at, but I feel like, compared to where we were a couple of years ago when Smartsense first came out, it's actually matured quite a lot, from that perspective.
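To make the Smartsense idea above a bit more concrete, here is a minimal, hypothetical sketch of mining collected cluster telemetry for tuning recommendations: compare each host against the fleet and flag the outliers, like the mis-configured NICs in the story. The metric names, threshold, and rule are invented for illustration; they are not what Smartsense actually ships.

```python
# Hypothetical sketch: scan per-host telemetry for outliers and emit a
# tuning recommendation. Metric names and the rule are illustrative.
import statistics

telemetry = [
    {"host": "dn01", "nic_gbps": 9.4, "disk_util": 0.61},
    {"host": "dn02", "nic_gbps": 9.1, "disk_util": 0.58},
    {"host": "dn03", "nic_gbps": 1.2, "disk_util": 0.60},  # suspicious NIC
    {"host": "dn04", "nic_gbps": 9.6, "disk_util": 0.64},
]

def nic_recommendations(records):
    """Flag hosts whose NIC throughput is below half the fleet median."""
    median = statistics.median(r["nic_gbps"] for r in records)
    recs = []
    for r in records:
        if r["nic_gbps"] < 0.5 * median:
            recs.append(
                f"{r['host']}: NIC throughput {r['nic_gbps']} Gbps is below half "
                f"the fleet median of {median:.1f} Gbps; check NIC configuration."
            )
    return recs

for rec in nic_recommendations(telemetry):
    print(rec)
```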
Similarly, we shipped a version of our Cloud product, our Hortonworks Data Cloud, on Amazon and again Smartsense preplanned there, so whether you're on an Amazon, or a Microsoft, or on-prem, we get the same telemetry, we get the same data back. We can actually, if you're a customer using many of these products, we can actually give you that telemetry back. Similarly, if you guys probably know this we have, you were probably there in an analyst when they announced the Flex Support subscription, which means that now we can actually take the support subscription you have to get from Hortonworks, and you can actually use it on-prem or on the Cloud. >> So in terms of transforming, HDP for example, just want to make sure I'm understanding this, you're pulling in data from customers to help evolve the product, and that data can be on-prem, it can be in a Microsoft lesur, it can be an AWS? >> Exactly. The HDP can be running in any of these, we will actually pull all of them to our data lake, and they actually do the analytics for us and then present it back to the customers. So, in our support subscription, the way this works is we do the analytics in our lake, and it pushes it back, in fact to our support team tickets, and our sales force, and all the support mechanisms. And they get a set of recommendations saying Hey, we know this is the work loads you're running, we see these are the opportunities for you to do better, whether it's tuning a hardware, tuning an application, tuning the software, we sort of send the recommendations back, and the customer can go and say Oh, that makes sense, the accept that and we'll, you know, we'll update the recommendation for you automatically. Then you can have, or you can say Maybe I don't want to change my kernel pedometers, let's have a conversation. And if the customer, you know, is going through with that, then they can go and change it on their own. We do that, sort of, back and forth with the customer. >> One thing that just pops into my mind is, we talked a lot yesterday about data governance, are there particular, and also yesterday on stage were >> Arun: With IBM >> Yes exactly, when we think of, you know, really data-intensive industries, retail, financial services, insurance, healthcare, manufacturing, are there particular industries where you're really leveraging this, kind of, bi-directional, because there's no governance restrictions, or maybe I shouldn't say none, but. Give us a sense of which particular industries are really helping to fuel the evolution of Hortonworks data lake. >> So, I think healthcare is a great example. You know, when we started off, sort of this open-source project, or an atlas, you know, a couple of years ago, we got a lot of traction in the healthcare sort of insurance industry. You know, folks like Aetna were actually founding members of that, you know, sort of consortium of doing this, right? And, we're starting to see them get a lot of leverage, all of this. Similarly now as we go into, you know, Europe and expand there, things like GDPR, are really, really being pardoned, right? And, you guys know GDPR is a really big deal. Like, you pay, if you're not compliant by, I think it's like March of next year, you pay a portion of your revenue as fines. That's, you know, big money for everybody. 
So, I think that's what we're really excited about in the partnership with IBM, because we feel like the two of us can help a lot of customers, especially in countries that are significantly more highly regulated than the United States, to actually leverage our, sort of, giant portfolio of products. And IBM's been a great contributor to Atlas, they've adopted it wholesale as you saw, you know, in the announcements yesterday. >> So, you're doing a Keynote tomorrow, so give us maybe the top three things, you're giving the Keynote on Data Lake 3.0, walk us through the evolution. Data Lakes 1.0, 2.0, 3.0, where you are now, and what folks can expect to hear and see in your Keynote. >> Absolutely. So as we've, kind of, continued to work with customers and we see the maturity model of customers, you know, initially people are standing up a data lake, and then they'd want, you know, sort of security, basic security, what it covers, and so on. Now, they want governance, and as we're starting to go through that journey clearly, our customers are pushing us to help them get more value from the data. It's not just about putting up the data lake, and obviously managing data with governance, it's also about Can you help us, you know, do machine learning, Can you help us build other apps, and so on. So, as we look at it, there's a fundamental evolution that, you know, the Hadoop ecosystem had to go through, and with the advance of technologies like, you know, Docker, it's really important first to help the customers bring more than just workloads which are sort of native to Hadoop. You know, Hadoop started off with MapReduce, obviously Spark's been great, and now we're starting to see technologies like Flink coming, but increasingly, you know, we want to do data science. To mass-market data science, obviously, you know, people want to use Spark, but the mass market is still Python, and R, and so on, right? >> Lisa: Non-native, okay. >> Non-native. Which are not really built, you know, these predate Hadoop by a long way, right. So now as we bring these applications in, having technology like Docker is really important, because now we can actually containerize these apps. It's not just about running Spark, you know, running Spark with R, or running Spark with Python, which you can do today. The problem is, in a true multi-tenant governed system, you want, not just R, but you want specific sets of libraries for R, right. And the libraries, you know, George wants might be completely different than what I want. And, you know, you can't do a multi-tenant system where you install both of them simultaneously. So Docker is a really elegant solution to problems like those. So now we can actually bring those technologies into a Docker container, so George's Docker containers will not, you know, conflict with mine. And you can actually be off to the races, you know, off to the races doing data science. Which is really key for technologies like DSX, right? Because with DSX if you see, obviously DSX supports Spark with technologies like, you know, Zeppelin which is a front-end, but they also have Jupyter, which is going to work for the mass-market users for Python and R, right? So we want to make sure there's no friction whether it's, sort of, the guys using Spark, or the guys using R, and equally importantly DSX, you know, in the short term will also support things like, you know, the classic IBM portfolio, SPSS and so on. 
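The library-isolation point above, George's R packages versus Arun's, each pinned in its own container, can be sketched with the Docker SDK for Python (pip install docker; a local Docker daemon is required). The image tags, registry, and notebook command are made up for illustration; this is not how YARN or OpenShift actually schedules these containers, just the general idea of one pinned environment per user.

```python
# Illustrative sketch using the Docker SDK for Python: each user gets a
# container whose image bakes in their own pinned R/Python libraries, so
# one user's packages can never conflict with another's.
import docker

client = docker.from_env()

user_envs = {
    # Hypothetical image tags, one pinned environment per user.
    "george": "registry.example.com/ds-notebook:r-4.3-forecast",
    "arun":   "registry.example.com/ds-notebook:py3.11-tensorflow",
}

def launch_notebook(user, image):
    """Start an isolated notebook container for one user."""
    return client.containers.run(
        image,
        name=f"notebook-{user}",
        detach=True,
        environment={"NB_USER": user},
        ports={"8888/tcp": None},   # let Docker pick a free host port
    )

for user, image in user_envs.items():
    container = launch_notebook(user, image)
    print(user, container.short_id)
```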
So bringing all of those things together, making sure they run with the data in the data lake, and also the compute in the data lake, is really big for us. >> Wow, so it sounds like your Keynote's going to be very educational for the folks that are attending tomorrow, so last question for you. One of the themes that occurred in the Keynote this morning was sharing a fun fact about the speakers. What's a fun fact about Arun Murthy? >> Great question. I guess, you know, people have been looking for folks with, you know, 10 years of experience on Hadoop. I'm here finally, right? There's not a lot of people, but, you know, it's fun to be one of those people who've worked on this for about 10 years. Obviously, I look forward to working on this for another 10 or 15 more, but it's been an amazing journey. >> Excellent. Well, we thank you for sharing time again with us on theCUBE. You've been watching theCUBE live on day 2 of the DataWorks Summit, hashtag DWS17, with my co-host George Gilbert. I am Lisa Martin, stick around, we've got great content coming your way.

Published Date : Jun 14 2017

SUMMARY :

Arun Murthy, co-founder and VP of Engineering at Hortonworks, talks with Lisa Martin and George Gilbert at DataWorks Summit 2017 about moving customers from the data lake to predictive analytics: using machine learning on test logs and Smartsense telemetry to recommend application, hardware, and configuration tuning across on-prem HDP, Azure HDInsight, and Hortonworks Data Cloud deployments. He discusses governance drivers such as GDPR, the partnership with IBM around Atlas and the Data Science Experience, and previews his Data Lake 3.0 keynote on using Docker to containerize Python, R, and Spark workloads for multi-tenant data science.


Jamie Engesser, Hortonworks & Madhu Kochar, IBM - DataWorks Summit 2017


 

>> Narrator: Live from San Jose, in the heart of Silicon Valley, it's theCUBE. Covering DataWorks Summit 2017, brought to you by Hortonworks. (digitalized music) >> Welcome back to theCUBE. We are live at day one of the DataWorks Summit, in the heart of Silicon Valley. I'm Lisa Martin with theCUBE; my co-host George Gilbert. We're very excited to be joined by our two next guests. Going to be talking about a lot of the passion and the energy that came from the keynote this morning and some big announcements. Please welcome Madhu Kochar, VP of analytics and product development and client success at IBM, and Jamie Engesser, VP of product management at Hortonworks. Welcome guys! >> Thank you. >> Glad to be here. >> First time on theCUBE, George and I are thrilled to have you. So, in the last six to eight months doing my research, there's been announcements between IBM and Hortonworks. You guys have been partners for a very long time, and announcements on technology partnerships with servers and storage, and presumably all of that gives Hortonworks Jamie, a great opportunity to tap into IBM's enterprise install base, but boy today? Socks blown off with this big announcement between IBM and Hortonworks. Jamie, kind of walk us through that, or sorry Madhu I'm going to ask you first. Walk us through this announcement today. What does it mean for the IBM-Hortonworks partnership? Oh my God, what an exciting, exciting day right? We've been working towards this one, so three main things come out of the announcement today. First is really the adoption by Hortonworks of IBM data sciences machine learning. As you heard in the announcement, we brought the machine learning to our mainframe where the most trusted data is. Now bringing that to the open source, big data on Hadoop, great right, amazing. Number two is obviously the whole aspects around our big sequel, which is bringing the complex-query analytics, where it brings all the data together from all various sources and making that as HDP and Hadoop and Hortonworks and really adopting that amazing announcement. Number three, what we gain out of this humongously, obviously from an IBM perspective is the whole platform. We've been on this journey together with Hortonworks since 2015 with ODPI, and we've been all champions in the open source, delivering a lot of that. As we start to look at it, it makes sense to merge that as a platform, and give to our clients what's most needed out there, as we take our journey towards machine learning, AI, and enhancing the enterprise data warehousing strategy. >> Awesome, Jamie from your perspective on the product management side, what is this? What's the impact and potential downstream, great implications for Hortonworks? >> I think there's two things. I think Hortonworks has always been very committed to the open source community. I think with Hortonworks and IBM partnering on this, number one is it brings a much bigger community to bear, to really push innovation on top of Hadoop. That innovation is going to come through the community, and I think that partnership drives two of the biggest contributors to the community to do more together. So I think that's number one is the community interest. The second thing is when you look at Hadoop adoption, we're seeing that people want to get more and more value out of Hadoop adoption, and they want to access more and more data sets, to number one get more and more value. We're seeing the data science platform become really fundamental to that. 
They're also seeing the extension to say, not only do I need data science to get and add new insights, but I need to aggregate more data. So we're also seeing the notion of, how do I use big sequel on top of Hadoop, but then I can federate data from my mainframe, which has got some very valuable data on it. DB2 instances and the rest of the data repositories out there. So now we get a better federation model, to allow our customers to access more of the data that they can make better business decisions on, and they can use data science on top of that to get new learnings from that data. >> Let me build on that. Let's say that I'm a Telco customer, and the two of you come together to me and say, we don't want to talk to you about Hadoop. We want to talk to you about solving a problem where you've got data in applications and many places, including inaccessible stuff. You have a limited number of data scientists, and the problem of cleaning all the data. Even if you build models, the challenge of integrating them with operational applications. So what do the two of you tell me the Telco customer? >> Yeah, so maybe I'll go first. So the Telco, the main use case or the main application as I've been talking to many of the largest Telco companies here in U.S. and even outside of U.S. is all about their churn rate. They want to know when the calls are dropping, why are they dropping, why are the clients going to the competition and such? There's so much data. The data is just streaming and they want to understand that. I think if you bring the data science experience and machine learning to that data. That as said, it doesn't matter now where the data resides. Hadoop, mainframes, wherever, we can bring that data. You can do a transformation of that, cleanup the data. The quality of the data is there so that you can start feeding that data into the models and that's when the models learn. More data it is, the better it is, so they train, and then you can really drive the insights out of it. Now data science the framework, which is available, it's like a team sport. You can bring in many other data scientists into the organization who could have different analyst reports to go render for or provide results into. So being a team support, being a collaboration, bringing together with that clean data, I think it's going to change the world. I think the business side can have instant value from the data they going to see. >> Let me just test the edge conditions on that. Some of that data is streaming and you might apply the analytics in real time. Some of it is, I think as you were telling us before, sort of locked up as dark data. The question is how much of that data, the streaming stuff and the dark data, how much do you have to land in a Hadoop repository versus how much do you just push the analytics out too and have it inform a decision? >> Maybe I can take a first thought on it. I think there's a couple things in that. There's the learnings, and then how do I execute the learnings? I think the first step of it is, I tend to land the data, and going to the Telecom churn model, I want to see all the touch points. So I want to see the person that came through the website. He went into the store, he called into us, so I need to aggregate all that data to get a better view of what's the chain of steps that happened for somebody to churn? Once I end up diagnosing that, go through the data science of that, to learn the models that are being executed on that data, and that's the data at rest. 
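As a concrete footnote to the churn discussion, the data-at-rest half Madhu and Jamie describe, aggregating every touchpoint into one customer view and learning a model from it, together with the re-scoring of live events that Jamie turns to next, might look roughly like the sketch below. The column names, features, and model choice are illustrative assumptions, not DSX or HDP specifics.

```python
# Hypothetical sketch of the churn workflow described above: join touchpoints
# into per-customer features, fit a model offline, then score a live event.
from sklearn.linear_model import LogisticRegression

# Offline: one row per customer, aggregated from web, store, and call systems.
# Columns: dropped_calls_30d, support_visits_30d, web_logins_30d
X_train = [
    [9, 3, 1],
    [8, 2, 0],
    [1, 0, 12],
    [0, 1, 15],
    [7, 4, 2],
    [2, 0, 9],
]
y_train = [1, 1, 0, 0, 1, 0]   # 1 = customer churned

model = LogisticRegression().fit(X_train, y_train)

def score_live_event(event, profile):
    """Update the customer's features with the new event and re-score."""
    features = [
        profile["dropped_calls_30d"] + (1 if event["type"] == "dropped_call" else 0),
        profile["support_visits_30d"] + (1 if event["type"] == "store_visit" else 0),
        profile["web_logins_30d"],
    ]
    churn_risk = model.predict_proba([features])[0][1]
    if churn_risk > 0.7:
        return f"offer retention discount (risk={churn_risk:.2f})"
    return f"no action (risk={churn_risk:.2f})"

# A customer walks into the store right after a run of dropped calls.
profile = {"dropped_calls_30d": 6, "support_visits_30d": 2, "web_logins_30d": 1}
print(score_live_event({"type": "store_visit", "customer": "c-1001"}, profile))
```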
What I want to do is build the model out so that now I can take that model, and I can prescriptively run it in this stream of data. So I know that that customer just hung up off the phone, now he walked in the store and we can sense that he's in the store because we just registered that he's asking about his billing details. The system can now dynamically diagnose by those two activities that this is a churn high-rate, so notify that teller in the store that there's a chance of him rolling out. If you look at that, that required the machine learning and data science side to build the analytical model, and it required the data-flow management and streaming analytics to consume that model to make a real-time insight out of it, to ultimately stop the churn from happening. Let's just give the customer a discount at the end of the day. That type of stuff; so you need to marry those two. >> It's interesting, you articulated that very clearly. Although then the question I have is now not on the technical side, but on the go-to market side. You guys have to work very very closely, and this is calling at a level that I assume is not very normal for Hortonworks, and it's something that is a natural sales motion for IBM. >> So maybe I'll first speak up, and then I'll let you add some color to that. When I look at it, I think there's a lot of natural synergies. IBM and Hortonworks have been partnered since day one. We've always continued on the path. If you look at it, and I'll bring up community again and open source again, but we've worked very well in the community. I think that's incubated a really strong and fostered a really strong relationship. I think at the end of the day we both look at what's going to be the outcome for the customer and working back from that, and we tend to really engage at that level. So what's the outcome and then how do we make a better product to get to that outcome? So I think there is a lot of natural synergies in that. I think to your point, there's lots of pieces that we need to integrate better together, and we will join that over time. I think we're already starting with the data science experience. A bunch of integration touchpoints there. I think you're going to see in the information governance space, with Atlas being a key underpinning and information governance catalog on top of that, ultimately moving up to IBM's unified governance, we'll start getting more synergies there as well and on the big sequel side. I think when you look at the different pods, there's a lot of synergies that our customers will be driving and that's what the driving factors, along with the organizations are very well aligned. >> And VPF engineering, so there's a lot of integration points which were already identified, and big sequel is already working really well on the Hortonworks HDP platform. We've got good integration going, but I think more and more on the data science. I think in end of the day we end up talking to very similar clients, so going as a joined go-to market strategy, it's a win-win. Jamie and I were talking earlier. I think in this type of a partnership, A our community is winning and our clients, so really good solutions. >> And that's what it's all about. Speaking of clients, you gave a great example with Telco. When we were talking to Rob Thomas and Rob Bearden earlier on in the program today. 
They talked about the data science conversation is at the C-suite, so walk us through an example of whether it's a Telco or maybe a healthcare organization, what is that conversation that you're having? How is a Telco helping foster what was announced today and this partnership? >> Madhu: Do you want to take em? >> Maybe I'll start. When we look in a Telco, I think there's a natural revolution, and when we start looking at that problem of how does a Telco consume and operate data science at a larger scale? So at the C-suite it becomes a people-process discussion. There's not a lot of tools currently that really help the people and process side of it. It's kind of an artist capability today in the data science space. What we're trying to do is, I think I mentioned team sport, but also give the tooling to say there's step one, which is we need to start learning and training the right teams and the right approach. Step two is start giving them access to the right data, etcetera to work through that. And step three, giving them all the tooling to support that, and tooling becomes things like TensorFlow etcetera, things like Zeppelin, Jupiter, a bunch of the open source community evolved capabilities. So first learn and training. The second step in that is give them the access to the right data to consume it, and then third, give them the right tooling. I think those three things are helping us to drive the right capabilities out of it. But to your point, elevating up to the C-suite. It's really they think people-process, and I think giving them the right tooling for their people and the right processes to get them there. Moving data science from an art to a science, is I would argue at a top level. >> On the client success side, how instrumental though are your clients, like maybe on the Telco side, in actually fostering the development of the technology, or helping IBM make the decision to standardize on HDP as their big data platform? >> Oh, huge, huge, a lot of our clients, especially as they are looking at the big data. Many of them are actually helping us get committers into the code. They're adding, providing; feet can't move fast enough in the engineering. They are coming up and saying, "Hey we're going to help" "and code up and do some code development with you." They've been really pushing our limits. A lot of clients, actually I ended up working with on the Hadoop site is like, you know for example. My entire information integration suite is very much running on top of HDP today. So they are saying, OK what's next? We want to see better integration. So as I called a few clients yesterday saying, "Hey, under embargo this is something going to get announced." Amazing, amazing results, and they're just very excited about this. So we are starting to get a lot of push, and actually the clients who do have large development community as well. Like a lot of banks today, they write a lot of their own applications. We're starting to see them co-developing stuff with us and becoming the committers. >> Lisa: You have a question? >> Well, if I just were to jump in. How do you see over time the mix of apps starting to move from completely custom developed, sort of the way the original big data applications were all written, down to the medal-ep in MapReduce. For shops that don't have a lot of data scientists, how are we going to see applications become more self-service, more pre-packaged? >> So maybe I'll give a little bit of perspective. 
Right now I think IBM has got really good synergies on what I'll call vertical solutions to vertical organizations, financial, etcetera. I would say, Hortonworks has took a more horizontal approach. We're more of a platform solution. An example of one where it's kind of marrying the two, is if you move up the stack from Hortonworks as a platform to the next level up, which is Hortonworks as a solution. One of the examples that we've invested heavily in is cybersecurity, and in an Apache project called Metron. Less about Metron and more about cybersecurity. People want to solve a problem. They want to defend an attacker immediately, and what that means is we need to give them out-of-the-box models to detect a lot of common patterns. What we're doing there, is we're investing in some of the data science and pre-packaged models to identify attack vectors and then try to resolve that or at least notify you that there's a concern. It's an example where the data science behind it, pre-packaging that data science to solve a specific problem. That's in the cybersecurity space and that case happens to be horizontal where Hortonwork's strength is. I think in the IBM case, there's a lot more vertical apps that we can apply to. Fraud, adjudication, etcetera. >> So it sounds like we're really just hitting the tip of the iceberg here, with the potential. We want to thank you both for joining us on theCUBE today, sharing your excitement about this deepening, expanding partnership between Hortonworks and IBM. Madhu and Jamie, thank you so much for joining George and I today on theCUBE. >> Thank you. >> Thank you Lisa and George. >> Appreciate it. >> Thank you. >> And for my co-host George Gilbert, I am Lisa Martin. You're watching us live on theCUBE, from day one of the DataWorks Summit in Silicon Valley. Stick around, we'll be right back. (digitalized music)

Published Date : Jun 14 2017

SUMMARY :

Madhu Kochar of IBM and Jamie Engesser of Hortonworks join Lisa Martin and George Gilbert at DataWorks Summit 2017 to unpack the newly announced partnership: Hortonworks adopting IBM's Data Science Experience and machine learning, Big SQL federating Hadoop with mainframe and DB2 data, and HDP becoming the shared platform. They walk through a Telco churn example in which models are trained on aggregated customer touchpoints and then scored against streaming events, and they discuss governance with Atlas, client contributions to the open source code, and pre-packaged solutions such as cybersecurity with Apache Metron.


Steven Pousty, Red Hat - Cisco DevNet Create 2017 - #DevNetCreate - #theCUBE


 

>> Announcer: Live from San Francisco, it's theCUBE, covering DevNet Create 2017, brought to you by Cisco. >> Okay, welcome back, everyone. We're here live in San Francisco for theCUBE's exclusive coverage of Cisco's new inaugural event called DevNet Create, an extension, an augmentation, a community-focused event of their DevNet community, which is a Cisco developer community, now out in the wild. Our next guest is Steven Pousty, lead developer and evangelist at Red Hat, I'm John Furrier, and my co-host Peter Burris. Steven, welcome to theCUBE. >> Thank you, thank you very much. It's exciting to be here. >> Great to have you on. We were just talking before on camera, getting all animated like, "Hey, turn the cameras on. "We got to get this conversation." We're talking about open source and really looking at some of the trends, but more importantly, the impact. >> Steven: Right. >> Also, we've had you guys on many times on theCUBE. We covered Red Hat Summit, Jim Whitehurst. So, abstractions layers in software, open source ecosystems, you have a background in nature. >> Steven: Yeah. I- >> And ecosystems, literally. >> Steven: Yeah. Yeah, yeah. Yeah, actually I have my PhD in ecology. I'm actually a conservation biologist by training, but IT and computer programming pays the bills a lot better than-- >> Hey, anthropologists and ecologists do very well in the tech world, believe it or not. >> Steven: Yeah, I love big data. >> Peter: And philosophers. >> Yeah, and philosophers. Yeah, with all that logic and the ontologies and all that. >> Ontologies and symbiotics. >> Steven: Yep, yep. >> John: Okay, so I got to ask you, obviously Red Hat has been really the poster child for open source companies going public. We've heard since over the past generation, "The Red Hat of blank, The Red Hat of," and that got played. Certainly we downplayed that. People were trying to call Cloudera the Red Hat of Hadoop (mumbles) realizing that that's never going to happen. You were a once in a generational company, but Red Hat was a tier two company back in those days. Now, open source is certainly tier one software across the board, and I think this event at Cisco kind of amplifies that. Look at it, open source has gone a whole nother generation. A lot of young kids coming in. It's tier one software. The business model is open source. Four new companies just went public recently. So, done deal. >> Right, I mean, I think if you look in the technology ecosystem as a whole, if you don't start with open source you either have some incredibly magic sauce that no one else has or you're done. You couldn't even look at the movies... The arch enemy when I was growing up in software was Microsoft of open source, right? If you look at them now with Satya, they've made great strides to be part of the open source ecosystem at a real level, not like just lip service like they used to do sometimes. Like when I interact with some of our Microsoft partners, you can tell that there's a different change and they really believe in that open source-- >> Microsoft used to be known as lip service and vaporware and they used to kind of freeze the market with their monopoly power as some would say, but more recently they've... Back in the old days, Linux was a cancer. Steve Ballmer said, "Linux is the cancer to the industry." >> Steven: And so-- >> John: Now they're doing Linux with .NET. 
>> And so at the Red Hat Summit just recently I did the Microsoft keynote, I was the Red Hat person on the Microsoft keynote, and we demonstrated .NET Core running in OpenShift on Linux machines, we demonstrated SQL Server running in containers on OpenShift, and then for the end we showed some of the community work, because both of us are involved in Kubernetes. We actually showed a Windows container spinning up IIS being orchestrated from a Linux OpenShift. So, it was actually the Linux server, the Linux OpenShift server, was talking to Windows containers and spinning up Windows containers on the fly. So, I never thought that would've happened. So, it's definitely a sea change. >> And boy was that partly the sea change, we can encapsulate it, is that we used to think in terms of winners and losers in the tech industry, and now it's big winners and less big winners, but the question is how is, I think the realization Microsoft had, is that open source does not demarcate winners from losers. It demarcates, or rather suggests, a new way of thinking about how software gets developed, how software gets integrated and packaged, and ultimately how software gets diffused. So, talk a little bit about this notion of the new world of winners and winners and how this thing moves together, almost in an ecosystem type of way, so that the capabilities overall improve over time, because that's really where we're going is digital business being able to do more for customers. >> Right, and I think that's one of the things that you're seeing coming out from the open source world now is it's becoming less and less about I have this technology versus this is the technology, this open source technology, that we use to help solve your business problems. I gave a talk about this a couple times. There's a concept in ecology called, now I'm blocking on the word, but you probably came across it in school, probably even elementary school. It's the idea that you have bare earth, and then a few plants show up and they start breaking it up, and those plants create a condition where new trees come in, and then it just keeps going and going and going, and then you finally have a rainforest at the end, right? >> Peter: Diversity? >> No, it's-- >> Anyway, we don't want to put you out. >> Yeah, I'm stuck on the word and I can't remember-- >> Here's an ecology question. I saw a Facebook thing where in Yellowstone National Park they introduced four wolves to the ecosystem, and all of a sudden the rivers are no longer wide, they're tighter, there's pools. So four wolves create dynamics. So there's a coexistence, but there's still wolves. >> Right, and so the-- >> John: Who's the wolves in the industry? >> See, that's the thing, it's not that. Just because there are wolves in the industry doesn't mean that they control the entire ecosystem. So I think what I say at the end of this talk is there is no right or wrong about where you are in the ecosystem or in your evolution as an ecosystem, right? There is what is right for your business problem. So, we have this in our, especially in the United States, we have this idea of you're either the winner in this space, you're the cloud solution and you're the winner, or you're not, you're nothing. It's like the Talladega Nights, "If you're not first, you're last!" >> He runs around in his underwear. That's your outcome if you have that strategy. >> Great strategy. >> It was such a good movie. But so the point that I was trying to make in this talk is there's lots of different... 
So like with bird species, when they need to share a tree, there can be six different species all in the same tree, and what they do is what's called niche differentiation. That means, "Oh, I'm going to specialize "in the tops of the trees "and I'm going to only eat this type of caterpillar." And the one on the bottom says, "I specialize on beetles and I do this." And I think what you're seeing with the open source stuff is all these things can coexist. Like GNOME versus KDE. Everybody was claiming GNOME or KDE was the winner for forever. They're still around for forever. So, what I think with this cloud software as well where everybody is like, "Oh, this is the one winning," or this is the, there's a whole host of places for them all to live, and with open source I think things just live forever. >> John: What's your ecosystem analogy that coexistence is actually a better philosophy looking at the big picture than some dominant wolf or whatever. >> That's right, it's the diversity, it's the mutualism, it's the coevolution, it's the right diversity. Like a desert is actually a beautiful place if you go to it. Like we like to pick on the desert, but if you actually spend time in the desert it's gorgeous. There's nothing wrong with the desert. So, if you're some company who doesn't need Kubernetes and all the other pieces in this huge cloud environment, don't feel like that's something you have to take on. >> Peter: But they are the desert. >> That's right, but they are the desert. But, all my PhD research was in the desert, and I used to hate it, because I started this little rolly polly in the desert, and by the time I left I was like, "Oh, I miss the desert when I don't have it." >> John: The sunrises are beautiful. >> Sunrises are beautiful. You can see forever. If you actually pay attention to the small things... All I'm trying to point out is people live in Kansas, people live in New York, people live all over, and they usually find where they live, unless it's some disgusting dump, they say this is a beautiful-- >> Peter: They find beauty in it. >> Yeah, and I think it shouldn't necessarily be everybody has to get to the same place and use all the same technology. There's technology reasons for everything. >> So, I want to pick up on that concept. So the industry used to be pretty much structured around asset specificity. This asset does this for you. As we move more to a software orientation that notion of asset specificity starts to blend away. I think that's one of the seminal features of digital business and digital business transformation is the reduction of asset specificity, but it does mean that increasingly we need to focus on what I'll call value specificity, that we're moving away from the asset being the dominant determinant of structure and how you do things to the value that's being generated and the value that's being presented in any number of different fashions, and that becomes what dictates or describes who you are, what you do, both as an individual, also as a company, as well as a piece of software data. So talk a bit about kind of this notion of niche specialization being more tied to the value that you create as opposed to the asset that you bring. >> That's right, and we're seeing this a lot with our customers, who... You know, OpenShift is based off of Kubernetes and Docker and all that stuff, and containers, and so what we're seeing is a lot of companies come to us and say, "Well, I want to use OpenShift for this. "I want to use OpenShift for that." 
It's no more that we go to customers and say, "Here's OpenShift and you will use it "for purposes X, Y, and Z." What it is is well, that IT group might say well I've got three different business groups that I have to produce stuff for them that they can use. And they'll say, "Can I use Kubernetes for this? "Can I use, oh, I can't? "Well, then I'll get something else for this, or can we adapt-- >> Or complement it. >> Yeah, it's about creating value for the business unit, and it's becoming more and more that now. I think it's an evolution that we've seen, again, this evolution of stuff with the shadow IT and all that stuff. It became less about you're some sort of specialized high priest with this special asset that only you know how to control, I know how to do GIS software, I know how to do big data, no, what value do you produce for me? I don't care that you can buy these kinds of servers and provision them. If I can't use them, what does that do for me, right? So I think we see that at Red Hat a lot where we were the enterprise Linux company, and I think our leaders have done a really good job of saying, "Yeah, that's a good place "where the puck is right now, "but that's not where the puck is staying. "It's moving towards value, "it's moving towards integrated solutions." Go ahead. >> Let me extend this a little bit. So one of the things that we've observed within (mumbles) SiliconAngle, and we've talked to some other people today specifically about this, was the idea that open source has done a really good job of looking at a thing, a convention, that's well defined and well established and then building an open source variant of it. Open source has not been as successful, for example, in the big data world, where the use case or the definition of where we're going is amorphous. Instead, a lot of open source development ends up looking at each other saying, "Well, I'll fix your problem and you'll fix my problem, kind of. Nothing wrong with that, but the vision of where the industry is going to go. How are different companies, what will be open source leadership at redefining where this industry goes so that the open source developers can both be free to do what they need to do, create value as they need to, but at the same time, share a common understanding of where this ends up? >> So I think this goes back to what you were talking about with value, right? So I think what ends up... I'll use the example of big data. So I did a lot of statistical analysis for my PhD, and back then you used SAS or S-PLUS, both proprietary solutions. I think what has caused some of the explosion in big data is that you had these data scientists, the statisticians, intermingling, fertilizing with the computer science people who were handling these other really big problems. So what comes out of that, this is that margin thing again, right? You have statistics and-- >> Peter: Diversity and interesting things happen in the margin. >> At the margin. So what you have is these two groups come together, and suddenly you have the computer science people saying, "Oh, well I know a lot about algorithms "and I'm going to help you figure out "how to get value of what... "You're trying to solve this statistical algorithm, "I'm going to help you build distributed software that does that and that's where we get that happening. >> So the collaboration at the edge, the fringe, the lunatic fringe, or whatever you want to call it, the margin, is where the innovation is. 
>> I think that's where the innovation is, because that helps avoid the navel gazing, right? Like, "Oh, I'm looking at exactly what you built, and I'm going to build a slight variation on it." Well no, when you bring other disciplines in they say, "Well, this is the problem I'm going to solve," and the computer science person or the other side will say, "Well, that sounds kind of like this thing, but let's try," and then suddenly new ideas come up and new ways to handle things. So I think, again, switching to value, rather than what technology am I going to build, is what's going to actually drive it: we need something to handle our big data. That's what's going to drive the vision. So in the big data world you see Spark, you see Zeppelin, you see all these different things competing, but what they're all doing is trying to drive how do I analyze big data efficiently? So you get some competing solutions, and then over time I think that's the vision that they're driving. >> I've got to ask you, so navel gazers is one dimension, but also there's the rearranging of the deck chairs, like someone says, "Let's move things around and magic will happen." Well, you're pushing a whole other concept, which I think is legit, which is as you put people together it might be uncomfortable, but then innovation can come out of it. Okay, so here's the thing. Computer science and cloud computing, all that great stuff is happening, compute, storage, algorithms, etc., data, and now society. So now society has issues, because what's the societal impact? These are first generation problems that we're facing: which side of the street do the cars drive on? Who gets hit first? They have to make these decisions. You see all these new issues, from even younger kids, cyberbullying, online behavior, across the board, societal impact. We are at those margins. >> So I think, for me, tools... I've thought about this a lot, right, because in college I was kind of a tools person, and I think tools are value neutral. Any tool can be used for good or for bad. So what we're doing right now in the open source world, and in IT in general, is developing new tools, and what usually ends up happening is society develops norms after the tools have been created. In some ways, I think... In some ways, I kind of... It's a hard one. This is a much longer discussion and probably would involve some sort of alcoholic liquid or something to draw it out. >> It's a double-edged sword, or tool, depending on how you look at it. We've got to see it first before you can problem solve it. >> But the problem is-- >> You can't problem solve vapor. >> That's right, but on the other hand, sometimes you can see more if you stop and aren't so enamored with the latest and greatest tool, and actually think about, "Oh, well what are the implications of it?" I was going to say, I think the Europeans do a little bit of a better job of putting a little bit of foresight into tools when they come out, saying, "Hold on, let's take a look at this." >> John: At the impact? >> Yeah, at the impact. >> So let me add one more thing to the conversation, because I think you're spot on that the tools may be value neutral, but the impact, the transaction cost, of doing certain types of work in different ways is not, and the work itself is not necessarily value neutral. We may look at some tools and say, "That work is not good. 
"This tool reduces the transaction cost "of performing that work faster "or more completely than that work, "so that tool is going to have a less positive impact--" >> Impact on society as a whole >> "Than some other tool." And I think we can start introducing that kind of an analysis into it. >> I think so. I think that was... I live in this area, like I'm in Santa Cruz, so when I want to I say I'm not in the Valley, but when I want to I say I am in the Valley, I think the Valley is particularly enamored with the toys, or the tools, that it produces, and how technology will solve all our problems, and technology is great, and it is inherently good, and I like to say, "No, it's a tool, "and so a tool could be used for good or for bad." Like one example is ride sharing. Everybody was like, "Oh, this is the best! "This is awesome!" One of the things I thought of, my father is an immigrant, so I'm first generation on my father's side, and he wasn't a taxi driver, but I know how hard it is for first generation immigrants if you don't speak the language really well. So what used to happen with those ride shares is you had to have the capital to acquire a car before you could actually do ride sharing. So what you were basically doing was disenfranchising people who didn't have the capital from actually having this as a source of income when they came to the country. So, I was very conflicted about it to start with. Now, I'm less conflicted. I actually don't think ride share, given the economics I've seen actually play out I actually think ride sharing is not as big of a market and as game changing as everybody was making it. It was just some funny economics. >> Well Steven, certainly the conversation is very awesome. We should have you at the studio in Palo Alto next time you're in the Valley. >> Sounds great. >> You have plenty of tools and shiny new toys. >> Go by the Baylands and then go birding together at the Baylands, or maybe some fishing. >> Let's bring theCube over to Santa Cruz for a couple days. >> We should go down. >> That's great. >> Chill in Santa Cruz. Surf those waves, cloud, data, society. >> There you go. >> theCube on the boardwalk. >> Final question for you. Cisco is trying to push the margin with this event. It's a new event. It's an extension. It's outside their comfort zone. They had some projects that were kind of dismissed, interclouding, other things, this is a statement. Your thoughts on this show, because they have DevNet, why DevNet Create? Your thoughts. >> I think DevNet Create is a great opportunity for Cisco. I've been to the Cisco, is it Cisco Live, the huge gazillion people event? And there's a lot of energy around that, but that's mostly like network engineers and people who were bread and butter Cisco people. I really like that Cisco, that blurring between software and hardware means that Cisco really should be pushing people more in the, "We're going to help you create really interesting solutions." The more they make that easy for the developers... I think some developers are hardware hackers and love it. I am not one of those, and there's a lot of us who are not, and the more you make it easy for me to use software to create really interesting hardware things, the better it is for us. >> It's a classic case, the data scientists meets the algorithm guy. >> Steven: Exactly. >> So they're trying to bring these margins together where it might be awkward at first, but magic can happen. 
>> If I got to sit with some hardware people, I'd be like, "You need to make it so that I can write in Python and do a whole bunch of neat networking and stuff, so at my house I can keep track of how many birds are coming to my bird feeder, because I want to do this really cool experiment. Make that easy for me." >> By the way, you've got a camera, so you've got bird recognition software. >> Steven: Exactly, exactly. >> A new feature on AWS. >> Yeah, I've seen demos of that. It's incredible what they can actually pull out now. >> Steven Pousty, Lead Developer at Red Hat, thanks for coming on theCube. Great conversation. >> Thank you very much. >> We'll have to continue it in Palo Alto. More live coverage here at Cisco Systems' DevNet Create. It's their inaugural event for developers. It's where IoT and app developers meet infrastructure and application infrastructure (mumbles). I'm John Furrier with Peter Burris for theCube. We'll be right back. Stay with us. (techno music) >> Hi, I'm April Mitchell, and I'm the Senior Director of Strategy & Planning for Cisco DevNet.
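To make that ask concrete, here is a minimal, purely hypothetical Python sketch of the bird feeder experiment Pousty describes. Everything below uses only the Python standard library; detect_bird() is a stand-in for whatever camera or image-recognition call a real setup would provide, FEEDER_ENDPOINT is an invented address, and nothing here refers to an actual Cisco or Red Hat API.

```python
# A hypothetical sketch: count how many birds visit a feeder and report the
# tally over the network. detect_bird() simulates the camera check that real
# hardware and recognition software would provide.
import json
import random
import time
import urllib.error
import urllib.request

FEEDER_ENDPOINT = "http://homelab.local/feeder"  # invented address, for illustration


def detect_bird() -> bool:
    """Stand-in for a real camera check; simulates an occasional visit."""
    return random.random() < 0.1


def post_visit_count(count: int) -> None:
    """Send the running tally to a home dashboard as JSON over plain HTTP."""
    payload = json.dumps({"visits": count, "timestamp": time.time()}).encode("utf-8")
    request = urllib.request.Request(
        FEEDER_ENDPOINT, data=payload, headers={"Content-Type": "application/json"}
    )
    try:
        urllib.request.urlopen(request, timeout=2)
    except urllib.error.URLError:
        # Fine for a hobby experiment: keep counting even if the dashboard is down.
        pass


def watch_feeder(poll_seconds: float = 1.0, max_polls: int = 30) -> int:
    """Poll the 'camera', report each visit, and return the total count."""
    visits = 0
    for _ in range(max_polls):
        if detect_bird():
            visits += 1
            post_visit_count(visits)
        time.sleep(poll_seconds)
    return visits


if __name__ == "__main__":
    print("total visits:", watch_feeder())
```

The point of the sketch is the developer experience he is asking for: a few dozen lines of ordinary Python, with the hardware details hidden behind one simple call.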

Published Date : May 23 2017

SUMMARY :

At Cisco's inaugural DevNet Create event, John Furrier and Peter Burris of theCube talk with Steven Pousty, Lead Developer at Red Hat, who draws on his background in ecology to compare the open source landscape to natural ecosystems: competing projects such as GNOME and KDE, or Spark and Zeppelin, coexist through niche differentiation rather than one winner taking all. The conversation moves from asset specificity to value specificity, to how innovation happens at the margins where disciplines like statistics and computer science cross-fertilize, to the societal implications of value-neutral tools such as ride sharing, and closes with why DevNet Create's blurring of hardware and software is a good opportunity for both Cisco and developers.
