Data Science for All: It's a Whole New Game

>> There's a movement that's sweeping across businesses everywhere here in this country and around the world. And it's all about data. Today businesses are being inundated with data. To the tune of over two and a half million gigabytes that'll be generated in the next 60 seconds alone. What do you do with all that data? To extract insights you typically turn to a data scientist. But not necessarily anymore. At least not exclusively. Today the ability to extract value from data is becoming a shared mission. A team effort that spans the organization extending far more widely than ever before. Today, data science is being democratized. >> Data Sciences for All: It's a Whole New Game. >> Welcome everyone, I'm Katie Linendoll. I'm a technology expert writer and I love reporting on all things tech. My fascination with tech started very young. I began coding when I was 12. Received my networking certs by 18 and a degree in IT and new media from Rochester Institute of Technology. So as you can tell, technology has always been a sure passion of mine. Having grown up in the digital age, I love having a career that keeps me at the forefront of science and technology innovations. I spend equal time in the field being hands on as I do on my laptop conducting in depth research. Whether I'm diving underwater with NASA astronauts, witnessing the new ways which mobile technology can help rebuild the Philippine's economy in the wake of super typhoons, or sharing a first look at the newest iPhones on The Today Show, yesterday, I'm always on the hunt for the latest and greatest tech stories. And that's what brought me here. I'll be your host for the next hour and as we explore the new phenomenon that is taking businesses around the world by storm. And data science continues to become democratized and extends beyond the domain of the data scientist. And why there's also a mandate for all of us to become data literate. Now that data science for all drives our AI culture. And we're going to be able to take to the streets and go behind the scenes as we uncover the factors that are fueling this phenomenon and giving rise to a movement that is reshaping how businesses leverage data. And putting organizations on the road to AI. So coming up, I'll be doing interviews with data scientists. We'll see real world demos and take a look at how IBM is changing the game with an open data science platform. We'll also be joined by legendary statistician Nate Silver, founder and editor-in-chief of FiveThirtyEight. Who will shed light on how a data driven mindset is changing everything from business to our culture. We also have a few people who are joining us in our studio, so thank you guys for joining us. Come on, I can do better than that, right? Live studio audience, the fun stuff. And for all of you during the program, I want to remind you to join that conversation on social media using the hashtag DSforAll, it's data science for all. Share your thoughts on what data science and AI means to you and your business. And, let's dive into a whole new game of data science. Now I'd like to welcome my co-host General Manager IBM Analytics, Rob Thomas. >> Hello, Katie. >> Come on guys. >> Yeah, seriously. >> No one's allowed to be quiet during this show, okay? >> Right. >> Or, I'll start calling people out. So Rob, thank you so much. I think you know this conversation, we're calling it a data explosion happening right now. And it's nothing new. And when you and I chatted about it. You've been talking about this for years. You have to ask, is this old news at this point? >> Yeah, I mean, well first of all, the data explosion is not coming, it's here. And everybody's in the middle of it right now. What is different is the economics have changed. And the scale and complexity of the data that organizations are having to deal with has changed. And to this day, 80% of the data in the world still sits behind corporate firewalls. So, that's becoming a problem. It's becoming unmanageable. IT struggles to manage it. The business can't get everything they need. Consumers can't consume it when they want. So we have a challenge here. >> It's challenging in the world of unmanageable. Crazy complexity. If I'm sitting here as an IT manager of my business, I'm probably thinking to myself, this is incredibly frustrating. How in the world am I going to get control of all this data? And probably not just me thinking it. Many individuals here as well. >> Yeah, indeed. Everybody's thinking about how am I going to put data to work in my organization in a way I haven't done before. Look, you've got to have the right expertise, the right tools. The other thing that's happening in the market right now is clients are dealing with multi cloud environments. So data behind the firewall in private cloud, multiple public clouds. And they have to find a way. How am I going to pull meaning out of this data? And that brings us to data science and AI. That's how you get there. >> I understand the data science part but I think we're all starting to hear more about AI. And it's incredible that this buzz word is happening. How do businesses adopt to this AI growth and boom and trend that's happening in this world right now? >> Well, let me define it this way. Data science is a discipline. And machine learning is one technique. And then AI puts both machine learning into practice and applies it to the business. So this is really about how getting your business where it needs to go. And to get to an AI future, you have to lay a data foundation today. I love the phrase, "there's no AI without IA." That means you're not going to get to AI unless you have the right information architecture to start with. >> Can you elaborate though in terms of how businesses can really adopt AI and get started. >> Look, I think there's four things you have to do if you're serious about AI. One is you need a strategy for data acquisition. Two is you need a modern data architecture. Three is you need pervasive automation. And four is you got to expand job roles in the organization. >> Data acquisition. First pillar in this you just discussed. Can we start there and explain why it's so critical in this process? >> Yeah, so let's think about how data acquisition has evolved through the years. 15 years ago, data acquisition was about how do I get data in and out of my ERP system? And that was pretty much solved. Then the mobile revolution happens. And suddenly you've got structured and non-structured data. More than you've ever dealt with. And now you get to where we are today. You're talking terabytes, petabytes of data. >> [Katie] Yottabytes, I heard that word the other day. >> I heard that too. >> Didn't even know what it meant. >> You know how many zeros that is? >> I thought we were in Star Wars. >> Yeah, I think it's a lot of zeroes. >> Yodabytes, it's new. >> So, it's becoming more and more complex in terms of how you acquire data. So that's the new data landscape that every client is dealing with. And if you don't have a strategy for how you acquire that and manage it, you're not going to get to that AI future. >> So a natural segue, if you are one of these businesses, how do you build for the data landscape? >> Yeah, so the question I always hear from customers is we need to evolve our data architecture to be ready for AI. And the way I think about that is it's really about moving from static data repositories to more of a fluid data layer. >> And we continue with the architecture. New data architecture is an interesting buzz word to hear. But it's also one of the four pillars. So if you could dive in there. >> Yeah, I mean it's a new twist on what I would call some core data science concepts. For example, you have to leverage tools with a modern, centralized data warehouse. But your data warehouse can't be stagnant to just what's right there. So you need a way to federate data across different environments. You need to be able to bring your analytics to the data because it's most efficient that way. And ultimately, it's about building an optimized data platform that is designed for data science and AI. Which means it has to be a lot more flexible than what clients have had in the past. >> All right. So we've laid out what you need for driving automation. But where does the machine learning kick in? >> Machine learning is what gives you the ability to automate tasks. And I think about machine learning. It's about predicting and automating. And this will really change the roles of data professionals and IT professionals. For example, a data scientist cannot possibly know every algorithm or every model that they could use. So we can automate the process of algorithm selection. Another example is things like automated data matching. Or metadata creation. Some of these things may not be exciting but they're hugely practical. And so when you think about the real use cases that are driving return on investment today, it's things like that. It's automating the mundane tasks. >> Let's go ahead and come back to something that you mentioned earlier because it's fascinating to be talking about this AI journey, but also significant is the new job roles. And what are those other participants in the analytics pipeline? >> Yeah I think we're just at the start of this idea of new job roles. We have data scientists. We have data engineers. Now you see machine learning engineers. Application developers. What's really happening is that data scientists are no longer allowed to work in their own silo. And so the new job roles is about how does everybody have data first in their mind? And then they're using tools to automate data science, to automate building machine learning into applications. So roles are going to change dramatically in organizations. >> I think that's confusing though because we have several organizations who saying is that highly specialized roles, just for data science? Or is it applicable to everybody across the board? >> Yeah, and that's the big question, right? Cause everybody's thinking how will this apply? Do I want this to be just a small set of people in the organization that will do this? But, our view is data science has to for everybody. It's about bring data science to everybody as a shared mission across the organization. Everybody in the company has to be data literate. And participate in this journey. >> So overall, group effort, has to be a common goal, and we all need to be data literate across the board. >> Absolutely. >> Done deal. But at the end of the day, it's kind of not an easy task. >> It's not. It's not easy but it's maybe not as big of a shift as you would think. Because you have to put data in the hands of people that can do something with it. So, it's very basic. Give access to data. Data's often locked up in a lot of organizations today. Give people the right tools. Embrace the idea of choice or diversity in terms of those tools. That gets you started on this path. >> It's interesting to hear you say essentially you need to train everyone though across the board when it comes to data literacy. And I think people that are coming into the work force don't necessarily have a background or a degree in data science. So how do you manage? >> Yeah, so in many cases that's true. I will tell you some universities are doing amazing work here. One example, University of California Berkeley. They offer a course for all majors. So no matter what you're majoring in, you have a course on foundations of data science. How do you bring data science to every role? So it's starting to happen. We at IBM provide data science courses through CognitiveClass.ai. It's for everybody. It's free. And look, if you want to get your hands on code and just dive right in, you go to datascience.ibm.com. The key point is this though. It's more about attitude than it is aptitude. I think anybody can figure this out. But it's about the attitude to say we're putting data first and we're going to figure out how to make this real in our organization. >> I also have to give a shout out to my alma mater because I have heard that there is an offering in MS in data analytics. And they are always on the forefront of new technologies and new majors and on trend. And I've heard that the placement behind those jobs, people graduating with the MS is high. >> I'm sure it's very high. >> So go Tigers. All right, tangential. Let me get back to something else you touched on earlier because you mentioned that a number of customers ask you how in the world do I get started with AI? It's an overwhelming question. Where do you even begin? What do you tell them? >> Yeah, well things are moving really fast. But the good thing is most organizations I see, they're already on the path, even if they don't know it. They might have a BI practice in place. They've got data warehouses. They've got data lakes. Let me give you an example. AMC Networks. They produce a lot of the shows that I'm sure you watch Katie. >> [Katie] Yes, Breaking Bad, Walking Dead, any fans? >> [Rob] Yeah, we've got a few. >> [Katie] Well you taught me something I didn't even know. Because it's amazing how we have all these different industries, but yet media in itself is impacted too. And this is a good example. >> Absolutely. So, AMC Networks, think about it. They've got ads to place. They want to track viewer behavior. What do people like? What do they dislike? So they have to optimize every aspect of their business from marketing campaigns to promotions to scheduling to ads. And their goal was transform data into business insights and really take the burden off of their IT team that was heavily burdened by obviously a huge increase in data. So their VP of BI took the approach of using machine learning to process large volumes of data. They used a platform that was designed for AI and data processing. It's the IBM analytics system where it's a data warehouse, data science tools are built in. It has in memory data processing. And just like that, they were ready for AI. And they're already seeing that impact in their business. >> Do you think a movement of that nature kind of presses other media conglomerates and organizations to say we need to be doing this too? >> I think it's inevitable that everybody, you're either going to be playing, you're either going to be leading, or you'll be playing catch up. And so, as we talk to clients we think about how do you start down this path now, even if you have to iterate over time? Because otherwise you're going to wake up and you're going to be behind. >> One thing worth noting is we've talked about analytics to the data. It's analytics first to the data, not the other way around. >> Right. So, look. We as a practice, we say you want to bring data to where the data sits. Because it's a lot more efficient that way. It gets you better outcomes in terms of how you train models and it's more efficient. And we think that leads to better outcomes. Other organization will say, "Hey move the data around." And everything becomes a big data movement exercise. But once an organization has started down this path, they're starting to get predictions, they want to do it where it's really easy. And that means analytics applied right where the data sits. >> And worth talking about the role of the data scientist in all of this. It's been called the hot job of the decade. And a Harvard Business Review even dubbed it the sexiest job of the 21st century. >> Yes. >> I want to see this on the cover of Vogue. Like I want to see the first data scientist. Female preferred, on the cover of Vogue. That would be amazing. >> Perhaps you can. >> People agree. So what changes for them? Is this challenging in terms of we talk data science for all. Where do all the data science, is it data science for everyone? And how does it change everything? >> Well, I think of it this way. AI gives software super powers. It really does. It changes the nature of software. And at the center of that is data scientists. So, a data scientist has a set of powers that they've never had before in any organization. And that's why it's a hot profession. Now, on one hand, this has been around for a while. We've had actuaries. We've had statisticians that have really transformed industries. But there are a few things that are new now. We have new tools. New languages. Broader recognition of this need. And while it's important to recognize this critical skill set, you can't just limit it to a few people. This is about scaling it across the organization. And truly making it accessible to all. >> So then do we need more data scientists? Or is this something you train like you said, across the board? >> Well, I think you want to do a little bit of both. We want more. But, we can also train more and make the ones we have more productive. The way I think about it is there's kind of two markets here. And we call it clickers and coders. >> [Katie] I like that. That's good. >> So, let's talk about what that means. So clickers are basically somebody that wants to use tools. Create models visually. It's drag and drop. Something that's very intuitive. Those are the clickers. Nothing wrong with that. It's been valuable for years. There's a new crop of data scientists. They want to code. They want to build with the latest open source tools. They want to write in Python or R. These are the coders. And both approaches are viable. Both approaches are critical. Organizations have to have a way to meet the needs of both of those types. And there's not a lot of things available today that do that. >> Well let's keep going on that. Because I hear you talking about the data scientists role and how it's critical to success, but with the new tools, data science and analytics skills can extend beyond the domain of just the data scientist. >> That's right. So look, we're unifying coders and clickers into a single platform, which we call IBM Data Science Experience. And as the demand for data science expertise grows, so does the need for these kind of tools. To bring them into the same environment. And my view is if you have the right platform, it enables the organization to collaborate. And suddenly you've changed the nature of data science from an individual sport to a team sport. >> So as somebody that, my background is in IT, the question is really is this an additional piece of what IT needs to do in 2017 and beyond? Or is it just another line item to the budget? >> So I'm afraid that some people might view it that way. As just another line item. But, I would challenge that and say data science is going to reinvent IT. It's going to change the nature of IT. And every organization needs to think about what are the skills that are critical? How do we engage a broader team to do this? Because once they get there, this is the chance to reinvent how they're performing IT. >> [Katie] Challenging or not? >> Look it's all a big challenge. Think about everything IT organizations have been through. Some of them were late to things like mobile, but then they caught up. Some were late to cloud, but then they caught up. I would just urge people, don't be late to data science. Use this as your chance to reinvent IT. Start with this notion of clickers and coders. This is a seminal moment. Much like mobile and cloud was. So don't be late. >> And I think it's critical because it could be so costly to wait. And Rob and I were even chatting earlier how data analytics is just moving into all different kinds of industries. And I can tell you even personally being effected by how important the analysis is in working in pediatric cancer for the last seven years. I personally implement virtual reality headsets to pediatric cancer hospitals across the country. And it's great. And it's working phenomenally. And the kids are amazed. And the staff is amazed. But the phase two of this project is putting in little metrics in the hardware that gather the breathing, the heart rate to show that we have data. Proof that we can hand over to the hospitals to continue making this program a success. So just in-- >> That's a great example. >> An interesting example. >> Saving lives? >> Yes. >> That's also applying a lot of what we talked about. >> Exciting stuff in the world of data science. >> Yes. Look, I just add this is an existential moment for every organization. Because what you do in this area is probably going to define how competitive you are going forward. And think about if you don't do something. What if one of your competitors goes and creates an application that's more engaging with clients? So my recommendation is start small. Experiment. Learn. Iterate on projects. Define the business outcomes. Then scale up. It's very doable. But you've got to take the first step. >> First step always critical. And now we're going to get to the fun hands on part of our story. Because in just a moment we're going to take a closer look at what data science can deliver. And where organizations are trying to get to. All right. Thank you Rob and now we've been joined by Siva Anne who is going to help us navigate this demo. First, welcome Siva. Give him a big round of applause. Yeah. All right, Rob break down what we're going to be looking at. You take over this demo. >> All right. So this is going to be pretty interesting. So Siva is going to take us through. So he's going to play the role of a financial adviser. Who wants to help better serve clients through recommendations. And I'm going to really illustrate three things. One is how do you federate data from multiple data sources? Inside the firewall, outside the firewall. How do you apply machine learning to predict and to automate? And then how do you move analytics closer to your data? So, what you're seeing here is a custom application for an investment firm. So, Siva, our financial adviser, welcome. So you can see at the top, we've got market data. We pulled that from an external source. And then we've got Siva's calendar in the middle. He's got clients on the right side. So page down, what else do you see down there Siva? >> [Siva] I can see the recent market news. And in here I can see that JP Morgan is calling for a US dollar rebound in the second half of the year. And, I have upcoming meeting with Leo Rakes. I can get-- >> [Rob] So let's go in there. Why don't you click on Leo Rakes. So, you're sitting at your desk, you're deciding how you're going to spend the day. You know you have a meeting with Leo. So you click on it. You immediately see, all right, so what do we know about him? We've got data governance implemented. So we know his age, we know his degree. We can see he's not that aggressive of a trader. Only six trades in the last few years. But then where it gets interesting is you go to the bottom. You start to see predicted industry affinity. Where did that come from? How do we have that? >> [Siva] So these green lines and red arrows here indicate the trending affinity of Leo Rakes for particular industry stocks. What we've done here is we've built machine learning models using customer's demographic data, his stock portfolios, and browsing behavior to build a model which can predict his affinity for a particular industry. >> [Rob] Interesting. So, I like to think of this, we call it celebrity experiences. So how do you treat every customer like they're a celebrity? So to some extent, we're reading his mind. Because without asking him, we know that he's going to have an affinity for auto stocks. So we go down. Now we look at his portfolio. You can see okay, he's got some different holdings. He's got Amazon, Google, Apple, and then he's got RACE, which is the ticker for Ferrari. You can see that's done incredibly well. And so, as a financial adviser, you look at this and you say, all right, we know he loves auto stocks. Ferrari's done very well. Let's create a hedge. Like what kind of security would interest him as a hedge against his position for Ferrari? Could we go figure that out? >> [Siva] Yes. Given I know that he's gotten an affinity for auto stocks, and I also see that Ferrari has got some terminus gains, I want to lock in these gains by hedging. And I want to do that by picking a auto stock which has got negative correlation with Ferrari. >> [Rob] So this is where we get to the idea of in database analytics. Cause you start clicking that and immediately we're getting instant answers of what's happening. So what did we find here? We're going to compare Ferrari and Honda. >> [Siva] I'm going to compare Ferrari with Honda. And what I see here instantly is that Honda has got a negative correlation with Ferrari, which makes it a perfect mix for his stock portfolio. Given he has an affinity for auto stocks and it correlates negatively with Ferrari. >> [Rob] These are very powerful tools at the hand of a financial adviser. You think about it. As a financial adviser, you wouldn't think about federating data, machine learning, pretty powerful. >> [Siva] Yes. So what we have seen here is that using the common SQL engine, we've been able to federate queries across multiple data sources. Db2 Warehouse in the cloud, IBM's Integrated Analytic System, and Hortonworks powered Hadoop platform for the new speeds. We've been able to use machine learning to derive innovative insights about his stock affinities. And drive the machine learning into the appliance. Closer to where the data resides to deliver high performance analytics. >> [Rob] At scale? >> [Siva] We're able to run millions of these correlations across stocks, currency, other factors. And even score hundreds of customers for their affinities on a daily basis. >> That's great. Siva, thank you for playing the role of financial adviser. So I just want to recap briefly. Cause this really powerful technology that's really simple. So we federated, we aggregated multiple data sources from all over the web and internal systems. And public cloud systems. Machine learning models were built that predicted Leo's affinity for a certain industry. In this case, automotive. And then you see when you deploy analytics next to your data, even a financial adviser, just with the click of a button is getting instant answers so they can go be more productive in their next meeting. This whole idea of celebrity experiences for your customer, that's available for everybody, if you take advantage of these types of capabilities. Katie, I'll hand it back to you. >> Good stuff. Thank you Rob. Thank you Siva. Powerful demonstration on what we've been talking about all afternoon. And thank you again to Siva for helping us navigate. Should be give him one more round of applause? We're going to be back in just a moment to look at how we operationalize all of this data. But in first, here's a message from me. If you're a part of a line of business, your main fear is disruption. You know data is the new goal that can create huge amounts of value. So does your competition. And they may be beating you to it. You're convinced there are new business models and revenue sources hidden in all the data. You just need to figure out how to leverage it. But with the scarcity of data scientists, you really can't rely solely on them. You may need more people throughout the organization that have the ability to extract value from data. And as a data science leader or data scientist, you have a lot of the same concerns. You spend way too much time looking for, prepping, and interpreting data and waiting for models to train. You know you need to operationalize the work you do to provide business value faster. What you want is an easier way to do data prep. And rapidly build models that can be easily deployed, monitored and automatically updated. So whether you're a data scientist, data science leader, or in a line of business, what's the solution? What'll it take to transform the way you work? That's what we're going to explore next. All right, now it's time to delve deeper into the nuts and bolts. The nitty gritty of operationalizing data science and creating a data driven culture. How do you actually do that? Well that's what these experts are here to share with us. I'm joined by Nir Kaldero, who's head of data science at Galvanize, which is an education and training organization. Tricia Wang, who is co-founder of Sudden Compass, a consultancy that helps companies understand people with data. And last, but certainly not least, Michael Li, founder and CEO of Data Incubator, which is a data science train company. All right guys. Shall we get right to it? >> All right. >> So data explosion happening right now. And we are seeing it across the board. I just shared an example of how it's impacting my philanthropic work in pediatric cancer. But you guys each have so many unique roles in your business life. How are you seeing it just blow up in your fields? Nir, your thing? >> Yeah, for example like in Galvanize we train many Fortune 500 companies. And just by looking at the demand of companies that wants us to help them go through this digital transformation is mind-blowing. Data point by itself. >> Okay. Well what we're seeing what's going on is that data science like as a theme, is that it's actually for everyone now. But what's happening is that it's actually meeting non technical people. But what we're seeing is that when non technical people are implementing these tools or coming at these tools without a base line of data literacy, they're often times using it in ways that distance themselves from the customer. Because they're implementing data science tools without a clear purpose, without a clear problem. And so what we do at Sudden Compass is that we work with companies to help them embrace and understand the complexity of their customers. Because often times they are misusing data science to try and flatten their understanding of the customer. As if you can just do more traditional marketing. Where you're putting people into boxes. And I think the whole ROI of data is that you can now understand people's relationships at a much more complex level at a greater scale before. But we have to do this with basic data literacy. And this has to involve technical and non technical people. >> Well you can have all the data in the world, and I think it speaks to, if you're not doing the proper movement with it, forget it. It means nothing at the same time. >> No absolutely. I mean, I think that when you look at the huge explosion in data, that comes with it a huge explosion in data experts. Right, we call them data scientists, data analysts. And sometimes they're people who are very, very talented, like the people here. But sometimes you have people who are maybe re-branding themselves, right? Trying to move up their title one notch to try to attract that higher salary. And I think that that's one of the things that customers are coming to us for, right? They're saying, hey look, there are a lot of people that call themselves data scientists, but we can't really distinguish. So, we have sort of run a fellowship where you help companies hire from a really talented group of folks, who are also truly data scientists and who know all those kind of really important data science tools. And we also help companies internally. Fortune 500 companies who are looking to grow that data science practice that they have. And we help clients like McKinsey, BCG, Bain, train up their customers, also their clients, also their workers to be more data talented. And to build up that data science capabilities. >> And Nir, this is something you work with a lot. A lot of Fortune 500 companies. And when we were speaking earlier, you were saying many of these companies can be in a panic. >> Yeah. >> Explain that. >> Yeah, so you know, not all Fortune 500 companies are fully data driven. And we know that the winners in this fourth industrial revolution, which I like to call the machine intelligence revolution, will be companies who navigate and transform their organization to unlock the power of data science and machine learning. And the companies that are not like that. Or not utilize data science and predictive power well, will pretty much get shredded. So they are in a panic. >> Tricia, companies have to deal with data behind the firewall and in the new multi cloud world. How do organizations start to become driven right to the core? >> I think the most urgent question to become data driven that companies should be asking is how do I bring the complex reality that our customers are experiencing on the ground in to a corporate office? Into the data models. So that question is critical because that's how you actually prevent any big data disasters. And that's how you leverage big data. Because when your data models are really far from your human models, that's when you're going to do things that are really far off from how, it's going to not feel right. That's when Tesco had their terrible big data disaster that they're still recovering from. And so that's why I think it's really important to understand that when you implement big data, you have to further embrace thick data. The qualitative, the emotional stuff, that is difficult to quantify. But then comes the difficult art and science that I think is the next level of data science. Which is that getting non technical and technical people together to ask how do we find those unknown nuggets of insights that are difficult to quantify? Then, how do we do the next step of figuring out how do you mathematically scale those insights into a data model? So that actually is reflective of human understanding? And then we can start making decisions at scale. But you have to have that first. >> That's absolutely right. And I think that when we think about what it means to be a data scientist, right? I always think about it in these sort of three pillars. You have the math side. You have to have that kind of stats, hardcore machine learning background. You have the programming side. You don't work with small amounts of data. You work with large amounts of data. You've got to be able to type the code to make those computers run. But then the last part is that human element. You have to understand the domain expertise. You have to understand what it is that I'm actually analyzing. What's the business proposition? And how are the clients, how are the users actually interacting with the system? That human element that you were talking about. And I think having somebody who understands all of those and not just in isolation, but is able to marry that understanding across those different topics, that's what makes a data scientist. >> But I find that we don't have people with those skill sets. And right now the way I see teams being set up inside companies is that they're creating these isolated data unicorns. These data scientists that have graduated from your programs, which are great. But, they don't involve the people who are the domain experts. They don't involve the designers, the consumer insight people, the people, the salespeople. The people who spend time with the customers day in and day out. Somehow they're left out of the room. They're consulted, but they're not a stakeholder. >> Can I actually >> Yeah, yeah please. >> Can I actually give a quick example? So for example, we at Galvanize train the executives and the managers. And then the technical people, the data scientists and the analysts. But in order to actually see all of the RY behind the data, you also have to have a creative fluid conversation between non technical and technical people. And this is a major trend now. And there's a major gap. And we need to increase awareness and kind of like create a new, kind of like environment where technical people also talks seamlessly with non technical ones. >> [Tricia] We call-- >> That's one of the things that we see a lot. Is one of the trends in-- >> A major trend. >> data science training is it's not just for the data science technical experts. It's not just for one type of person. So a lot of the training we do is sort of data engineers. People who are more on the software engineering side learning more about the stats of math. And then people who are sort of traditionally on the stat side learning more about the engineering. And then managers and people who are data analysts learning about both. >> Michael, I think you said something that was of interest too because I think we can look at IBM Watson as an example. And working in healthcare. The human component. Because often times we talk about machine learning and AI, and data and you get worried that you still need that human component. Especially in the world of healthcare. And I think that's a very strong point when it comes to the data analysis side. Is there any particular example you can speak to of that? >> So I think that there was this really excellent paper a while ago talking about all the neuro net stuff and trained on textual data. So looking at sort of different corpuses. And they found that these models were highly, highly sexist. They would read these corpuses and it's not because neuro nets themselves are sexist. It's because they're reading the things that we write. And it turns out that we write kind of sexist things. And they would sort of find all these patterns in there that were sort of latent, that had a lot of sort of things that maybe we would cringe at if we sort of saw. And I think that's one of the really important aspects of the human element, right? It's being able to come in and sort of say like, okay, I know what the biases of the system are, I know what the biases of the tools are. I need to figure out how to use that to make the tools, make the world a better place. And like another area where this comes up all the time is lending, right? So the federal government has said, and we have a lot of clients in the financial services space, so they're constantly under these kind of rules that they can't make discriminatory lending practices based on a whole set of protected categories. Race, sex, gender, things like that. But, it's very easy when you train a model on credit scores to pick that up. And then to have a model that's inadvertently sexist or racist. And that's where you need the human element to come back in and say okay, look, you're using the classic example would be zip code, you're using zip code as a variable. But when you look at it, zip codes actually highly correlated with race. And you can't do that. So you may inadvertently by sort of following the math and being a little naive about the problem, inadvertently introduce something really horrible into a model and that's where you need a human element to sort of step in and say, okay hold on. Slow things down. This isn't the right way to go. >> And the people who have -- >> I feel like, I can feel her ready to respond. >> Yes, I'm ready. >> She's like let me have at it. >> And the people here it is. And the people who are really great at providing that human intelligence are social scientists. We are trained to look for bias and to understand bias in data. Whether it's quantitative or qualitative. And I really think that we're going to have less of these kind of problems if we had more integrated teams. If it was a mandate from leadership to say no data science team should be without a social scientist, ethnographer, or qualitative researcher of some kind, to be able to help see these biases. >> The talent piece is actually the most crucial-- >> Yeah. >> one here. If you look about how to enable machine intelligence in organization there are the pillars that I have in my head which is the culture, the talent and the technology infrastructure. And I believe and I saw in working very closely with the Fortune 100 and 200 companies that the talent piece is actually the most important crucial hard to get. >> [Tricia] I totally agree. >> It's absolutely true. Yeah, no I mean I think that's sort of like how we came up with our business model. Companies were basically saying hey, I can't hire data scientists. And so we have a fellowship where we get 2,000 applicants each quarter. We take the top 2% and then we sort of train them up. And we work with hiring companies who then want to hire from that population. And so we're sort of helping them solve that problem. And the other half of it is really around training. Cause with a lot of industries, especially if you're sort of in a more regulated industry, there's a lot of nuances to what you're doing. And the fastest way to develop that data science or AI talent may not necessarily be to hire folks who are coming out of a PhD program. It may be to take folks internally who have a lot of that domain knowledge that you have and get them trained up on those data science techniques. So we've had large insurance companies come to us and say hey look, we hire three or four folks from you a quarter. That doesn't move the needle for us. What we really need is take the thousand actuaries and statisticians that we have and get all of them trained up to become a data scientist and become data literate in this new open source world. >> [Katie] Go ahead. >> All right, ladies first. >> Go ahead. >> Are you sure? >> No please, fight first. >> Go ahead. >> Go ahead Nir. >> So this is actually a trend that we have been seeing in the past year or so that companies kind of like start to look how to upscale and look for talent within the organization. So they can actually move them to become more literate and navigate 'em from analyst to data scientist. And from data scientist to machine learner. So this is actually a trend that is happening already for a year or so. >> Yeah, but I also find that after they've gone through that training in getting people skilled up in data science, the next problem that I get is executives coming to say we've invested in all of this. We're still not moving the needle. We've already invested in the right tools. We've gotten the right skills. We have enough scale of people who have these skills. Why are we not moving the needle? And what I explain to them is look, you're still making decisions in the same way. And you're still not involving enough of the non technical people. Especially from marketing, which is now, the CMO's are much more responsible for driving growth in their companies now. But often times it's so hard to change the old way of marketing, which is still like very segmentation. You know, demographic variable based, and we're trying to move people to say no, you have to understand the complexity of customers and not put them in boxes. >> And I think underlying a lot of this discussion is this question of culture, right? >> Yes. >> Absolutely. >> How do you build a data driven culture? And I think that that culture question, one of the ways that comes up quite often in especially in large, Fortune 500 enterprises, is that they are very, they're not very comfortable with sort of example, open source architecture. Open source tools. And there is some sort of residual bias that that's somehow dangerous. So security vulnerability. And I think that that's part of the cultural challenge that they often have in terms of how do I build a more data driven organization? Well a lot of the talent really wants to use these kind of tools. And I mean, just to give you an example, we are partnering with one of the major cloud providers to sort of help make open source tools more user friendly on their platform. So trying to help them attract the best technologists to use their platform because they want and they understand the value of having that kind of open source technology work seamlessly on their platforms. So I think that just sort of goes to show you how important open source is in this movement. And how much large companies and Fortune 500 companies and a lot of the ones we work with have to embrace that. >> Yeah, and I'm seeing it in our work. Even when we're working with Fortune 500 companies, is that they've already gone through the first phase of data science work. Where I explain it was all about the tools and getting the right tools and architecture in place. And then companies started moving into getting the right skill set in place. Getting the right talent. And what you're talking about with culture is really where I think we're talking about the third phase of data science, which is looking at communication of these technical frameworks so that we can get non technical people really comfortable in the same room with data scientists. That is going to be the phase, that's really where I see the pain point. And that's why at Sudden Compass, we're really dedicated to working with each other to figure out how do we solve this problem now? >> And I think that communication between the technical stakeholders and management and leadership. That's a very critical piece of this. You can't have a successful data science organization without that. >> Absolutely. >> And I think that actually some of the most popular trainings we've had recently are from managers and executives who are looking to say, how do I become more data savvy? How do I figure out what is this data science thing and how do I communicate with my data scientists? >> You guys made this way too easy. I was just going to get some popcorn and watch it play out. >> Nir, last 30 seconds. I want to leave you with an opportunity to, anything you want to add to this conversation? >> I think one thing to conclude is to say that companies that are not data driven is about time to hit refresh and figure how they transition the organization to become data driven. To become agile and nimble so they can actually see what opportunities from this important industrial revolution. Otherwise, unfortunately they will have hard time to survive. >> [Katie] All agreed? >> [Tricia] Absolutely, you're right. >> Michael, Trish, Nir, thank you so much. Fascinating discussion. And thank you guys again for joining us. We will be right back with another great demo. Right after this. >> Thank you Katie. >> Once again, thank you for an excellent discussion. Weren't they great guys? And thank you for everyone who's tuning in on the live webcast. As you can hear, we have an amazing studio audience here. And we're going to keep things moving. I'm now joined by Daniel Hernandez and Siva Anne. And we're going to turn our attention to how you can deliver on what they're talking about using data science experience to do data science faster. >> Thank you Katie. Siva and I are going to spend the next 10 minutes showing you how you can deliver on what they were saying using the IBM Data Science Experience to do data science faster. We'll demonstrate through new features we introduced this week how teams can work together more effectively across the entire analytics life cycle. How you can take advantage of any and all data no matter where it is and what it is. How you could use your favorite tools from open source. And finally how you could build models anywhere and employ them close to where your data is. Remember the financial adviser app Rob showed you? To build an app like that, we needed a team of data scientists, developers, data engineers, and IT staff to collaborate. We do this in the Data Science Experience through a concept we call projects. When I create a new project, I can now use the new Github integration feature. We're doing for data science what we've been doing for developers for years. Distributed teams can work together on analytics projects. And take advantage of Github's version management and change management features. This is a huge deal. Let's explore the project we created for the financial adviser app. As you can see, our data engineer Joane, our developer Rob, and others are collaborating this project. Joane got things started by bringing together the trusted data sources we need to build the app. Taking a closer look at the data, we see that our customer and profile data is stored on our recently announced IBM Integrated Analytics System, which runs safely behind our firewall. We also needed macro economic data, which she was able to find in the Federal Reserve. And she stored it in our Db2 Warehouse on Cloud. And finally, she selected stock news data from NASDAQ.com and landed that in a Hadoop cluster, which happens to be powered by Hortonworks. We added a new feature to the Data Science Experience so that when it's installed with Hortonworks, it automatically uses a need of security and governance controls within the cluster so your data is always secure and safe. Now we want to show you the news data we stored in the Hortonworks cluster. This is the mean administrative console. It's powered by an open source project called Ambari. And here's the news data. It's in parquet files stored in HDFS, which happens to be a distributive file system. To get the data from NASDAQ into our cluster, we used IBM's BigIntegrate and BigQuality to create automatic data pipelines that acquire, cleanse, and ingest that news data. Once the data's available, we use IBM's Big SQL to query that data using SQL statements that are much like the ones we would use for any relation of data, including the data that we have in the Integrated Analytics System and Db2 Warehouse on Cloud. This and the federation capabilities that Big SQL offers dramatically simplifies data acquisition. Now we want to show you how we support a brand new tool that we're excited about. Since we launched last summer, the Data Science Experience has supported Jupyter and R for data analysis and visualization. In this week's update, we deeply integrated another great open source project called Apache Zeppelin. It's known for having great visualization support, advanced collaboration features, and is growing in popularity amongst the data science community. This is an example of Apache Zeppelin and the notebook we created through it to explore some of our data. Notice how wonderful and easy the data visualizations are. Now we want to walk you through the Jupyter notebook we created to explore our customer preference for stocks. We use notebooks to understand and explore data. To identify the features that have some predictive power. Ultimately, we're trying to assess what ultimately is driving customer stock preference. Here we did the analysis to identify the attributes of customers that are likely to purchase auto stocks. We used this understanding to build our machine learning model. For building machine learning models, we've always had tools integrated into the Data Science Experience. But sometimes you need to use tools you already invested in. Like our very own SPSS as well as SAS. Through new import feature, you can easily import those models created with those tools. This helps you avoid vendor lock-in, and simplify the development, training, deployment, and management of all your models. To build the models we used in app, we could have coded, but we prefer a visual experience. We used our customer profile data in the Integrated Analytic System. Used the Auto Data Preparation to cleanse our data. Choose the binary classification algorithms. Let the Data Science Experience evaluate between logistic regression and gradient boosted tree. It's doing the heavy work for us. As you can see here, the Data Science Experience generated performance metrics that show us that the gradient boosted tree is the best performing algorithm for the data we gave it. Once we save this model, it's automatically deployed and available for developers to use. Any application developer can take this endpoint and consume it like they would any other API inside of the apps they built. We've made training and creating machine learning models super simple. But what about the operations? A lot of companies are struggling to ensure their model performance remains high over time. In our financial adviser app, we know that customer data changes constantly, so we need to always monitor model performance and ensure that our models are retrained as is necessary. This is a dashboard that shows the performance of our models and lets our teams monitor and retrain those models so that they're always performing to our standards. So far we've been showing you the Data Science Experience available behind the firewall that we're using to build and train models. Through a new publish feature, you can build models and deploy them anywhere. In another environment, private, public, or anywhere else with just a few clicks. So here we're publishing our model to the Watson machine learning service. It happens to be in the IBM cloud. And also deeply integrated with our Data Science Experience. After publishing and switching to the Watson machine learning service, you can see that our stock affinity and model that we just published is there and ready for use. So this is incredibly important. I just want to say it again. The Data Science Experience allows you to train models behind your own firewall, take advantage of your proprietary and sensitive data, and then deploy those models wherever you want with ease. So summarize what we just showed you. First, IBM's Data Science Experience supports all teams. You saw how our data engineer populated our project with trusted data sets. Our data scientists developed, trained, and tested a machine learning model. Our developers used APIs to integrate machine learning into their apps. And how IT can use our Integrated Model Management dashboard to monitor and manage model performance. Second, we support all data. On premises, in the cloud, structured, unstructured, inside of your firewall, and outside of it. We help you bring analytics and governance to where your data is. Third, we support all tools. The data science tools that you depend on are readily available and deeply integrated. This includes capabilities from great partners like Hortonworks. And powerful tools like our very own IBM SPSS. And fourth, and finally, we support all deployments. You can build your models anywhere, and deploy them right next to where your data is. Whether that's in the public cloud, private cloud, or even on the world's most reliable transaction platform, IBM z. So see for yourself. Go to the Data Science Experience website, take us for a spin. And if you happen to be ready right now, our recently created Data Science Elite Team can help you get started and run experiments alongside you with no charge. Thank you very much. >> Thank you very much Daniel. It seems like a great time to get started. And thanks to Siva for taking us through it. Rob and I will be back in just a moment to add some perspective right after this. All right, once again joined by Rob Thomas. And Rob obviously we got a lot of information here. >> Yes, we've covered a lot of ground. >> This is intense. You got to break it down for me cause I think we zoom out and see the big picture. What better data science can deliver to a business? Why is this so important? I mean we've heard it through and through. >> Yeah, well, I heard it a couple times. But it starts with businesses have to embrace a data driven culture. And it is a change. And we need to make data accessible with the right tools in a collaborative culture because we've got diverse skill sets in every organization. But data driven companies succeed when data science tools are in the hands of everyone. And I think that's a new thought. I think most companies think just get your data scientist some tools, you'll be fine. This is about tools in the hands of everyone. I think the panel did a great job of describing about how we get to data science for all. Building a data culture, making it a part of your everyday operations, and the highlights of what Daniel just showed us, that's some pretty cool features for how organizations can get to this, which is you can see IBM's Data Science Experience, how that supports all teams. You saw data analysts, data scientists, application developer, IT staff, all working together. Second, you saw how we support all tools. And your choice of tools. So the most popular data science libraries integrated into one platform. And we saw some new capabilities that help companies avoid lock-in, where you can import existing models created from specialist tools like SPSS or others. And then deploy them and manage them inside of Data Science Experience. That's pretty interesting. And lastly, you see we continue to build on this best of open tools. Partnering with companies like H2O, Hortonworks, and others. Third, you can see how you use all data no matter where it lives. That's a key challenge every organization's going to face. Private, public, federating all data sources. We announced new integration with the Hortonworks data platform where we deploy machine learning models where your data resides. That's been a key theme. Analytics where the data is. And lastly, supporting all types of deployments. Deploy them in your Hadoop cluster. Deploy them in your Integrated Analytic System. Or deploy them in z, just to name a few. A lot of different options here. But look, don't believe anything I say. Go try it for yourself. Data Science Experience, anybody can use it. Go to datascience.ibm.com and look, if you want to start right now, we just created a team that we call Data Science Elite. These are the best data scientists in the world that will come sit down with you and co-create solutions, models, and prove out a proof of concept. >> Good stuff. Thank you Rob. So you might be asking what does an organization look like that embraces data science for all? And how could it transform your role? I'm going to head back to the office and check it out. Let's start with the perspective of the line of business. What's changed? Well, now you're starting to explore new business models. You've uncovered opportunities for new revenue sources and all that hidden data. And being disrupted is no longer keeping you up at night. As a data science leader, you're beginning to collaborate with a line of business to better understand and translate the objectives into the models that are being built. Your data scientists are also starting to collaborate with the less technical team members and analysts who are working closest to the business problem. And as a data scientist, you stop feeling like you're falling behind. Open source tools are keeping you current. You're also starting to operationalize the work that you do. And you get to do more of what you love. Explore data, build models, put your models into production, and create business impact. All in all, it's not a bad scenario. Thanks. All right. We are back and coming up next, oh this is a special time right now. Cause we got a great guest speaker. New York Magazine called him the spreadsheet psychic and number crunching prodigy who went from correctly forecasting baseball games to correctly forecasting presidential elections. He even invented a proprietary algorithm called PECOTA for predicting future performance by baseball players and teams. And his New York Times bestselling book, The Signal and the Noise was named by Amazon.com as the number one best non-fiction book of 2012. He's currently the Editor in Chief of the award winning website, FiveThirtyEight and appears on ESPN as an on air commentator. Big round of applause. My pleasure to welcome Nate Silver. >> Thank you. We met backstage. >> Yes. >> It feels weird to re-shake your hand, but you know, for the audience. >> I had to give the intense firm grip. >> Definitely. >> The ninja grip. So you and I have crossed paths kind of digitally in the past, which it really interesting, is I started my career at ESPN. And I started as a production assistant, then later back on air for sports technology. And I go to you to talk about sports because-- >> Yeah. >> Wow, has ESPN upped their game in terms of understanding the importance of data and analytics. And what it brings. Not just to MLB, but across the board. >> No, it's really infused into the way they present the broadcast. You'll have win probability on the bottom line. And they'll incorporate FiveThirtyEight metrics into how they cover college football for example. So, ESPN ... Sports is maybe the perfect, if you're a data scientist, like the perfect kind of test case. And the reason being that sports consists of problems that have rules. And have structure. And when problems have rules and structure, then it's a lot easier to work with. So it's a great way to kind of improve your skills as a data scientist. Of course, there are also important real world problems that are more open ended, and those present different types of challenges. But it's such a natural fit. The teams. Think about the teams playing the World Series tonight. The Dodgers and the Astros are both like very data driven, especially Houston. Golden State Warriors, the NBA Champions, extremely data driven. New England Patriots, relative to an NFL team, it's shifted a little bit, the NFL bar is lower. But the Patriots are certainly very analytical in how they make decisions. So, you can't talk about sports without talking about analytics. >> And I was going to save the baseball question for later. Cause we are moments away from game seven. >> Yeah. >> Is everyone else watching game seven? It's been an incredible series. Probably one of the best of all time. >> Yeah, I mean-- >> You have a prediction here? >> You can mention that too. So I don't have a prediction. FiveThirtyEight has the Dodgers with a 60% chance of winning. >> [Katie] LA Fans. >> So you have two teams that are about equal. But the Dodgers pitching staff is in better shape at the moment. The end of a seven game series. And they're at home. >> But the statistics behind the two teams is pretty incredible. >> Yeah. It's like the first World Series in I think 56 years or something where you have two 100 win teams facing one another. There have been a lot of parity in baseball for a lot of years. Not that many offensive overall juggernauts. But this year, and last year with the Cubs and the Indians too really. But this year, you have really spectacular teams in the World Series. It kind of is a showcase of modern baseball. Lots of home runs. Lots of strikeouts. >> [Katie] Lots of extra innings. >> Lots of extra innings. Good defense. Lots of pitching changes. So if you love the modern baseball game, it's been about the best example that you've had. If you like a little bit more contact, and fewer strikeouts, maybe not so much. But it's been a spectacular and very exciting World Series. It's amazing to talk. MLB is huge with analysis. I mean, hands down. But across the board, if you can provide a few examples. Because there's so many teams in front offices putting such an, just a heavy intensity on the analysis side. And where the teams are going. And if you could provide any specific examples of teams that have really blown your mind. Especially over the last year or two. Because every year it gets more exciting if you will. I mean, so a big thing in baseball is defensive shifts. So if you watch tonight, you'll probably see a couple of plays where if you're used to watching baseball, a guy makes really solid contact. And there's a fielder there that you don't think should be there. But that's really very data driven where you analyze where's this guy hit the ball. That part's not so hard. But also there's game theory involved. Because you have to adjust for the fact that he knows where you're positioning the defenders. He's trying therefore to make adjustments to his own swing and so that's been a major innovation in how baseball is played. You know, how bullpens are used too. Where teams have realized that actually having a guy, across all sports pretty much, realizing the importance of rest. And of fatigue. And that you can be the best pitcher in the world, but guess what? After four or five innings, you're probably not as good as a guy who has a fresh arm necessarily. So I mean, it really is like, these are not subtle things anymore. It's not just oh, on base percentage is valuable. It really effects kind of every strategic decision in baseball. The NBA, if you watch an NBA game tonight, see how many three point shots are taken. That's in part because of data. And teams realizing hey, three points is worth more than two, once you're more than about five feet from the basket, the shooting percentage gets really flat. And so it's revolutionary, right? Like teams that will shoot almost half their shots from the three point range nowadays. Larry Bird, who wound up being one of the greatest three point shooters of all time, took only eight three pointers his first year in the NBA. It's quite noticeable if you watch baseball or basketball in particular. >> Not to focus too much on sports. One final question. In terms of Major League Soccer, and now in NFL, we're having the analysis and having wearables where it can now showcase if they wanted to on screen, heart rate and breathing and how much exertion. How much data is too much data? And when does it ruin the sport? >> So, I don't think, I mean, again, it goes sport by sport a little bit. I think in basketball you actually have a more exciting game. I think the game is more open now. You have more three pointers. You have guys getting higher assist totals. But you know, I don't know. I'm not one of those people who thinks look, if you love baseball or basketball, and you go in to work for the Astros, the Yankees or the Knicks, they probably need some help, right? You really have to be passionate about that sport. Because it's all based on what questions am I asking? As I'm a fan or I guess an employee of the team. Or a player watching the game. And there isn't really any substitute I don't think for the insight and intuition that a curious human has to kind of ask the right questions. So we can talk at great length about what tools do you then apply when you have those questions, but that still comes from people. I don't think machine learning could help with what questions do I want to ask of the data. It might help you get the answers. >> If you have a mid-fielder in a soccer game though, not exerting, only 80%, and you're seeing that on a screen as a fan, and you're saying could that person get fired at the end of the day? One day, with the data? >> So we found that actually some in soccer in particular, some of the better players are actually more still. So Leo Messi, maybe the best player in the world, doesn't move as much as other soccer players do. And the reason being that A) he kind of knows how to position himself in the first place. B) he realizes that you make a run, and you're out of position. That's quite fatiguing. And particularly soccer, like basketball, is a sport where it's incredibly fatiguing. And so, sometimes the guys who conserve their energy, that kind of old school mentality, you have to hustle at every moment. That is not helpful to the team if you're hustling on an irrelevant play. And therefore, on a critical play, can't get back on defense, for example. >> Sports, but also data is moving exponentially as we're just speaking about today. Tech, healthcare, every different industry. Is there any particular that's a favorite of yours to cover? And I imagine they're all different as well. >> I mean, I do like sports. We cover a lot of politics too. Which is different. I mean in politics I think people aren't intuitively as data driven as they might be in sports for example. It's impressive to follow the breakthroughs in artificial intelligence. It started out just as kind of playing games and playing chess and poker and Go and things like that. But you really have seen a lot of breakthroughs in the last couple of years. But yeah, it's kind of infused into everything really. >> You're known for your work in politics though. Especially presidential campaigns. >> Yeah. >> This year, in particular. Was it insanely challenging? What was the most notable thing that came out of any of your predictions? >> I mean, in some ways, looking at the polling was the easiest lens to look at it. So I think there's kind of a myth that last year's result was a big shock and it wasn't really. If you did the modeling in the right way, then you realized that number one, polls have a margin of error. And so when a candidate has a three point lead, that's not particularly safe. Number two, the outcome between different states is correlated. Meaning that it's not that much of a surprise that Clinton lost Wisconsin and Michigan and Pennsylvania and Ohio. You know I'm from Michigan. Have friends from all those states. Kind of the same types of people in those states. Those outcomes are all correlated. So what people thought was a big upset for the polls I think was an example of how data science done carefully and correctly where you understand probabilities, understand correlations. Our model gave Trump a 30% chance of winning. Others models gave him a 1% chance. And so that was interesting in that it showed that number one, that modeling strategies and skill do matter quite a lot. When you have someone saying 30% versus 1%. I mean, that's a very very big spread. And number two, that these aren't like solved problems necessarily. Although again, the problem with elections is that you only have one election every four years. So I can be very confident that I have a better model. Even one year of data doesn't really prove very much. Even five or 10 years doesn't really prove very much. And so, being aware of the limitations to some extent intrinsically in elections when you only get one kind of new training example every four years, there's not really any way around that. There are ways to be more robust to sparce data environments. But if you're identifying different types of business problems to solve, figuring out what's a solvable problem where I can add value with data science is a really key part of what you're doing. >> You're such a leader in this space. In data and analysis. It would be interesting to kind of peek back the curtain, understand how you operate but also how large is your team? How you're putting together information. How quickly you're putting it out. Cause I think in this right now world where everybody wants things instantly-- >> Yeah. >> There's also, you want to be first too in the world of journalism. But you don't want to be inaccurate because that's your credibility. >> We talked about this before, right? I think on average, speed is a little bit overrated in journalism. >> [Katie] I think it's a big problem in journalism. >> Yeah. >> Especially in the tech world. You have to be first. You have to be first. And it's just pumping out, pumping out. And there's got to be more time spent on stories if I can speak subjectively. >> Yeah, for sure. But at the same time, we are reacting to the news. And so we have people that come in, we hire most of our people actually from journalism. >> [Katie] How many people do you have on your team? >> About 35. But, if you get someone who comes in from an academic track for example, they might be surprised at how fast journalism is. That even though we might be slower than the average website, the fact that there's a tragic event in New York, are there things we have to say about that? A candidate drops out of the presidential race, are things we have to say about that. In periods ranging from minutes to days as opposed to kind of weeks to months to years in the academic world. The corporate world moves faster. What is a little different about journalism is that you are expected to have more precision where people notice when you make a mistake. In corporations, you have maybe less transparency. If you make 10 investments and seven of them turn out well, then you'll get a lot of profit from that, right? In journalism, it's a little different. If you make kind of seven predictions or say seven things, and seven of them are very accurate and three of them aren't, you'll still get criticized a lot for the three. Just because that's kind of the way that journalism is. And so the kind of combination of needing, not having that much tolerance for mistakes, but also needing to be fast. That is tricky. And I criticize other journalists sometimes including for not being data driven enough, but the best excuse any journalist has, this is happening really fast and it's my job to kind of figure out in real time what's going on and provide useful information to the readers. And that's really difficult. Especially in a world where literally, I'll probably get off the stage and check my phone and who knows what President Trump will have tweeted or what things will have happened. But it really is a kind of 24/7. >> Well because it's 24/7 with FiveThirtyEight, one of the most well known sites for data, are you feeling micromanagey on your people? Because you do have to hit this balance. You can't have something come out four or five days later. >> Yeah, I'm not -- >> Are you overseeing everything? >> I'm not by nature a micromanager. And so you try to hire well. You try and let people make mistakes. And the flip side of this is that if a news organization that never had any mistakes, never had any corrections, that's raw, right? You have to have some tolerance for error because you are trying to decide things in real time. And figure things out. I think transparency's a big part of that. Say here's what we think, and here's why we think it. If we have a model to say it's not just the final number, here's a lot of detail about how that's calculated. In some case we release the code and the raw data. Sometimes we don't because there's a proprietary advantage. But quite often we're saying we want you to trust us and it's so important that you trust us, here's the model. Go play around with it yourself. Here's the data. And that's also I think an important value. >> That speaks to open source. And your perspective on that in general. >> Yeah, I mean, look, I'm a big fan of open source. I worry that I think sometimes the trends are a little bit away from open source. But by the way, one thing that happens when you share your data or you share your thinking at least in lieu of the data, and you can definitely do both is that readers will catch embarrassing mistakes that you made. By the way, even having open sourceness within your team, I mean we have editors and copy editors who often save you from really embarrassing mistakes. And by the way, it's not necessarily people who have a training in data science. I would guess that of our 35 people, maybe only five to 10 have a kind of formal background in what you would call data science. >> [Katie] I think that speaks to the theme here. >> Yeah. >> [Katie] That everybody's kind of got to be data literate. >> But yeah, it is like you have a good intuition. You have a good BS detector basically. And you have a good intuition for hey, this looks a little bit out of line to me. And sometimes that can be based on domain knowledge, right? We have one of our copy editors, she's a big college football fan. And we had an algorithm we released that tries to predict what the human being selection committee will do, and she was like, why is LSU rated so high? Cause I know that LSU sucks this year. And we looked at it, and she was right. There was a bug where it had forgotten to account for their last game where they lost to Troy or something and so -- >> That also speaks to the human element as well. >> It does. In general as a rule, if you're designing a kind of regression based model, it's different in machine learning where you have more, when you kind of build in the tolerance for error. But if you're trying to do something more precise, then so much of it is just debugging. It's saying that looks wrong to me. And I'm going to investigate that. And sometimes it's not wrong. Sometimes your model actually has an insight that you didn't have yourself. But fairly often, it is. And I think kind of what you learn is like, hey if there's something that bothers me, I want to go investigate that now and debug that now. Because the last thing you want is where all of a sudden, the answer you're putting out there in the world hinges on a mistake that you made. Cause you never know if you have so to speak, 1,000 lines of code and they all perform something differently. You never know when you get in a weird edge case where this one decision you made winds up being the difference between your having a good forecast and a bad one. In a defensible position and a indefensible one. So we definitely are quite diligent and careful. But it's also kind of knowing like, hey, where is an approximation good enough and where do I need more precision? Cause you could also drive yourself crazy in the other direction where you know, it doesn't matter if the answer is 91.2 versus 90. And so you can kind of go 91.2, three, four and it's like kind of A) false precision and B) not a good use of your time. So that's where I do still spend a lot of time is thinking about which problems are "solvable" or approachable with data and which ones aren't. And when they're not by the way, you're still allowed to report on them. We are a news organization so we do traditional reporting as well. And then kind of figuring out when do you need precision versus when is being pointed in the right direction good enough? >> I would love to get inside your brain and see how you operate on just like an everyday walking to Walgreens movement. It's like oh, if I cross the street in .2-- >> It's not, I mean-- >> Is it like maddening in there? >> No, not really. I mean, I'm like-- >> This is an honest question. >> If I'm looking for airfares, I'm a little more careful. But no, part of it's like you don't want to waste time on unimportant decisions, right? I will sometimes, if I can't decide what to eat at a restaurant, I'll flip a coin. If the chicken and the pasta both sound really good-- >> That's not high tech Nate. We want better. >> But that's the point, right? It's like both the chicken and the pasta are going to be really darn good, right? So I'm not going to waste my time trying to figure it out. I'm just going to have an arbitrary way to decide. >> Serious and business, how organizations in the last three to five years have just evolved with this data boom. How are you seeing it as from a consultant point of view? Do you think it's an exciting time? Do you think it's a you must act now time? >> I mean, we do know that you definitely see a lot of talent among the younger generation now. That so FiveThirtyEight has been at ESPN for four years now. And man, the quality of the interns we get has improved so much in four years. The quality of the kind of young hires that we make straight out of college has improved so much in four years. So you definitely do see a younger generation for which this is just part of their bloodstream and part of their DNA. And also, particular fields that we're interested in. So we're interested in people who have both a data and a journalism background. We're interested in people who have a visualization and a coding background. A lot of what we do is very much interactive graphics and so forth. And so we do see those skill sets coming into play a lot more. And so the kind of shortage of talent that had I think frankly been a problem for a long time, I'm optimistic based on the young people in our office, it's a little anecdotal but you can tell that there are so many more programs that are kind of teaching students the right set of skills that maybe weren't taught as much a few years ago. >> But when you're seeing these big organizations, ESPN as perfect example, moving more towards data and analytics than ever before. >> Yeah. >> You would say that's obviously true. >> Oh for sure. >> If you're not moving that direction, you're going to fall behind quickly. >> Yeah and the thing is, if you read my book or I guess people have a copy of the book. In some ways it's saying hey, there are lot of ways to screw up when you're using data. And we've built bad models. We've had models that were bad and got good results. Good models that got bad results and everything else. But the point is that the reason to be out in front of the problem is so you give yourself more runway to make errors and mistakes. And to learn kind of what works and what doesn't and which people to put on the problem. I sometimes do worry that a company says oh we need data. And everyone kind of agrees on that now. We need data science. Then they have some big test case. And they have a failure. And they maybe have a failure because they didn't know really how to use it well enough. But learning from that and iterating on that. And so by the time that you're on the third generation of kind of a problem that you're trying to solve, and you're watching everyone else make the mistake that you made five years ago, I mean, that's really powerful. But that doesn't mean that getting invested in it now, getting invested both in technology and the human capital side is important. >> Final question for you as we run out of time. 2018 beyond, what is your biggest project in terms of data gathering that you're working on? >> There's a midterm election coming up. That's a big thing for us. We're also doing a lot of work with NBA data. So for four years now, the NBA has been collecting player tracking data. So they have 3D cameras in every arena. So they can actually kind of quantify for example how fast a fast break is, for example. Or literally where a player is and where the ball is. For every NBA game now for the past four or five years. And there hasn't really been an overall metric of player value that's taken advantage of that. The teams do it. But in the NBA, the teams are a little bit ahead of journalists and analysts. So we're trying to have a really truly next generation stat. It's a lot of data. Sometimes I now more oversee things than I once did myself. And so you're parsing through many, many, many lines of code. But yeah, so we hope to have that out at some point in the next few months. >> Anything you've personally been passionate about that you've wanted to work on and kind of solve? >> I mean, the NBA thing, I am a pretty big basketball fan. >> You can do better than that. Come on, I want something real personal that you're like I got to crunch the numbers. >> You know, we tried to figure out where the best burrito in America was a few years ago. >> I'm going to end it there. >> Okay. >> Nate, thank you so much for joining us. It's been an absolute pleasure. Thank you. >> Cool, thank you. >> I thought we were going to chat World Series, you know. Burritos, important. I want to thank everybody here in our audience. Let's give him a big round of applause. >> [Nate] Thank you everyone. >> Perfect way to end the day. And for a replay of today's program, just head on over to ibm.com/dsforall. I'm Katie Linendoll. And this has been Data Science for All: It's a Whole New Game. Test one, two. One, two, three. Hi guys, I just want to quickly let you know as you're exiting. A few heads up. Downstairs right now there's going to be a meet and greet with Nate. And we're going to be doing that with clients and customers who are interested. So I would recommend before the game starts, and you lose Nate, head on downstairs. And also the gallery is open until eight p.m. with demos and activations. And tomorrow, make sure to come back too. Because we have exciting stuff. I'll be joining you as your host. And we're kicking off at nine a.m. So bye everybody, thank you so much. >> [Announcer] Ladies and gentlemen, thank you for attending this evening's webcast. If you are not attending all cloud and cognitive summit tomorrow, we ask that you recycle your name badge at the registration desk. Thank you. Also, please note there are two exits on the back of the room on either side of the room. Have a good evening. Ladies and gentlemen, the meet and greet will be on stage. Thank you.

Published Date : Nov 1 2017

SUMMARY :

Today the ability to extract value from data is becoming a shared mission. And for all of you during the program, I want to remind you to join that conversation on And when you and I chatted about it. And the scale and complexity of the data that organizations are having to deal with has It's challenging in the world of unmanageable. And they have to find a way. AI. And it's incredible that this buzz word is happening. And to get to an AI future, you have to lay a data foundation today. And four is you got to expand job roles in the organization. First pillar in this you just discussed. And now you get to where we are today. And if you don't have a strategy for how you acquire that and manage it, you're not going And the way I think about that is it's really about moving from static data repositories And we continue with the architecture. So you need a way to federate data across different environments. So we've laid out what you need for driving automation. And so when you think about the real use cases that are driving return on investment today, Let's go ahead and come back to something that you mentioned earlier because it's fascinating And so the new job roles is about how does everybody have data first in their mind? Everybody in the company has to be data literate. So overall, group effort, has to be a common goal, and we all need to be data literate But at the end of the day, it's kind of not an easy task. It's not easy but it's maybe not as big of a shift as you would think. It's interesting to hear you say essentially you need to train everyone though across the And look, if you want to get your hands on code and just dive right in, you go to datascience.ibm.com. And I've heard that the placement behind those jobs, people graduating with the MS is high. Let me get back to something else you touched on earlier because you mentioned that a number They produce a lot of the shows that I'm sure you watch Katie. And this is a good example. So they have to optimize every aspect of their business from marketing campaigns to promotions And so, as we talk to clients we think about how do you start down this path now, even It's analytics first to the data, not the other way around. We as a practice, we say you want to bring data to where the data sits. And a Harvard Business Review even dubbed it the sexiest job of the 21st century. Female preferred, on the cover of Vogue. And how does it change everything? And while it's important to recognize this critical skill set, you can't just limit it And we call it clickers and coders. [Katie] I like that. And there's not a lot of things available today that do that. Because I hear you talking about the data scientists role and how it's critical to success, And my view is if you have the right platform, it enables the organization to collaborate. And every organization needs to think about what are the skills that are critical? Use this as your chance to reinvent IT. And I can tell you even personally being effected by how important the analysis is in working And think about if you don't do something. And now we're going to get to the fun hands on part of our story. And then how do you move analytics closer to your data? And in here I can see that JP Morgan is calling for a US dollar rebound in the second half But then where it gets interesting is you go to the bottom. data, his stock portfolios, and browsing behavior to build a model which can predict his affinity And so, as a financial adviser, you look at this and you say, all right, we know he loves And I want to do that by picking a auto stock which has got negative correlation with Ferrari. Cause you start clicking that and immediately we're getting instant answers of what's happening. And what I see here instantly is that Honda has got a negative correlation with Ferrari, As a financial adviser, you wouldn't think about federating data, machine learning, pretty And drive the machine learning into the appliance. And even score hundreds of customers for their affinities on a daily basis. And then you see when you deploy analytics next to your data, even a financial adviser, And as a data science leader or data scientist, you have a lot of the same concerns. But you guys each have so many unique roles in your business life. And just by looking at the demand of companies that wants us to help them go through this And I think the whole ROI of data is that you can now understand people's relationships Well you can have all the data in the world, and I think it speaks to, if you're not doing And I think that that's one of the things that customers are coming to us for, right? And Nir, this is something you work with a lot. And the companies that are not like that. Tricia, companies have to deal with data behind the firewall and in the new multi cloud And so that's why I think it's really important to understand that when you implement big And how are the clients, how are the users actually interacting with the system? And right now the way I see teams being set up inside companies is that they're creating But in order to actually see all of the RY behind the data, you also have to have a creative That's one of the things that we see a lot. So a lot of the training we do is sort of data engineers. And I think that's a very strong point when it comes to the data analysis side. And that's where you need the human element to come back in and say okay, look, you're And the people who are really great at providing that human intelligence are social scientists. the talent piece is actually the most important crucial hard to get. It may be to take folks internally who have a lot of that domain knowledge that you have And from data scientist to machine learner. And what I explain to them is look, you're still making decisions in the same way. And I mean, just to give you an example, we are partnering with one of the major cloud And what you're talking about with culture is really where I think we're talking about And I think that communication between the technical stakeholders and management You guys made this way too easy. I want to leave you with an opportunity to, anything you want to add to this conversation? I think one thing to conclude is to say that companies that are not data driven is And thank you guys again for joining us. And we're going to turn our attention to how you can deliver on what they're talking about And finally how you could build models anywhere and employ them close to where your data is. And thanks to Siva for taking us through it. You got to break it down for me cause I think we zoom out and see the big picture. And we saw some new capabilities that help companies avoid lock-in, where you can import And as a data scientist, you stop feeling like you're falling behind. We met backstage. And I go to you to talk about sports because-- And what it brings. And the reason being that sports consists of problems that have rules. And I was going to save the baseball question for later. Probably one of the best of all time. FiveThirtyEight has the Dodgers with a 60% chance of winning. So you have two teams that are about equal. It's like the first World Series in I think 56 years or something where you have two 100 And that you can be the best pitcher in the world, but guess what? And when does it ruin the sport? So we can talk at great length about what tools do you then apply when you have those And the reason being that A) he kind of knows how to position himself in the first place. And I imagine they're all different as well. But you really have seen a lot of breakthroughs in the last couple of years. You're known for your work in politics though. What was the most notable thing that came out of any of your predictions? And so, being aware of the limitations to some extent intrinsically in elections when It would be interesting to kind of peek back the curtain, understand how you operate but But you don't want to be inaccurate because that's your credibility. I think on average, speed is a little bit overrated in journalism. And there's got to be more time spent on stories if I can speak subjectively. And so we have people that come in, we hire most of our people actually from journalism. And so the kind of combination of needing, not having that much tolerance for mistakes, Because you do have to hit this balance. And so you try to hire well. And your perspective on that in general. But by the way, one thing that happens when you share your data or you share your thinking And you have a good intuition for hey, this looks a little bit out of line to me. And I think kind of what you learn is like, hey if there's something that bothers me, It's like oh, if I cross the street in .2-- I mean, I'm like-- But no, part of it's like you don't want to waste time on unimportant decisions, right? We want better. It's like both the chicken and the pasta are going to be really darn good, right? Serious and business, how organizations in the last three to five years have just And man, the quality of the interns we get has improved so much in four years. But when you're seeing these big organizations, ESPN as perfect example, moving more towards But the point is that the reason to be out in front of the problem is so you give yourself Final question for you as we run out of time. And so you're parsing through many, many, many lines of code. You can do better than that. You know, we tried to figure out where the best burrito in America was a few years Nate, thank you so much for joining us. I thought we were going to chat World Series, you know. And also the gallery is open until eight p.m. with demos and activations. If you are not attending all cloud and cognitive summit tomorrow, we ask that you recycle your

ENTITIES

Entity	Category	Confidence
Tricia Wang	PERSON	0.99+
Katie	PERSON	0.99+
Katie Linendoll	PERSON	0.99+
Rob	PERSON	0.99+
Google	ORGANIZATION	0.99+
Joane	PERSON	0.99+
Daniel	PERSON	0.99+
Michael Li	PERSON	0.99+
Nate Silver	PERSON	0.99+
Apple	ORGANIZATION	0.99+
Hortonworks	ORGANIZATION	0.99+
Trump	PERSON	0.99+
Nate	PERSON	0.99+
Honda	ORGANIZATION	0.99+
Siva	PERSON	0.99+
McKinsey	ORGANIZATION	0.99+
Amazon	ORGANIZATION	0.99+
Larry Bird	PERSON	0.99+
2017	DATE	0.99+
Rob Thomas	PERSON	0.99+
Michigan	LOCATION	0.99+
Yankees	ORGANIZATION	0.99+
New York	LOCATION	0.99+
Clinton	PERSON	0.99+
IBM	ORGANIZATION	0.99+
Tesco	ORGANIZATION	0.99+
Michael	PERSON	0.99+
America	LOCATION	0.99+
Leo	PERSON	0.99+
four years	QUANTITY	0.99+
five	QUANTITY	0.99+
30%	QUANTITY	0.99+
Astros	ORGANIZATION	0.99+
Trish	PERSON	0.99+
Sudden Compass	ORGANIZATION	0.99+
Leo Messi	PERSON	0.99+
two teams	QUANTITY	0.99+
1,000 lines	QUANTITY	0.99+
one year	QUANTITY	0.99+
10 investments	QUANTITY	0.99+
NASDAQ	ORGANIZATION	0.99+
The Signal and the Noise	TITLE	0.99+
Tricia	PERSON	0.99+
Nir Kaldero	PERSON	0.99+
80%	QUANTITY	0.99+
BCG	ORGANIZATION	0.99+
Daniel Hernandez	PERSON	0.99+
ESPN	ORGANIZATION	0.99+
H2O	ORGANIZATION	0.99+
Ferrari	ORGANIZATION	0.99+
last year	DATE	0.99+
18	QUANTITY	0.99+
three	QUANTITY	0.99+
Data Incubator	ORGANIZATION	0.99+
Patriots	ORGANIZATION	0.99+

David Noy, Veritas | Vertias Vision 2017

>> Narrator: Live from Las Vegas it's The Cube covering Veritas Vision 2017. Brought to you by Veritas. >> Welcome back to Las Vegas, everybody this is The Cube, the leader in live tech coverage. We are here covering Veritas Vision 2017, the hashtag is VtasVision. My name is Dave Vellante, and I'm here with Stuart Miniman my cohost David Noy is here, he's the vice president of product management at Vertias. David, thanks for coming to The Cube. >> Thanks for having me, pretty excited. >> Yes, we enjoyed your keynote today taking us through the new product announcements. Let's unpack it, you're at the center of it all. Actually, let's start with the way you started your keynote is you recently left EMC, came here, why, why was that? >> I talk to lots and lots of customers, hundreds, thousands of customers. They're enterprise customers, they're all trying to solve the same kind of problems, reducing infrastructure costs, moving to commodity based architectures, moving to the cloud, in fact they did move to the cloud in Angara. If you look at the NAS market in 2016 it had been on a nice two percent incline until about the second half of 2016 it basically dove 12% and a big part of that was enterprises who were kicking the tires finally saying we're going to move to cloud and actually doing it as opposed to just talking about it. At EMC and a lot of the other big iron vendors they have a strategy that they discuss around helping customers move to cloud, helping them adopt commodity, but the reality is they make their money, their big margin points, on selling branded boxes, right? And as much as it's lip service, it's really hard to fulfill that promise when that's where you're making your revenue, you have revenue margin targets. Veritas on the other hand, it's a software company. We're here to sell software, we're able to make your data more manageable to understand that it's a truth in information, I don't need to own every bit, and I thought that the company that can basically A, provide the real promise of what software define offers is going to be a software company. Number two is that you can't buck the trend of the cloud it's going to happen, and either you're in the critical path and trying to provide friction, in which case you're going to become irrelevant pretty soon or you enable it and figure out how to partner with the cloud vendors in a nonthreatening way. I found that Veritas, because of its heterogeneity background, hey you want AIX, you want Linux, you want Solaris, great, we'll help you with all those. We can do the same thing with the cloud, and the cloud vendors will partner up with us because they love us for that reason. >> Before we get into the products, let's unpack that a little bit. Why is it that as Veritas you can participate in profit from that cloud migration? We know why you can't as a hardware vendor because ultimately the cloud vendor is going to be providing the box. >> Well, the answer is that, a couple things. One is, we believe and even the cloud vendors believe that you're going to be in a hybrid environment. If you project out for the next ten years, it's likely that a lot of data and applications and workloads will move to cloud, but not all of them will. And you probably end up in about a 50/50 shift. The vendor who can provide the management and intelligence and compliance capabilities, and the data protection capabilities across both your on-prem, and your off-premise state as a single unified product set is going to win, in my opinion, that's number one. Number two is that the cloud vendors are all great, but they specialize in different things. Some are specialized in machine learning, some are really good with visual image recognition, some are really good with mobile applications, and people are, in my opinion, going to go to two, three, four different clouds, just like I would go to contracting agencies, some might be good at giving me engineers, I might go to dice.com for engineers, I might go to something completely different for finance people, and you're going to use the best of breed clouds for specific applications. Being able to actually aggregate what you have in your universe of multicloud, and your hybrid environment and allowing you, as an administrator to be aware of all my assets, is something that as a non-branded box pusher, as a software vendor I can go do with credibility. >> You're a recovering box pusher. >> I'm a recovering box pusher, I'm one month into recovery, so thank you very much. >> And David, one of the things we're trying to understand a little bit, you've got products that live in lots of these environments, why do you have visibility into the data? Is it because they're backup customers, is it other pieces? Help us understand in that multicloud world, what I need to be to get that full. >> That's a great question and I'll bridge into some of the new products too. Number one is that Veritas has a huge amount of data that's basically trapped in repositories because we do provide backup, we're the largest backup vendor. So we have all this data that's essentially sitting inactive you know, Mike talks about it, Mike Palmer our CPO, talks about it as kind of like the Uber, you know, what do you do with your car when it's not being used, or Air BnB if you will, what do you do with your home when it's not being used, is you potentially rent it out. You make it available for other purposes. With all this trapped data, there's tons of information that we can glean that enterprises have been grabbing for years and years and years. So that's number one, we're in a great position 'cause we hold a lot of that data. Now, we have products that have the capabilities through classification engines, through engines that are extending machine learning capabilities, to open that data up and actually figure out what's inside. Now we can do it with the backup products, but let's face it, data is stored in a number of different other modaliites, right? So there's blocked data that is sitting at the bottom of containerized private clouds, there are tons and tons of unstructured data sitting in NAS repositories, and growing off-prem, but actually on prem this object storage technology for the set it and forget it long term retention. All of that data has hidden information, all of it can be extracted for more value with our same classification engines that we can run against the net backup estate, we can basically take that and extend that into these new modalities, and actually have compelling products that are not just offering infrastructure, but that are actually offering infrastructure with the promise of making that data more valuable. Make sense? >> It does, I mean it's the holy grail of backup. For years it's been insurance, and insurance is a good business, don't get me wrong, but even when you think about information governance, through sarbanes-oxley and FRCP et cetera, it was always that desire to turn that corpus of data into something more valuable than just insurance, it feels like, like you're saying with automated classification and the machine learning AI, we're sort of at the cusp of that, but we've been disappointed so many times what gives you confidence that this time it'll stick? >> Look, there's some very straightforward things that are happening that you just cannot ignore. GDPR is one, there's a specific timeline, specific rules, specific regulatory requirements that have to be met. That one's a no brainer, and that will drive people to understand that, hey when they apply our policies against the data that they have they'll be able to extract value. That'll be one of many, but that's an extreme proof-point because there's no getting around it, there's no interpretation of that, and the date is a hard date. What we'll do is we'll look quickly at other verticals, we'll look at vertical specific data, whether its in data surveillance, or germain sequencing or what have you, and we'll look at what we can extract there, and we'll partner with ISVs, is a strategy that I learned in my past life, in order to actually bring to market systems or solutions that can categorize specific, vertical industry data to provide value back to the end users. If we just try to provide a blanket, hey, I'm just going to provide data categorization, it's a swiss army knife solution. If we get hyper-focused around specific use cases, workloads and industries now we can be very targeted to what the end users care about. >> If I heard right, it's not just for backup, it's primary and secondary data that you're helping to solve and leverage and put intelligence into these products. >> That's right, initially we have an enormous trapped pool of secondary data, so that's great, we want to turn that trapped pool from just basically a stagnant pool into something that you can actually get value out of. >> That Walking Dead analogy you used. >> The Walking Dead, yeah. We also say that there's a lot of data that sits in primary storage, in fact there's a huge category of archive, which we call active archive, it's not really archive, still wanted on spinning disk or flash. You still want to use it for some purpose but what happens when that data goes out into the environment? I talked to customers in automotive, for example, automotive design manufacturers, they do simulations, and they're consuming storage and capacity all the time, they've got all of these runs, and they're overrunning their budget for storage and they have no idea which of those runs they can actually delete, so they create policies like "well, if it hasn't been touched "in 90 days, I'll delete it," Well, just because it hasn't been touched in 90 days doesn't mean there wasn't good information to be gleaned out of that particular simulation run, right? >> Alright, so I want to get back to the object, but before we go deeper there, block and file, there's market leaders out there that seems that, it's a bit entrenched, if you will, what between the hyperscale product and Veritas access, what's the opportunity that you see that Veritas has there, what differentiates you? >> Sure, well, let's start with block. The one big differentiator we'll have in block storage is that it's not just about providing storage to containerized applications. We want to be able to provide machine learning capabilities to where we can actually optimize the IO path for quality of service. Then, we also want to be able to through machine learning determine whether, if it's how you decide to run your business, you want a burst workloads actually out into the cloud. So we're partnered with the cloud vendors, who are happy to partner with us for the reasons that I described earlier, is that we're very vendor agnostic, we're very heterogeneous. To actually move workloads on-prem and off-prem that's a very differentiated capability. You see with a few of the vendors that are out there, I think Nutanix for example, can do that, but it's not something that everyone's going after, because they want to keep their workloads in their environments, they want to check controls. >> And if I can, that high speed data mover is your IP? >> That's right, that's our IP. Now, on the file system side... >> Just one thing, cloud bursting's one of those things, moving real-time is difficult, physics is still a challenge for us. Any specifics you can give, kind of a customer use case where they're doing that? A lot of times I want this piece of the application here, I want to store the data there, but real time, doing things, I can't move massive amounts of data just 'cause, speed of light. >> If you break it down, I don't think that we're going to solve the use case of, "I'm going to snap my finger "and move the workload immediately offline." Essentially what we'll do is we'll sync the data in the background, once it has been synced we'll actually be able to move the application offline and that'll all come down to one of two things: Either user cases that exceed the capabilities of the current infrastructure and I want to be able to continue to grow without building them into my data center, or I have an end of the month processing. A great case is I have a media entertainment company that I used to work with that was working on a film, and it came close to the release date of that film, and they were asked to go back and recut and reedit that film for specific reasons, a pretty interesting reason actually, it had to do with government pressure. And when they went to go back and edit that film they essentially had a point where like, oh my gosh, all of the servers that were dedicated to render for this film have been moved off to another project. What do we do now, right? The answer is, you got to burst. And if you had cloud burst capabilities you could actually use whatever application and then containerize whether you're running on-prem or off-prem, it doesn't matter, it's containeraized, if we can get the data out there into the cloud through fast pipes then basically you can now finish that job without having to take all those servers back, or repurchase that much infrastructure. So that's a pretty cool use case, that's things that people have been talking about doing but nobody's every successfully done. We're staring to prove that out with some vendors and some partners that potentially even want to embed this in their own solutions, larger technology partners. Now, you wanted to talk about file as well, right, and what makes file different. I spent five years with one of the most successful scale-up file systems, you probably know who they are. But the thing about them was that extracting that file system out of the box and making it available as a software solution that you could layer on any hardware is really hard, because you become so addicted to the way that the behavior of the underlying infrastructure, the behavior of the drives, down to the smart errors that come off the drives, you're so tied into that, which is great because you build a very high performance available product when you do that, but the moment you try to go to any sort of commodity hardware, suddenly things start to fall apart. We can do that, and in fact with our file system we're not saying "hey, you've got to go it on "commodity servers and with DAS drives in them." You could layer it on top of your existing net app, your Isolon, your whatever, you name it, your BNX, encapsulate it, and create policies to move data back and forth between those systems, or potentially even provision them out say, "okay, you know what, this is my gold tier, "my silver tier, my bronze tier." We can even encapsulate, for example, a directory on one file service, like a one file system array, and we can actually migrate that data into an object service, whether its on-prem or off-prem, and then provide the same NFS or SMB connectivity back into that data, for example a home directory migration use case, moving off of a NAS filer onto an object storer, on premise or off premise and to the end user, they don't know that things have actually moved. We think that kind of capability is really critical, because we love to sell boxes, if that's what the customer wants to buy from us, and appliance form factor, but we're not pushing the box as the ultimate end point. The ultimate end point is that software layer on top, and that's where the Veritas DNA really shines. >> That's interesting, the traditional use cases for block certainly, and maybe to a lesser extent file, historically fairly well known an understood. So to your point, you could tune an array specifically for those use cases, but in this day and age the processes, and the new business models that are emerging in the digital economy, very unpredictable in terms of the infrastructure requirements. So your argument is a true software defined capability is going to allow you to adapt much more freely and quickly. >> We've also built and we've demoed at Vision this week machine learning capabilities to actually go in and look at your workloads that are running against those underlying infrastructure and tell you are they correctly positioned or not. Oh, guess what, we really don't think this workload should belong on this particular tier that you've chosen, maybe you ought to consider moving it over here. That's something that historically has been the responsibility of the admin, to go in and figure out where those policies are, and try to make some intelligent decisions. But usually those decisions are not super intelligent, they're just like, is it old, is it not old, do I think it's going to be fast? But I don't really know until runtime, based on actual access patterns whether it's going to be high performance or not. Whether it's going to require moving or aging or not. By using machine learning type of algorithms we can actually look at the data, the access patterns over time, and help the administrators make that decision. >> Okay, we're out of time, but just to summarize, hyperscales, the block, access is the scale out, NAS piece, cloud object... >> Veritas cloud storage we call it. Veritas cloud storage, very similar to the access product is for object storage, but again it's not trying to own the entire object bits, if you will, we'll happily be the broker and the asset manager for those objects, classify them and maintain the metadata catalog, because we think it's the metadata around the data that's critical, whether it lives off-prem, on-prem, or in our own appliance. >> You had a nice X/Y graph, dollars on the vertical axis, high frequency of access to the left part of the horizontal axis, lower SLAs to the right, and you had sort of block, file, object as the way to look at the world. Then you talked about the intelligence you bring to the object world. Last question, and then let's end there. Thoughts on object, Stu and I were talking off camera, it's taken a long time, obviously S3 and the cloud guys have been there, you've seen some take outs of object storage companies. But it really hasn't exploded, but it feels like we're on the cusp. What's your observation about object? >> I think object is absolutely on the cusp. Look, people have put it on the cloud, because traditionally object has been used for keeping deep, and because performance doesn't matter, and the deeper you get, the less expensive it gets. So a cloud provider's great, because they're going to aggrigate capacity across 1,000 or 20,000 or a million customers. They can get as deep as possible, and they can slice it off to you. As a single enterprise, I can never get as deep as a cloud service provider. >> The volume, right? >> But what ends up happening is that more and more workloads are not expecting to hold a connection open to their data source. They're actually looking at packetize, get-put type semantics that you can see in genomic sequencing, you see it in a number of different workloads where that kind of semantic, even in hydoop analytic workloads, where that kind of get-put semantic makes sense, not holding that connection open, and object's perfect for that, but it hasn't traditionally had the performance to be able to do that really well. We think that by providing a high performance object system that also has the intelligence to do that data classification, ties into our data protection products, provides the actionable information and metadata, and also makes it possible to use on-prem infrastructure as well as push to cloud or multicloud, and maintain that single pane of glass for that asset management for the objects is really critical, and again, it's the software that matters, the intelligence we build into it that matters. And I think that the primary workloads in a number of different industries in verticals or in adopting object more and more, and that's going to drive more on premise growth of object. By the way, if you look at the NAS market and the object market, you see the NAS market kind of doing this, and you see the object market kind of doing this, it's left pocket right pocket. >> And that get-put framework is a simplifying factor for organizations so, excellent. David, thank you very much for coming on The Cube. We appreciate it. >> Appreciate it, thanks for having me. >> You're welcome, alright, bringing you the truth from Veritas Visions, this is The Cube. We'll be right back, right after this short break.

Published Date : Sep 20 2017

SUMMARY :

Brought to you by Veritas. David, thanks for coming to The Cube. Actually, let's start with the way you started and the cloud vendors will partner up with us Why is it that as Veritas you can participate Being able to actually aggregate what you have I'm one month into recovery, so thank you very much. And David, one of the things we're trying what do you do with your home when it's not being used, and the machine learning AI, that have to be met. it's primary and secondary data that you're into something that you can actually get value out of. I talked to customers in automotive, for example, if it's how you decide to run your business, Now, on the file system side... Any specifics you can give, kind of a customer use case but the moment you try to go to capability is going to allow you to adapt and tell you are they correctly positioned or not. hyperscales, the block, access is the scale out, and the asset manager for those objects, lower SLAs to the right, and you had sort of and the deeper you get, the less expensive it gets. and the object market, you see the NAS market David, thank you very much for coming on The Cube. You're welcome, alright, bringing you the truth

ENTITIES

Entity	Category	Confidence
Dave Vellante	PERSON	0.99+
David	PERSON	0.99+
David Noy	PERSON	0.99+
Stuart Miniman	PERSON	0.99+
Mike Palmer	PERSON	0.99+
Mike	PERSON	0.99+
five years	QUANTITY	0.99+
Veritas	ORGANIZATION	0.99+
2016	DATE	0.99+
Nutanix	ORGANIZATION	0.99+
Vertias	ORGANIZATION	0.99+
Angara	LOCATION	0.99+
Uber	ORGANIZATION	0.99+
Las Vegas	LOCATION	0.99+
12%	QUANTITY	0.99+
two percent	QUANTITY	0.99+
90 days	QUANTITY	0.99+
one	QUANTITY	0.99+
EMC	ORGANIZATION	0.99+
20,000	QUANTITY	0.99+
Stu	PERSON	0.99+
one month	QUANTITY	0.99+
single	QUANTITY	0.99+
two	QUANTITY	0.99+
Linux	TITLE	0.98+
Veritas Visions	ORGANIZATION	0.98+
1,000	QUANTITY	0.98+
One	QUANTITY	0.98+
Vertias Vision	ORGANIZATION	0.98+
two things	QUANTITY	0.98+
both	QUANTITY	0.98+
three	QUANTITY	0.98+
The Walking Dead	TITLE	0.97+
this week	DATE	0.96+
one file	QUANTITY	0.96+
Veritas	TITLE	0.95+
Walking Dead	TITLE	0.95+
hundreds,	QUANTITY	0.95+
today	DATE	0.94+
GDPR	TITLE	0.94+
single enterprise	QUANTITY	0.93+
a million customers	QUANTITY	0.92+
S3	TITLE	0.91+
2017	DATE	0.91+
Number two	QUANTITY	0.88+
The Cube	ORGANIZATION	0.88+
Air BnB	ORGANIZATION	0.85+
Solaris	TITLE	0.84+
Veritas Vision 2017	EVENT	0.84+
four	QUANTITY	0.83+
AIX	ORGANIZATION	0.81+
second half of 2016	DATE	0.8+
years	QUANTITY	0.79+
couple things	QUANTITY	0.78+
one file service	QUANTITY	0.78+
one thing	QUANTITY	0.77+
tons and tons of unstructured data	QUANTITY	0.77+
dice.com	OTHER	0.76+
customers	QUANTITY	0.75+
information	QUANTITY	0.71+
single pane	QUANTITY	0.7+
50/50	QUANTITY	0.7+
thousands	QUANTITY	0.68+
VtasVision	ORGANIZATION	0.62+
Vision	ORGANIZATION	0.58+
tons	QUANTITY	0.56+
ten	DATE	0.54+
Isolon	ORGANIZATION	0.5+

Day One Kickoff | VMworld 2017

>> Announcer: Live from Las Vegas, it's theCUBE. Covering VMworld 2017. Brought to by VMware and its ecosystem partners. (upbeat techno music) >> Okay, we're live here at VMworld 2017's theCUBE's coverage of VMworld 2017. I'm John Furrier. My hosts, Dave Vellante and Stu Miniman. We've got two sets kicking off live here in Las Vegas for our eighth year of coverage. Boomy, we're in the broadcast booth at the Mandalay Bay. Guys, we're here to kick off the show. Three days of wall-to-wall coverage. Three days of great keynotes. Today, big surprise, Andy Jassy, the CEO of Amazon Web Services joined Pat Gelsinger on stage in a surprise announcement together, hugging each other before they talked and even after they talked. This partnership is going to be big. We're going to have coverage, in-depth analysis of that. Dave, VMWorld is now the cloud show with re:Invent. If you look at what's going on, Stu, you've been to many, many shows. This is our eighth year. This was the show. Great community. Now re:Invent has been called the new VMWorld. You put 'em both together, it's really the only cloud show that matters. Google does not have yet a presence. Microsoft has all these shows that are kind of spread all over the place. All the top people are here in IT and cloud at VMWorld and at re:Invent coming up in December. >> Well, John, eight years ago we talked about is this the last stop for IT before cloud just decimates it? And if you go back two years ago, VMware was not in favor. The stock was half of what it is today. Licensed revenue was down 1%. Fast forward to today, it's growing at 10 to 12% a year. Licenses up 13%. It's throwing off operating cash flow at $3 Billion a year. The market's booming. Wall Street's talking VMware now being and undervalued stock. The big question is, is this a fundamental shift in customer mindsets? In other words, are they saying, "Hey, we want to bring the cloud operating model to the business and not try to force our business into the cloud." Or, is this the last gap of onprem. >> Stu, I want to get your thoughts cause I want, squinting through the announcements and all the hype and all the posturing from the vendors is I was looking for, where's hybrid in all this? Where's the growth? And, my validation point on the keynote was when we heard very few words hybrid. Private, on premise was the focus. You guys put out at Wikibon a report called True Private Cloud, Market Sizing. Kind of lay out, that's where the growth is. But, I tweeted private cloud is the gateway drug to hybrid. We're seeing customers now wanting to do hybrid, but they got to do their homework first. They got to do the building blocks on premise, and that is what your calling True Private Cloud. Do you agree? And your thoughts. >> Yeah, so, really good points, John. And the nuance here, 'cause if I'm VMware, I've got a great position in the data center. 500,000 customers. Absolutely, the growth is the move from legacy to True Private Cloud. The challenge for VMware is they already have 500,000 customers there. Those are the customers that are making that shift. So it does not increase vSphere. One of the key things for me, is Pat said, "What vSphere had done for the last 20 years, is what NSX is going to do for the next 10 years, or more." Because they're betting on networking, security, some of these multi-cloud services that they announced. How do those expand VMware so that as True Private Cloud grows and they also do public cloud, VMware has a bigger seat at the table, not just saying "Wait, my customers are shifting. Where are they going?" >> Dave, I want to get your thoughts. You and I talk about all the time on camera, and also privately, about waves. We've been through many waves in the industry. We've seen a lot of waves. Pat Gelsinger has seen many waves, too. Let's talk about Pat Gelsinger because, interesting little tidbits inside the stage area. One, he said "I want to thank you for being the CEO of this company." Stu, you made a comment that this is the first VMWorld where there's not a rumor that Pat's not going to be the CEO. He's kind of kickin' ass and takin' names right now. Stock's up and he put the wave slide out there. And wave slides to me, you can tell the senior management's kind of mojo by how well laid out the wave slide is. He put up a slide on one side. Mainframe mini computer cloud. And the other side client server, internet, IoT Edge. He nailed it, I think. Pat Gelsinger is going to go down as being one of the most brilliant stroke of genius by looking at either laying down what looked like a data center position, and some say capitulate, to Jassy, who's smiling up there saying, "Bring those customers to Amazon." But this is a real partnership. So, Pat Gelsinger, go big or go home. You can't be any bigger, bold bet that Pat Gelsinger right now with VMware, and it looks like it's paying out. What's your thoughts on Pat Gelsinger, the wave and his bold bet? >> Well, I think that businesses are configuring the cloud, John, to the realities of the data. And the data, most of the data, is on prem. So the big question I have it, how is Amazon going to respond to this? And Stu, you and Furrier have had debates over the years. Furrier has said flat out, Amazon is going to do a True Private Cloud, just like Azure Stack. You have said, no. But if Amazon doesn't do that, I think that Pat Gelsinger's going to look like a genius. If they do do that, it's going to become an increasingly more competitive relationship than it is right now. >> Yeah, just a little bit of the inside baseball. Kudos to VMware for getting this VMware on AW out. I hear it was a sprint to the finish because taking cloud foundation, which is kind of a big piece. It's got the VSAN, the NSX, all that stuff, and putting it in a virtual private data center. Amazon owns the data center. They give them servers. This was a heavy lift. NSX, some of the pieces are still kind of early, but getting this out the door, limited availability. It's one data center. They're going to roll out services, but to Dave's point, right, where does this go down the road? Is this Amazon sticking a straw into 500,000 data centers and saying, "Come on in. You know that we've got great services, and this is awesome." 'Cause, I don't see Amazon re-writing their linux stuff to be all native VMware, So, where will this partnership mature? Andy said, "We're going to listen to our customers." "We're going to do what you're asking us." And absolutely today VMware and Amazon, two of these strongest players in the ecosystem today, they're going to listen to their customers. Google, Oracle, IBM, Microsoft, all in the wings fighting for these customers, so it's battle royale. >> You know the straw is in there, John, what's your take, and where do the developers fit in this? >> Well Stu wrote a good point, inside baseball, the key is that success with Amazon was critical. Jassy said basically, this is not a Barney deal, which he kind of modernized by saying most deals are optical really hitting at Microsoft on this one and Google. I mean, they're groping for relevance. It's clear that they're way behind. Everyone's trying to follow these guys. But, on the heels of Vcloud Air, it was critical that they get stake in the ground with Amazon. They took a lot of heat for the Vcloud Air, Stu. This had to get done. Now, my take on this is that, I think it's a genius move. I think Pat Gelsinger, by betting the ranch on Amazon, will go down in history as being a great move. You heard that here, 2017. He's so smart, he wants to be a component of the Amazon takeover, which will happen. It'll be a two-cloud game, maybe three, maybe four, we'll see, but mainly two. But the ecosystem partners on this phase one is key. DXT, Deloitte, Accenture, Capgemini, and then you start to see the logos coming in. They have so many logos, you have to break them down. But more importantly the white space. devops, migration cost, network security and data protection are all filled in with plenty more room for more players. I think this is where the ecosystem was lagging just a few years ago. You saw the shift in the tide. Now you're seeing the ecosystem going, "Wow, I get what VMware's doing. I'm doubling down." It's an Amazon Web Services, VMWare world. All the other cloud players, in my opinion, are really fumbling the ball. >> So, I can infer from that, you see this as a balanced partnerhip i.e. that's not like one needs more than the other. I mean, clearly, Amazon needs VMWARE to reach those 500,000 customers, and clearly, VMware needs a cloud strategy because Vcloud Air and many other attempts have failed. Yes, we said that. It's failed, we asked Pat about that. So, you see it as a more balanced partnership. Do you see that balance of power shifting over time a that straw gets bigger and bigger and bigger. >> Well the Walking Dead or as the Game of Thrones reference going on is kind of the Gray War is happening in cloud. And it really is going to become Amazon versus whoever they can partner with, and the rest of the legacy world. I think the wave slide was impressive to me because this is such a shift from just distributed computing now decentralized with blockchain and AI looming as massive disrupts, I think this is only going to get more decentralized. So whoever has tech that's legacy, will ultimately be toast. And I think Gelsinger's smart to see that wave, and I'm starting to see the movement. It's super early, so, no big bets. It's just be directionally correct and ride that wave. >> Yeah, so, one of the things that got me is last year, it kind of went under the radar that VMware is starting to launch some cloud services, and were very direct, today, that they said there are seven, basically SaaS offerings. It's security, it's cost management. Now, VMWare on AWS, little expensive. We're starting to get the data on how much it is per month or per year or for three-year. But going to have the SaaS offerings. We know Vcloud Air failed, also Paul Moritz had played the Microsoft game. We're going to get this suite of applications. We're going to give you email. We're going to give you, you know, social. We're going to give you all these things. They're all gone. Kind of cleared the table of all those. Now they've got these SaaS applications, so how will that play. I kind of like Pat, very up front on security, and said, "As an industry, we have failed you." Dave, you've been looking at this for a long time. It's a board-level discussion. It's a do-over for security. Does VMware have the chops to play in this space, Dave? Do you buy them as a, you know, valid SaaS provider? >> Well, two questions there. One is in the security front that great tech is always going to get beat by bad user behavior. So this is a board-level issue. As far as SaaS, to me, it's a business model issue that VMware is migrating its business to a routable business model, which is smart. I don't see it as SaaS as an applications, but I see it as a monthly fee. Better to get ahead of it now, while you're hot, than get crushed by Wall Street as you're trying to make that transition like many other companies have failed to do. >> Guys, one thing I want to note is that VMware also laid out their strategy. You kind of heard it there even though that Jassy came on stage. A look it, Jassy's not an idiot, he's smart. He knows what's going on. He knows that he has to win VMware over because VMware ... he's got to balance it. Got 'em in the back pocket on one hand, got a great relationship, Stu, 500,000 customers. Remember, VMware is also an arms dealer. They got the ops, IT operations locked down with their customers. So they have other clouds they can go to. SO, the big trend that we didn't hear, that's out there kind of hiding in plain site is multi-cloud. Multi-cloud is ultimately VMware's strategy. He laid out, one, make private cloud easy. You guys reported on that. Two, deep partnerships with major cloud providers. And three, expand the ecosystem. >> John, so I mean a little bit of kind of rumors I heard. They were actually looking to make the partnership not with AWS at first, it was going to be Google. And Michael Dell said, "If we're going to start with a cloud deal, it's going to be Amazon." The right move, absolutely, that's where it's going to be. But you remember last year, we were here. John, you and I, the announcement was with IBM. Now, no offense to SoftLayer, great acquisition. It's doing well, but IBM does not play at the level of an Amazon. They might have the revenue of a Google in cloud, but, you know, very different positioning. They were up on stage talking about security today. Great position there with analytics. But, we'll see, there's two more days of keynotes. I expect we'll see another cloud provider making some announcements with VMware. And VMware absolutely an arms dealer. They put out on the slide all of their service providers. We've got people like CenturyLink and OVH and Rackspace on theCUBE this week, as well as how their going to play with the Microsoft Google. You've got Michael Dell on tomorrow. I know you're going to talk to him about how Dell fits with Azure Stack, and how the Dell whole family is going to play across all of these because at the end of the day, Michael Dell, and Pat working for him, they want to keep getting revenue no matter who's the winner out there. >> Okay final question as we wrap up the segment. Customers are that watching here, it's clear to me that, we even heard from one on stage, saying, "Well, we're taking baby steps." That wasn't her exact words, but, their going slow to hybrid cloud. All the actions on private as you guys pointed out in our True Private Cloud report on Wikibon.com If you haven't seen it, go check it out, it's going viral. But, this is classic slowness of most enterprise customers. When there's doubt, they slow down. And, one of the things that concerns me, Stu, about the cloud guys right now, whether it's AWS, Google, and Microsoft, is the market's moving so fast, that if these clouds aren't dead simple easy to use, the customers aren't going to go to hybrid. They're going to go back to their comfort zone, which is the true private cloud, going to build that base. It's just got to get easier to manage. It's got to get easier to multi-cloud. And the bottom line is that Amazon's clearly in the lead. So, Jassy has a window right now to run the table on enterprise. He's got about 18-24 months, but Google's putting the pedal to the metal. I mean they're pedaling as fast as they can. Microsoft's cobbling together their legacy, okay, running as fast as they can. But there's this economies of scale, Stu, for them. Your thoughts and reactions. >> Yeah, so, I always thought enterprise simplicity is actually an oxymoron, does not exist. This VMware community, one of the things people loved about it, they were builders. They were all like get in there, and I tweak that. Harvard Business School calls it the Ikea effect. If I help build it just enough, I actually love it a little bit more. VMware's not simple. NSX, hitting about a billion dollars when you get into it is not easy. Security and networking are never going to be you know dirt simple. Amazon, we thought it was real simple, now thousands of services. Absolutely, we've been at that ecosystem for many years. It gets tougher and tougher the more you get into it. And, John, some of the builders there, the developers there, they get in. There's lot of room for this ecosystem to build around that. Because one of the things we talk about as VMware goes to some of these clouds. Where do they get that ecosystem? You mentioned some of the systems integrators, but the rest of the channel, where can they make money? And trying to help, because it's not simple, how do they help get opinionated, make those choices, build it all together. There's professional services dollars there. There's ways to help consult with companies there. >> Ecosystem is the key point. Watch the ecosystem and how that's forming around cloud, hybrid cloud, true private cloud, whatever you want to call it. And then, again, the technology's maturing. It's all about the people and the process to actually affect so called cloud, hybrid cloud, bringing the cloud model to the data, not forcing your business into cloud. >> We got to wrap up here. We've also got Lisa Martin and Keith Townsend and John Troyer, and we got some community guests as well, joining like we did last year. So this will be great. But I want to put something out there, guys, so we can hit up tomorrow and tease it out. I worry about when you have these fast waves that are coming through and the velocity is phenomenal right now. Is that, what tends to crumble, Dave, to your ecosystem point, are these foundations. When you have these industry consortiums, it's kind of like it's political. They've got boards and multiple fingers in it. That could be the suffering point, in my opinion. And that points directly at Cloud Foundry. Cloud Foundry, OpenStack, some of these consortium groups are at risk, in my opinion, if it goes too fast. Stu, to your point. Kubernetes has got great traction. You've got Containers. Dockers got a new CEO. Uber's got a new CEO. I mean the world is moving so fast. So, rhetorical question, industry consortiums. Do they suffer, or do they win in this environment? >> Depends on what they're doing, right? If they're low-level technical standards that advance the industry, I think they do win. I think if it's posturing, and co-opetition, and trying to cut off the one vendor at the knees, it loses. >> Stu, real quick, consortiums. Win or lose in this environment? >> Yeah, we've seen some that have done quite well, and some that have been horrific. So, absolutely, if it gets way too political. Open source has done some really good things, but the foundations, once they get in there, it's challenging and, I'd say, more times than not, they don't help. >> Well, we're in theCUBE. We're breaking it down. We're going to be squinting through all the announcements looking at where the meat on the bone is, where the action is and the relevance and the impact to enterprises and emerging tech. This is theCUBE. I'm John Furrier with Stu Miniman and Dave Vellante. We're back with more live coverage. Day one, after this short break. (techy music)

Published Date : Aug 28 2017

SUMMARY :

Brought to by VMware and its ecosystem partners. Dave, VMWorld is now the cloud show with re:Invent. our business into the cloud." and all the posturing from the vendors is I've got a great position in the data center. You and I talk about all the time on camera, the cloud, John, to the realities of the data. It's got the VSAN, the NSX, all that stuff, But the ecosystem partners on this phase one is key. I mean, clearly, Amazon needs VMWARE to reach I think this is only going to get more decentralized. Does VMware have the chops to play in this space, Dave? One is in the security front that great tech They got the ops, IT operations locked down and how the Dell whole family putting the pedal to the metal. This VMware community, one of the things bringing the cloud model to the data, I mean the world is moving so fast. that advance the industry, I think they do win. Win or lose in this environment? but the foundations, once they get in there, and the impact to enterprises and emerging tech.

ENTITIES

Entity	Category	Confidence
IBM	ORGANIZATION	0.99+
John	PERSON	0.99+
Dave Vellante	PERSON	0.99+
Dave	PERSON	0.99+
Oracle	ORGANIZATION	0.99+
Google	ORGANIZATION	0.99+
Pat	PERSON	0.99+
Andy	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
Microsoft	ORGANIZATION	0.99+
Paul Moritz	PERSON	0.99+
Deloitte	ORGANIZATION	0.99+
Andy Jassy	PERSON	0.99+
Michael Dell	PERSON	0.99+
Pat Gelsinger	PERSON	0.99+
Stu Miniman	PERSON	0.99+
Jassy	PERSON	0.99+
Uber	ORGANIZATION	0.99+
AWS	ORGANIZATION	0.99+
Amazon Web Services	ORGANIZATION	0.99+
DXT	ORGANIZATION	0.99+
Capgemini	ORGANIZATION	0.99+
Accenture	ORGANIZATION	0.99+
VMware	ORGANIZATION	0.99+
John Furrier	PERSON	0.99+
OpenStack	ORGANIZATION	0.99+
10	QUANTITY	0.99+
seven	QUANTITY	0.99+
NSX	ORGANIZATION	0.99+
Las Vegas	LOCATION	0.99+
2017	DATE	0.99+
December	DATE	0.99+
two	QUANTITY	0.99+
last year	DATE	0.99+
500,000 customers	QUANTITY	0.99+
three	QUANTITY	0.99+

Shaun Walsh, QLogic - #VMworld 2015 - #theCUBE

San Francisco extracting the signal from the noise it's the cube covering vmworld 2015 brought to you by VM world and its ecosystem sponsors now your host Stu miniman and Brian Grace Lee welcome back this is the cube SiliconANGLE TVs live production of vmworld 2015 here in moscone north san francisco happy to have back on this segment we're actually gonna dig into some of the networking pieces Brian Grace Lee and myself here hosting it Sean Walsh repeat cube guest you know in a new role though so Sean welcome back here now the general manager of the ethernet business at qlogic thanks for joining us thank you thanks for having me alright so I mean Sean you know we're joking before we start here I mean you and I go back about 15 years I do you know those that know the adapter business I mean you know Jay and I've LJ core business on you've worked for qlogic before you did a stint in ml accent and you're now back to qlogic so why don't we start off with that you know what brought you back to qlogic what do you see is the opportunity there sure um I'll tell you more than anything else what brought me back was this 25 gig transition it's very rare and I call it the Holy trifecta of opportunity so you've got a market transition you actually have a chip ready for the market at the right time and the number one incumbent which is Intel doesn't have a product I mean not that they're late they just don't have a product and that's the type of stuff that great companies are built out of are those unique opportunities in the market and you know more than anything else that's when brought me back to qlogic alright so before we dig into some of the ethernet and hyperscale piece you know what what's the state of fibre channel Sean you know what we said is in those fiber channel the walking dead is it a cash cow that you know qlogic be a bit of milk and brocade and the others in the fibre channel business for a number years you know what's your real impression of fibre channel did that yeah so you know look fibre channel is mature there's no question about it is that the walking dead no not by any stretch and if it is the walking dead man it produces a lot of cash so I'll take that any day of the year right The Walking Dead's a real popular show so fibre channel you know it's still it's still gonna be used in a lot of environments but you know jokingly the way that I describe it to people is I look at fibre channel now is the Swiss bank of networks so a lot of web giant's by our fiber channel cards and people will look at me and go why do they do that because for all the hype of open compute and all the hype of the front end processors and all the things that are happening when you click on something where there's money involved that's on back end Oracle stuff and it's recorded on fibre channel and if there's money involved it's on fibre and as long as there's money in the enterprise or in the cloud I'm reasonably certain fibre channel will be around yeah it's a funny story I remember two years ago I think we were at Amazon's reinvent show and Andy Jesse's on stage and somebody asked you know well how much of Amazon is running amazoncom is running on AWS and its most of it and we all joke that somewhere in the back corner running the financials is you know a storage area network with the traditional array you know probably atandt touched by fibre channel absolutely i mean we just did a roll out with one of the web giants and there were six different locations each of the each of the pods for the service for about 5,000 servers and you know as you would expect about 3,000 on the front access servers there's about 500 for pop cash that was about 15 maybe twelve thirteen hundred for the for the big data and content distribution and all those other things the last 500 servers look just like the enterprise dual 10 gigs dual fibre channel cards and you know I don't see that changing anytime soon all right so let's talk a bit a little bit 25 gig Ethernet had an interview yesterday with mellanox actually who you know have some strong claims about their market leadership in the you know greater than 10 gig space so where are we with kind of the standards the adoption in queue logical position and 25 gig Ethernet sure so you know obviously like everyone in this business we all know each other yeah and when you look at the post 10 gig market okay 40 gigs been the dominant technology and I will tip my hat to mellanox they've done well in that space now we're both at the same spot so we have exactly the same opportunity in front of us we're early to market on the 25 we have race to get there and what we're seeing is the 10 gig market is going to 25 pretty straightforward because I like the single cable plant versus the quad cable plant the people that are at 40 aren't going to 50 they're going to transition straight to 100 we're seeing 50 more as a blade architecture midplane sort of solution and that's where at right now and I can tell you that we have multiple design win opportunities that we're in the midst of and we are slugging it out with these guys everything and it will be an absolute knife fight between us and mellanox to see who comes out number one in this market obviously we both think we're going to win but at the end of the day I've placed my bet and I expect to win all right so Sean can you lay out for us you know where are those battles so traditionally the network adapter it was an OEM type solution right I got it into the traditional server guys yeah and then it was getting the brand recognition for the enterprise customers and pushing that through how much is that traditional kind of OEM is it changing what's having service providers and those hyperscale web giants yes so there's there's three fundamental things when you look at 25 gig you gotta deal with so first off the enterprise is going to be much later because they need the I Triple E version that has backwards auto-negotiation so you know that's definitely a 17 18 pearly transition type thing the play right now is in the cloud and the service provider market where they're rolling out specific services and they're not as concerned about the backwards compatibility so that's where we're seeing the strength of this so they're all the names that you would expect and I have to say one of the interesting things about working with these guys is there n das or even nastier than our Liam India is they do not want you talking about them but it is very much that market where it's a non traditional enterprise type of solution for the next 12-18 months and then as we roll into that next gen around the pearly architecture where we all have full auto-negotiation that's where you're going to see the enterprise start to kick in yeah what what what are the types of applications that are driving this this next bump in speed what is it is it video is it sort of east and west types of application traffic is a big data what's what's driving this next bump so a couple of things you would expect which would be the you know certainly hadoop mapreduce you know those sorts of things are going there the beginning of migration to spark where they're doing real-time analytics versus post or processing batch type stuff so there they really care about it and this is where our DMA is also becoming very very popular in it the next area that most people probably don't think of is the telco in a vspace is the volume as these guys are doing their double move and there going from a TCA type platforms running mostly one in ten they're going to leave right to 25 and for them the big thing is the ability to partition the network and do that virtualization and be able to run deep edk in one set of partitions standard storage another set of partitions in classic IP on the third among the among the few folks that you know you would expect in that are the big content distribution guys so one of the companies that I can mention is Netflix so they've already been out at their at 40 right now and you know they're not waiting for 50 they're going to make another leap that goes forward and they've been pretty public about those types of statements if you look at some of the things that they talked about at NDF or IDF and they're wanting to have nvme and direct gas connection over i serve that's driving 100 gig stuff we did a demo at a flash memory summit with Samsung where we had a little over 3 million I ops coming off of it and again it's not the wrong number that matters but it's that ability to scale and deal with that many concurrent sessions that are driving it so those are the early applications and I don't think the applications will be a surprise because they're all the ones that have moved to 40 you know the 10 wasn't enough 40 might be too much they're going to 25 and for a lot of the others and its really the pop cash side that's driving the hunter gig stuff because you know when that Super Bowl ad goes you got to be able to take all that bandwidth it once yeah so Sean you brought up nvme maybe can you discuss a little bit you know what are the you know nvm me and some of these next-generation architectures and what's the importance to the user sure so nvme is basically a connection capability that used to run for hard drives then as intel moved into SSDs they added this so you had very very high performance low latency pci express like performance what a number of us in this business are starting to do is then say hey look instead of using SAS which is kind of running out of gas at 12 gig let's move to nvme and make it a fabric and encapsulate it so there's three dynamics that help that one is the advent of 25 50 100 the second is the use of RDMA to get the latency that you want and then the third is encapsulation I sir or the ice cozy with RDMA together and it's sort of that trifecta of things that are giving very very high performance scale out on the back end and again this is for the absolute fastest applications where they want the lowest latency there was an interesting survey that was done by a university of arizona on latency and it said that if two people are talking and if you pause for more than a quarter of a second that's when people change their body language they lean forward they tilt their head they do whatever and that's kind of the tolerance factor for latency on these things and again one of the one of the statements that that Facebook made publicly at their recent forum was that they will spend a hundred million dollars to save a millisecond because that's the type of investment that drives their revenue screen the faster they get clicks the faster they generate revenue so when you think of high frequency trading when you think of all those things that are time-sensitive the human factor and that are going to drive this all right so storage the interaction with networking is you know critically important especially to show like this at vmworld I mean John you and I talked for years is it wasn't necessarily you know fibre channel versus the ethernet now it's changing operational models if I go use Salesforce I don't think about my network anymore I felt sort of happen to used Ethernet it's I don't really care um hyper convergence um when somebody buys hyper convergence you know they just kind of the network comes with it when I buy a lot of these solutions my networking decision is made for me and I haven't thought about it so you know what's that trend that you're seeing so the for us the biggest trend is that it's a shifting customer base so people like new tonics and these guys are becoming the drivers of what we do and the OEMs are becoming much more distribution vehicles for these sorts of things than they are the creators of this content so when we look at how we write and how we build these things there's far more multi-threading in terms of them there's far more partitions in terms of the environment because we never know when we get plugged into it what that is going to be so incorporating our l2 and our RDMA into one set of engine so that you always have that hyper for it's on tap on demand and you know without getting down into the minutia of the implementation it is a fundamental shift in how we look at our driver architectures you know looking at arm based solutions and micro servers versus just x86 as you roll the film forward and it also means that as we look at our architectures they have to become much smaller and much lighter so some of the things that we traditionally would have done in an offload environment we may do more in firmware on the side and I think the other big trend that is going to drive that is this move towards FPGAs and some of the other things that are out there essentially acting as coprocessors from you you mentioned earlier Open Compute open compute platform those those foundations and what's going on what is what what's really going on there i think a lot of us see the headlines sometimes you think about it you go okay this is an opportunity for lots of engineering to contribute to things but what's the reality that you're dealing with the web scale folks sure if they seem like the first immediate types of companies that would buy into this or use it what's the reality of what's going on with that space well obviously inside the the i will say the web scale cloud giant space you know i think right now if you look at it you've got sort of the big 10 baidu Tencent obama at amazon web as your microsoft being those guys and then you know they are definitely building and designing their own stuff there's another tier below that where you have the ebays the Twitter's the the other sorts of folks that are in there and you know they're just now starting that migration if you look at the enterprise not a big surprise the financial guys are leading this we've seen public statements from JPM and other folks that have been at these events so you know I view it very much like the blade server migration I think it's going to be twenty twenty-five percent of the overall market whether we whether people like to admit it or not good old rack and stack is going to be around for a very long time and you know they're there are applications where it makes a lot of sense when you're deploying prop private cloud in the managed service provider market we're starting to see a move into that but you know if you say you know what's the ten year life cycle of an architect sure i would say that in the cloud were probably four or five years into it and the enterprise were maybe one or two years into it all right so what about the whole sdn discussion Sean you know how much does qlogic play into that what are you seeing in general and you know we're at vmworld so what about nsx you know is that part of the conversation and what do you hear in the marketplace today yeah it really is part of the conversation and the interesting part is that I think sdn is getting a lot of play because of the capabilities that people want and again you know when you look at the managed service providers wanting to have large scale lower costs that's going to definitely drive it but much like OpenStack and Linux and some of these other things it's not going to be you know the guys going to go download it off the web and put it in production at AT&T you know it's going to be a prepackaged solution it's going to be embedded as part of it if you look at what Red Hat is doing with their OpenStack release we look what mirantis is doing with their OpenStack release again from an enterprise perspective and from a production in the MSP and second tier cloud that's what you're going to see more of so for us Sdn is critical because it allows us to then start to do things that we want to do for high-performance storage it allows us to change the value proposition in terms of if you look at Hadoop one of these we want to be able to do is take the storage engine module and run that on our card with our embedded V switch and our next gen ship so that we can do zero stack copies between nodes to improve latency so it's not just having RDMA is having a smart stack that goes with it and having the SDN capability to go out tell the controller pay no attention this little traffic that's going on over here you know these are not the droids you're looking for and then everything goes along pretty well so it's it's very fundamental and strategic but it's it's a game it's a market in which we're going to participate but it's not one we're going to try and write or do a distribution for okay any other VMware related activities q logics doing announcements this week that you want to share this week I would have to say no you know I think the one other thing that we're strategically working on them on with that you would expect is RDMA capabilities across vMotion visa and those sorts of things we've been one of the leaders in terms of doing genevieve which is the follow-on to VX land for hybrid cloud and that sort of thing and we see that as a key fundamental partnership technology with VMware going forward all right so let's turn back to qlogic for a second so the CEO recently left he DNA that there's a search going on so give us the company update if you will well actually there isn't a search so Jean who is gonna is going to run the ship forward as CEO we've brought in chris king who was on our board as executive chair in person chris has a lot of experience in the chip market and she understands that intimate tie that we have to that intel tick-tock model and really how you run an efficient ship driven organization you know whether we play in the systems in between level you know we're not quite the system but we're not quite the chip and understanding that market is part of what she does and the board has given us the green light to continue to go forward develop what we need to do in terms of the other pieces jean has a strong financial background she was acting CEO for a year between HK and simon aires me after Simon left so she's got the depth she knows the business and for us you know you know it's kind of a non op where everything else is continuing on as you would expect yeah okay last question I have for you Sean I mean the dynamics change for years you know what there was kind of the duopoly Xin the market I mean it was in tellin broadcom oh yeah on the ethernet side it was Emulex and amp qlogic it's a different conversation today I mean you mentioned Intel we talked about mellanox don't you logic you know your old friend I don't lie back on a vago bought broadcom and now they're called broadcom I think so yeah so you know layout for us you know kind of you know where you see that the horses on the track and you know what excites you yeah so again you know if you look at the the 10 gig side of the business clearly intel has the leadership position now we're number two in the market if you look at the shared data that's come out you know the the the Emulex part of a vago has been struggling in losing chair then we have this 25 gig transition that came in the market and that was driven by broadcom and you know for those of us who have followed this business they I think everyone can appreciate the irony of avago of avago buying Emulex and then for all the years we tried to keep him separate bringing them back together was but we-we've chuckled over a few beers on that one but then you've got this 25 gig transition and you know the other thing is that if you look at so let me step back and say the other thing on the 10 gig market is was a very very clear dividing line the enterprise was owned by the broadcom / qlogic emulex side the cloud the channel the the the appliance business was owned by Intel mellanox okay now as we go into this next generation you've got us mellanox and the the original broadcom team coming in with 25 game we've all done something that gets us through this consortium approach we're all going to have a night Ripley approach from there and Intel isn't there you know we haven't seen any announcements or anything specific from Emulex that they've said publicly in that space so right now we kind of view it as a two-horse race we think from a software perspective that our friends at at broadcom com whatever we want to call them or bravado I think is how r CT / first tool that I don't think they have a software depth to run this playbook right now and then we have to do is take our enterprise strength and move those things like load balancing and failover and the SDN tools and end par and all the virtualization capabilities we have we got to move those rapidly into the into the cloud space and go after it for us it means we have to be more open source driven than we have been in the past it means that we have a different street fight for every one of these it represents a change in some of the sales model and how we go to market so you know not to say that we're you know we we've got all of everything wrapped up and perfect in this market but again right time right place and this will be the transition for another you know we think three to five years and there's there's still a lot of interesting things that are happening ironically one of the most interesting things I think it's got to happen in 25 is this use of the of the new little profile connectors I think that will do more to help the adoption of 25 gig in Hunter gig where you can use the RCX or r XC connector there's our cxr see I forgot the acronym but it kind of looks like the firewire HDMI connectors that you have on your laptop's now and now imagine that you can have a car that has that connector in a form factor that's you know maybe a half inch square and now you've got incredible port density and you can dynamically change between 25 50 and 100 on the fly well let Sean Sean you know we've always talked there's a lot of complexity that goes in under the covers and it's the interest who's got a good job of making that simple and consumable right and help tried those new textures go forward all right Sean thank you so much for joining us we'll be right back with lots more coverage including some more networking in-depth conversation thank you for watching thanks for having me

Published Date : Sep 2 2015

**Summary and Sentiment Analysis are not been shown because of improper transcript**

ENTITIES

Entity	Category	Confidence
Jean	PERSON	0.99+
Shaun Walsh	PERSON	0.99+
Brian Grace Lee	PERSON	0.99+
one	QUANTITY	0.99+
Sean	PERSON	0.99+
three	QUANTITY	0.99+
four	QUANTITY	0.99+
Jay	PERSON	0.99+
Andy Jesse	PERSON	0.99+
AT&T	ORGANIZATION	0.99+
10 gig	QUANTITY	0.99+
Emulex	ORGANIZATION	0.99+
ten year	QUANTITY	0.99+
25 gig	QUANTITY	0.99+
Brian Grace Lee	PERSON	0.99+
qlogic	ORGANIZATION	0.99+
Samsung	ORGANIZATION	0.99+
25 gig	QUANTITY	0.99+
100 gig	QUANTITY	0.99+
10 gig	QUANTITY	0.99+
Simon	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
yesterday	DATE	0.99+
two years	QUANTITY	0.99+
two people	QUANTITY	0.99+
12 gig	QUANTITY	0.99+
mellanox	ORGANIZATION	0.99+
50	QUANTITY	0.99+
Facebook	ORGANIZATION	0.99+
chris	PERSON	0.99+
amazon	ORGANIZATION	0.99+
microsoft	ORGANIZATION	0.99+
twelve thirteen hundred	QUANTITY	0.99+
Netflix	ORGANIZATION	0.99+
amazoncom	ORGANIZATION	0.99+
Oracle	ORGANIZATION	0.99+
five years	QUANTITY	0.99+
40 gigs	QUANTITY	0.99+
Stu miniman	PERSON	0.99+
about 5,000 servers	QUANTITY	0.99+
AWS	ORGANIZATION	0.99+
40	QUANTITY	0.99+
100	QUANTITY	0.99+
10 gig	QUANTITY	0.99+
two years ago	DATE	0.98+
Linux	TITLE	0.98+
twenty twenty-five percent	QUANTITY	0.98+
about 3,000	QUANTITY	0.98+
25	QUANTITY	0.98+
Super Bowl	EVENT	0.98+
jean	PERSON	0.98+
OpenStack	TITLE	0.98+
third	QUANTITY	0.98+
today	DATE	0.98+
moscone	LOCATION	0.98+
San Francisco	LOCATION	0.98+
first	QUANTITY	0.97+
Red Hat	ORGANIZATION	0.97+
this week	DATE	0.97+
over 3 million	QUANTITY	0.97+
The Walking Dead	TITLE	0.96+
Intel	ORGANIZATION	0.96+
telco	ORGANIZATION	0.96+
John	PERSON	0.96+
about 500	QUANTITY	0.96+
greater than 10 gig	QUANTITY	0.96+
three fundamental things	QUANTITY	0.96+

Recommend Videos

Sentiment Analysis

AWS Comprehend

Search Results for Walking Dead: