Jerome Lecat and Chris Tinker | CUBE Conversation 2021

>>And welcome to this CUBE Conversation. I'm John Furrier, host of theCUBE here in Palo Alto, California. We've got two great remote guests to talk about some big news hitting with Scality and Hewlett Packard Enterprise: Jerome Lecat, CEO of Scality, and Chris Tinker, Distinguished Technologist from HPE, Hewlett Packard Enterprise. Jerome, Chris, great to see you both. CUBE alumni from the original gangster days, as we say, back when we started almost 11 years ago. Great to see you both. >>It's great to be back. >>So let's see. So >>really compelling news around this next-generation storage, cloud-native solution. It really has an impact on the next gen, what I call next-gen DevOps meeting the modern application world, something we've been covering heavily. There's some big news here around Scality and HPE offering a pretty amazing product. You guys introduced essentially the next-gen piece of it, ARTESCA, which we'll get into in a second. But this is a game-changing announcement. You guys are announcing what looks like an evolution, but I think it's more of a revolution, even though storage is kind of an abstraction layer in the evolution to this app-centric world. So talk about this environment we're in, and we'll get to the announcement, which is an object store for modern workloads. This whole shift is happening, Jerome. This is a game changer for storage; customers are going to be deploying workloads. >>Yeah, Scality. I personally started working on Scality more than 10 years ago, 15 now. And if we think about it, cloud has really revolutionized IT, and within the cloud we really see layers and layers of technology.
I mean, it all started around 2006 with Amazon and Google finding ways to do, initially, consumer IT at very large scale, very low cost, and incredible reliability, and then slowly it crept into the enterprise. At the very beginning, I would say everyone was kind of a wizard, trying things and really coupling technologies together, and to some degree we were some of the first wizards doing this. But we're now close to 15 years later, and there's a lot of knowledge, a lot of experience, a lot of skills, and this is really a new generation. I'll call it cloud native; you can call it next gen, whatever. There is now enough experience in the world, both at the development level and at the infrastructure level, to deliver truly distributed, automated systems that run on industry-standard servers. Obviously, good-quality servers deliver a better service than lesser ones. But there is now enough knowledge for this to truly go at scale, whether you call it cloud or cloud native. Really, the core concept here is to deliver scalable IT at very low cost with a very high level of reliability, all based on software. We've participated in this, but we feel that what's coming now is at a new level, and it was time for us to think, develop, and launch a new product that is specifically adapted to that. And Chris, I will let you comment on this, because you can add a customer view to that. >>Well, you know, you're right. Like you, I have been in this industry for, well, a long time, a little longer than 20 years, at HPE in engineering. Look at how the landscape has changed in how we're doing scale-out, software-defined storage for particular workloads, and where the catalyst has evolved. Here it's analytics: what was normally only done in the three-letter acronyms, with massively scale-out POSIX namespace file systems, parallel file systems.
The application space has encroached into the enterprise world, where the enterprise needed a way to simplify operations. How do I bring about an application that can run in the public cloud, on premises, or hybrid? How do I look at a workload off my stack that aligns the cost to the analytics I'm going to be doing, the workload I'm going to be running, and bridge those gaps, spin this up, and simplify operations? And if you are familiar with these parallel file systems, which by the way we actually have on our truck, and I do engineer those, they have their own unique challenges. But in the world of enterprise, customers are looking to simplify operations and then take advantage of new application and analytic workloads, whether it be Spark or whatever it might be, right? If I want to spin up MongoDB, or maybe an Elasticsearch capability, how do I take those technologies and embrace a modern scale-out storage stack without breaking the bank, but also provide simple operations? That's why we look to object storage capabilities, because they bring us this massive parallelization. Thank you. >>Well, before we get into the product, I want to touch on one thing you mentioned, Chris: you brought up the DevOps piece. Next gen, next level, whatever term you use, it is cloud native. Cloud native has proven that DevOps and infrastructure as code are not only legit but being operationalized in all enterprises. Add security in there and you have DevSecOps. This is the reality, and hybrid cloud in particular has been pretty much the consensus as the standard, or the de facto, whatever you want to call it; that's happening. Multi-cloud is on the horizon.
So these new workloads come with new architectural changes: cloud, on premises, and edge. This is the number one story and the number one challenge. All enterprises are now working on how to build the architecture for cloud, on premises, and edge, and this is forcing the DevOps team to flex and build new apps. Can you guys talk about that particular trend, and is that relevant here? >>Yeah, I now talk about storage anywhere and cloud anywhere, and really the key concept is edge to core to cloud. We all understand now that the edge will host a lot of data, and the edge is many different things. It's obviously a smartphone, whatever that is, but it's also factories, it's also production, it's also moving machinery: trains, planes, satellites. That's all the edge, cars obviously, and a lot of that data will be both produced and processed there. But from the edge you will want to be able to send that data to a core for analysis, for backup, for logging. And that core could be regional, maybe not one core for the whole planet, but maybe one per region or per state in the US. And then from there, you will also want to push some of the data to probably the cloud. One of the things we see more and more is that the DR data center, the disaster recovery site, is not another physical data center; it's actually the cloud, and that's a very efficient infrastructure, very cost efficient. So it's really changing the paradigm of how you think about storage, because you need to integrate these three layers in a consistent approach, especially around the topic of security, because you want the data to be secure all along the way. And the data is not just data; it's who can access the data, who can modify the data.
What are the conditions that allow modification, or automatic erasure? In some cases it's super important that data be automatically erased after 10 years, and all this needs to be transported from edge to core to cloud. So that's one of the aspects. Another aspect that resonates for me with what you said is a word you didn't say, but it's actually crucial to this whole revolution: Kubernetes. Kubernetes is now a mature technology, and it's the next level of automated operations for distributed systems, which we didn't have five or 10 years ago. That is so powerful that it's going to allow application developers to build much faster systems that can be distributed, again, edge to core to cloud, because it's going to be an underlying technology that spans the three layers. >>Chris, your thoughts? Hybrid cloud: I've been having conversations with the HPE folks for years and years on hybrid cloud, and now it's here. >>Well, you know, it's exciting, right? If you look at enterprise virtualization, that is a scale-out, general-purpose virtualization workload. Whether it's analytic workloads or anything else, we know data protection is paramount to all of this, and orchestration is paramount. If you look at DevSecOps, absolutely, securing the data, the digital asset, is absolutely paramount. And if you look at how we do this, look at the investments we're making, and look at the collaborative platform development, which goes to our partnership with Scality: we're providing them an integral aspect of everything we do. We're bringing Ezmeral, which is our software for orchestration. Look at the veneer of its control plane controlling Kubernetes, being able to control the clusters and the backing store for all the analytics. And we just talked about web scale-out that has traditionally used POSIX.
Namespace file systems have now been modernized to take advantage of newer technologies, running NVMe burst buffers or 100-gig networks, with the Slingshot network at 200 and 400 gigabit. How do we get the analytics, the workload, to the CPU and have it attached to the data at rest? Where is the data? How do we land the data, and how do we align, essentially, the locality of the asset to the compute? This is where we can leverage Azure or Google or your favorite hyperscaler, leveraging their persistent store. And this is where Scality, with this object store capability, has been an industry trendsetter, setting the landscape for how to provide an object store on premises and in hybrid cloud, running into public cloud, while facilitating data mobility and tying it back to an application. A lot of things have changed in the world of analytics, because the applications and newer technologies coming on the market have taken advantage of this particular protocol, S3, so they can do web-scale, massively parallel, concurrent workloads. >>You know what, let's get into the announcement. I love cool and relevant products, and I think this hits the mark. Scality, you guys have ARTESCA, which was just announced, and we obviously reported on it. You guys have a lightweight, true enterprise-grade object store software for Kubernetes. This is the announcement. Jerome, tell us about it. >>What's the big >>deal? Cool and >>relevant? Come on, >>this is cool. All right, tell us. >>I'm super excited. I'm not sure that it shows on screen, but I'm super, super excited. We introduced the RING 11 years ago, and this is our biggest announcement of the past 11 years. So yes, do pay attention.
After looking at all these trends and understanding where we see the future going, we decided it was time to embark on a new product. There's not one line of code that's the same as in the previous-generation product. They will both coexist; they both have space in the market. ARTESCA was specifically designed for this cloud-native era. What we see is that people want something that's lightweight, especially because it has to go to the edge. They still want the enterprise grade and the security Scality is known for, and it has to be modern. What we really mean by modern is that we see object storage becoming the primary storage for more and more applications, so we have to be able to deliver the performance that primary storage requires. This idea of Scality serving as primary storage is actually not completely new. When we launched Scality 10 years ago, the first application we supported was consumer email, for which we were, and still are today, the primary storage. So we know what it is to be the primary store, and we know the level of reliability you need to hit. We know what latency is, that latency is different from throughput, and that you really need to optimize both. And I think that still today we're the only object storage company that protects data with both replication and erasure coding, because we understand that replication is faster, while erasure coding is better for larger files, where latency doesn't matter so much. So we've been bringing all that experience, but really rethinking the product for the new generation that is here now, and we're truly excited. A little bit more about the product: it's software. Scality is a software company, and that's why we love to partner with HPE, who produces amazing servers. For the record and for history, the very first deployment of Scality, in 2010, was on HP servers.
So this is a long love story here. And so, to come back to ARTESCA: it's lightweight in the sense that it's easy to use. You can start small, from just one server or one VM instance. I mean, start really small, and grow infinitely. The fact that we start small didn't limit the technology, so you can go from one to many. And it's cloud native in the sense that it's completely Kubernetes compatible; it's Kubernetes orchestrated. It will deploy on many Kubernetes distributions. We're talking obviously with Ezmeral, and we're also talking with Tanzu and the others in terms of Kubernetes distributions. It will also be able to run in the cloud. I'm not sure there will be many true production deployments of ARTESCA in the cloud, because you already have really good object storage from the cloud providers. But when you are developing something and you want to test it, just doing it in the cloud is very practical. So you'll be able to deploy ARTESCA on cloud Kubernetes distributions. And it's modern object storage in the sense that it's application-centric. A lot of our work is actually validating that our storage is fit for a single-purpose application, and making sure we understand the requirements of that application so that we can guide our customers on how to deploy. It's really designed to be the primary storage for these new workloads. >>The big part of the news is your relationship with Hewlett Packard Enterprise. There's some exclusivity here as part of this announcement. You mentioned the relationship goes back many, many years, and we've covered it in the past. Chris, also, you know, we cover HPE like a blanket. This is big news for HPE as >>well. >>What is the relationship? Talk about this exclusivity. Could you share more about the partnership and the exclusivity piece? >>Well, the partnership expands into the pan-HPE portfolio.
We made a massive investment in edge IoT devices. So how do we align the cost to the demand? Our customers come to us looking at, think about what we're doing with GreenLake, consumption-based modeling. They want to be able to consume the asset without having to do a capital outlay out of the gate. Number two, look at how you deploy technology to meet demand. It depends on the scale, right? With a lot of web-scale, scale-out technologies, putting them on a diet is challenging, meaning how skinny can you get it, getting it down into the 50-terabyte range. Then there are the complexities of those technologies as you take a day-one implementation and scale it out over multiple iterations, over quarters. The growth becomes a challenge. So, working with Scality, we believe we've actually cracked this nut. We figured out, number one, how to start small but not limit customers' ability to scale out incrementally or grotesquely, as they can, depending on the quarter, the month, whatever the workload is. How do you align it and be able to consume it? So now, whether it be on our Edgeline products or our DL products, going back to what Jerome was talking about earlier, we ship a server every few seconds; that won't be a problem. But then of course there's our density-optimized compute with the Apollo product. This is where our two companies have worked out an exclusivity, where the Scality software runs on the HPE ecosystem. And then we can of course provide our customers the ability to consume that through our GreenLake financial models or through a CapEx purchase. >>Awesome. So Jerome and Chris, who's the customer here? Obviously there's an exclusive period. Talk about the target customer. And how do customers get the product? How do we get the software?
And how does this exclusivity with HPE fit into it? >>Yeah. So there are really three types of customers, and we've worked a lot with a company called use design to optimize the user interface for each of the three types. We really thought about each customer role and providing each of them the best product. The first type of customer is application owners who are deploying an application that requires object storage in the back end. They typically want a simple object store for one application. They want it to be simple and to work. I mean, they want it yesterday. They want no friction; they just want an object store that works, and they want to be able to start as small as they start with their application. Often it's the first deployment, maybe a small deployment. These are applications like backup, like Veeam and Rubrik, or analytics like Splunk and Vertica, or file systems now available as software, like CTERA, which does a really great distributed NAS that works very well and needs an object store in the back end. For high-performance computing, the Weka file system is an amazing file system. We also have vertical applications like Broadpeak, for example, which provides origin and video software for broadcasters. All these applications require an object store in the back end, and you just need a simple, high-performance, working-well object store, and ARTESCA is perfect for that. The second type of people we think will be interested in ARTESCA is essentially developers who are currently developing Kubernetes, cloud-native, next-gen applications. As part of their development stack, it's getting better and better, when you're developing a cloud-native application, to really target object storage rather than NFS as your persistent storage. Just think about generations of technologies: NFS and file systems were great 25 years ago. I mean, it's an amazing technology.
But now, when you want to develop a distributed, scalable application, object storage is a better fit, because it's the same generation. So developers building something need an object store that they can develop on. They want it very lightweight, but they also want a product that their enterprise or their customers will be able to rely on for years and years, and ARTESCA is really great for that. The third type of customer is more the architects, the security architects, who are designing systems where they're going to have 50 factories, 1,000 planes, or a million cars, each with some local storage that they will want to replicate to the core and possibly also to the cloud. As they design these really new-generation workloads that are incredibly distributed but with local storage, these guys are really going to be grateful for that. >>And talk about the HPE exclusive, Chris. How does that fit in? Do they buy through Scality? Can they get it from HPE? Are you guys working together on how customers can procure it? >>Yeah. Both ways: they can procure it through Scality, and they can procure it through HPE. It is the software stack running on our density-optimized compute platforms, which you can choose from. And it provides enterprise quality, because in all of these use cases it comes back to how we align to a true enterprise stack, bringing about multi-tenancy, bringing about, if you look at local erasure coding, one of the things they're bringing to it, so that we can get down into the DL325. So with the exclusivity, you actually get choice, and that choice spans our entire portfolio, whether it be the Edgeline platform, the DL325 AMD
processing stack, or the Intel DL380s, or the Apollos; there are so many choices there that facilitate this, and it just allows us to align those two strategies. >>Awesome. And I think the Kubernetes piece is really relevant, because I've been interviewing practitioners, and Kubernetes is maturing very fast. It's definitely the centerpiece of cloud native, both below the line, if you will, under the hood for the infrastructure, and for the apps they want to program on top of it. That's critical. I mean, Jerome, this is the future. >>Yeah. And if you don't mind, I'd like to come back for a minute to the exclusive with HPE. We did a six-month exclusive, and the very reason we could do this is that HPE has such a breadth of server portfolio. We can go from really simple, very cheap HDD on the DL380, a machine that retails for a few thousand dollars, really a simple system with 50 terabytes. We can have the DL325 that Chris mentioned, which is really a powerhouse: all the storage is NVMe, with very fast processors. Or dense, large systems like the Apollo 4500. So it's a very broad portfolio. We support the whole portfolio, and we work together on this. So I want to send kudos to HPE for the breadth of the server line, really. As mentioned, ARTESCA can be ordered from either company, hand in hand. So you'll see both of us, and our field teams are working incredibly well together. >>Well, just on that point, for clarification: was this co-designed by Scality and HPE? Because Chris, you mentioned the configuration of your systems. Can you guys quickly talk about the design, the co-design? >>From the code base:
The software was entirely designed and developed by Scality. From a testing and performance standpoint, this really was joint work, with HPE providing both hardware and manpower so that we could accelerate the testing phase. >>You know, Chris, HPE has just been doing such a great job of really focusing on this. And I've been covering it for years, before it was fashionable: the idea of apps working no matter where they live, public cloud, data center, edge. You mentioned Edgeline, which has been around for a while. App-centric, developer-friendly, cloud first has been an HPE guiding first principle for many, many years. >>It has, and as our CEO Antonio Neri has stated, by 2022 everything in our portfolio will be able to be consumed as a service. This stack gives us simplicity and consumability of the technology, and its granularity allows us to simplify the installation and the deployment, bringing it into a cloud ecosystem. But more importantly for the end customer, they simply get an enterprise-quality product running on a density-optimized stack that they can consume through an orchestrated, simple interface. That's what they're yearning for today when they come to me and ask, hey, I've got this new app, this new project. And it goes back to who's actually coming: it's no longer the IT people coming to us, it's the lines of business. It's that entire dimension of business owners coming to us saying, this is my challenge, how can you, HPE, help us? And we rely on our breadth of technology, but also our breadth of partners, to come together. Of course, Scality worked hand in hand with our collaborative business unit, our collaborative storage product engineering group, that actually brought this to market. So we're very excited about this solution. >>Chris, thanks for that input. Great insight. Jerome, congratulations on a great partnership with HPE. Obviously a great joint customer base, and congratulations on the product release here. Big: moving the ball down the field, as they say. New functionality, a cloud-native object store, phenomenal. So, to wrap up the interview, tell us your vision for Scality and the future of storage. >>Yeah. I mean, Scality is going to be an amazing leader; it already is. I have three themes that I think will govern where storage is going. Obviously, as Marc Andreessen said, software is everywhere and software is eating the world, so that's definitely going to be true in the data center, and in storage in particular. But there are three trends that are more specific. First of all, I think security, performance, and agility are now a basic expectation. They're not an additional feature; they're just table stakes. The second thing, and we've talked about it during this conversation, is edge to core: you need to think of your platform with edge, core, and cloud. You don't want separate systems, separate designs, and separate interface points for the edge, and then think about the core, and then think about the cloud, and then think about the rest. All this needs to be integrated in the design. And the third thing that I see as a major trend for the next 10 years is data sovereignty, more and more. You need to think about where the data resides. What are the legal challenges? What is the level of protection, and against whom are you protected? What is your independence strategy? How do you, as a company, stay independent from the people you need to be independent from? And I say companies, but this is also true for public services. So for me these are the three big trends. I do believe that software-defined distributed architectures are necessary for these trends, but you also need to think about being truly enterprise grade, and that has been one of our focuses in the design of ARTESCA.
How do we combine a lightweight product with all of the security requirements and data sovereignty requirements that we expect to have in the next 10 years? >>That's awesome. Congratulations on the news: Scality ARTESCA, the big release with the HPE exclusive for six months. Chris Tinker, Distinguished Engineer at HPE, great to see you. Jerome Lecat, CEO of Scality, great to see you as well. Congratulations on the big news. I'm John Furrier for theCUBE. Thanks for watching. >>Mhm. >>Yeah.

Published Date : Apr 28 2021

SUMMARY :

from HPE, Hewlett Packard Enterprise. Jerome, Chris, great to see you both. So let's see. But I think storage is kind of an abstraction layer in the evolution to this app-centric world. the infrastructure level to deliver truly distributed. Well, before we get into the product, I want to touch on one thing you mentioned, Chris. So that's one of the aspects; another aspect that resonates for me with what you said. Hybrid cloud: I've been having conversations with the HPE folks for years. locality of the asset to the compute. This hits the mark. Scality, you guys have ARTESCA, which... This is cool. So we know what it is to be the primary store; we know the level of reliability you... in the past. Chris, also, we cover HPE like a blanket. Number two, look at how you deploy. And how do customers get the product? I mean, and talk about the HPE exclusive, Chris. How does that fit in? So with the exclusivity, you actually get choice. And I think the Kubernetes piece is really relevant, because I've been interviewing folks. all the storage is NVMe, very fast processors. Scality and HPE, because Chris, you mentioned the configuration of your systems. From the code base? And ask, hey, I've got this new app, this new project, and it goes back... Big: moving the ball down the field, as they say, new functionality. What is the level of protection, and against whom are you protected? Great to see you as well.

ENTITIES

Entity	Category	Confidence
Jerome	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
HP	ORGANIZATION	0.99+
Chris Tinker	PERSON	0.99+
two companies	QUANTITY	0.99+
Hewlett Packard	ORGANIZATION	0.99+
Google	ORGANIZATION	0.99+
ARTESCA	COMMERCIAL_ITEM	0.99+
Marc Andreessen	PERSON	0.99+
US	LOCATION	0.99+
1000 planes	QUANTITY	0.99+
2010	DATE	0.99+
200	QUANTITY	0.99+
50 factories	QUANTITY	0.99+
Jerome Lecat	PERSON	0.99+
six months	QUANTITY	0.99+
100 gig	QUANTITY	0.99+
three types	QUANTITY	0.99+
six month	QUANTITY	0.99+
Chris	PERSON	0.99+
50 terabyte	QUANTITY	0.99+
10 years	QUANTITY	0.99+
a few thousand dollars	QUANTITY	0.99+
20	QUANTITY	0.99+
both	QUANTITY	0.99+
Hewlett Packard Enterprise	ORGANIZATION	0.99+
two	QUANTITY	0.99+
one	QUANTITY	0.99+
each	QUANTITY	0.99+
yesterday	DATE	0.99+
Palo Alto, California	LOCATION	0.99+
10 years ago	DATE	0.99+
First	QUANTITY	0.99+
11 years ago	DATE	0.99+
third thing	QUANTITY	0.99+
a million cars	QUANTITY	0.98+
15 years later	DATE	0.98+
DL380	COMMERCIAL_ITEM	0.98+
two strategies	QUANTITY	0.98+
one application	QUANTITY	0.98+
25 years ago	DATE	0.98+
second thing	QUANTITY	0.98+
first application	QUANTITY	0.98+
second	QUANTITY	0.98+
third type	QUANTITY	0.98+
2022	DATE	0.98+
one server	QUANTITY	0.98+
first deployment	QUANTITY	0.97+
five	DATE	0.97+
three themes	QUANTITY	0.97+
one thing	QUANTITY	0.97+
three letter	QUANTITY	0.97+
Both ways	QUANTITY	0.97+
one line	QUANTITY	0.97+
today	DATE	0.96+
Apollo 4500	COMMERCIAL_ITEM	0.96+
HPE	ORGANIZATION	0.96+
one VM	QUANTITY	0.96+

Michael Jordan & Matt Whitbourne, IBM | IBM Think 2020

>>Yeah. >>From the CUBE Studios in Palo Alto and Boston, it's theCUBE, covering IBM Think, brought to you by IBM. >>Welcome back to IBM Think Digital 2020. This is theCUBE, and we're really excited to have two great guests on. Michael Jordan is the distinguished engineer with IBM Z Security. Michael, good to see you again. Welcome back. >>Thank you. It's good to be back. >>And Matt Whitbourne is the program director and offering lead for z15. Good to see you. >>Thank you for having me. >>Guys, Z is a good place to be. Great quarter, 61% growth. You've got to love it. Congratulations. You must be feeling pretty good, other than what we're all going through. But from a business standpoint, Z powered through, didn't it? >>It did. We're really pleased with the contribution that Z continues to make for our clients, especially right now, given everything that's going on: business continuity, scale, resilience, security. They're just so important for our clients on the platform. >>Yes, so we're going to talk a lot about this. Maybe, Matt, I could start with you. You talk about cyber resiliency; we hear that a lot. I think it means a lot of different things to a lot of different people. What does it mean for Z? >>Yeah, for us, we kind of start in many ways with the NIST definition, which talks about the ability to anticipate, withstand, recover from, and adapt to adverse conditions, stresses, attacks, or compromises on systems that use cyber resources. So it's a really important, top-of-mind talking point for our clients, who are thinking about this both in terms of the resilience of their systems and of their data. From our standpoint, Z has been at the forefront of resilience for many, many generations.
Whether that's the scale of the systems we're able to provide, or the ability to tap into more capacity as needed, on a temporary or permanent basis, because you never know when a spike might occur, especially with clients going through digital transformation. We can talk about solutions being designed for seven nines of availability. That's the reason why clients like Tesco, or alliances for their resilient banking platform, or the Department of Treasury in Puerto Rico depend on us for a highly available solution. So it's never been more important for us. >>So, Michael, from a technical standpoint, I go back to the RACF days, and I used to ask why the mainframe had such good security. It was explained to me years ago: because you knew everything that went on, who touched what. There was a clear understanding of that, clear visibility into it. But maybe you could explain, just for laypeople, from a technical standpoint, why Z has such strong cyber resiliency. >>Sure. There are two aspects I want to mention. The first is culture. In the IBM Z development team and the broader design team, it's in our culture to build systems that are secure and robust; that's part of our DNA. You see that mindset in technologies like Parallel Sysplex and Geographically Dispersed Parallel Sysplex, GDPS. It's ingrained in those technologies. But the other benefit we have is that we own the whole stack, right?
We own the hardware, we own the firmware, and we own the software that sits on top, including the middleware. So whether it's resiliency or security, when we want to design and build optimal solutions in any of those spaces, we can architect them both at the right point in the stack and across the stack as needed to really deliver on these capabilities. >>So, Matt, one of our partners, ETR, holds these CEO roundtables, and one of the CEOs said, "We really weren't ready from a resiliency standpoint. We were too focused on DR and kind of missed the boat on business continuity; too narrow a focus." I presume you're hearing a lot of that these days. I wonder if you could tell us about some of the things you're seeing with clients, maybe the conversations you're having, and how you're helping broaden that capability. >>Yeah, sure. To your point, nobody could have quite predicted what we're dealing with right now. But over many generations of the Z platform, clients have deeply partnered with us to make sure they have a highly available environment for business continuity, thinking about things from a DR perspective: what they can do to fortify their solution and make it more resilient on a day-by-day basis. One of the things worth talking about is some of the inherent capabilities we have as well: the fact that we build our systems with additional capacity baked in. That means that for so many of our clients in the first quarter, who were seeing huge amounts of peak workload coming in that they needed to deal with, we designed our systems to be able to just gobble up that work.
That's what we call dark capacity, which can be turned on at the drop of a hat. It's tremendously important, because you not only need to be resilient in terms of the applications, you also need to deal with the growth you're going through. The other aspect, a new capability with z15 that builds on what we can do with that dark capacity, is the concept of instant recovery. What we're helping clients do there, in terms of fortifying their environment and making it more resilient, is letting them tap into that dark capacity when they're going through restart activities of partitions, not just in unplanned scenarios but in planned outages as well. That really helps because you always have to do planned maintenance on your systems and partitions. So when you're going through that restart process, whether it's the shutdown, the bring-up of the partition or the middleware, or even helping you catch up on what you lost while you weren't processing workload, we turn on that extra capacity in the system automatically for this boost window. And not only do we do that; Mike's point about owning the stack means we can deliver it with no increase in IBM software cost. So we're always looking at what we can do to move the ball forward and make clients' environments even more resilient. >>I learned from my mainframe days, many years ago, that when a vendor comes in and shows a new product, you always ask what happens when something goes wrong. It's all about recovery, and that's always been one of the mainframe's strengths. Mike, I want to ask you about data protection.
It's a topic that, again, means a lot of things to a lot of people. Does it mean backup? There's data privacy. There's data provenance. There's data sovereignty. Talk about data protection through a Z prism. >>Sure. We view data protection as a multilayered proposition. It's not just one thing. In effect, we view it through the lens of a broader, layered cybersecurity strategy, where data protection, in this case encryption and the ability to encrypt data on a massive scale, is the foundation. We provide capabilities for clients to protect data at the disk level. With z15, we also introduced the ability to protect data as it flows through the storage area network, through something we call Fibre Channel Endpoint Security. On top of that we layer host-based encryption capabilities in the operating system, such as data set level encryption. Then clients can layer on additional capabilities like multifactor authentication, to protect privileged identities from being compromised and used to do damage to the system, and build on top of that with things like security intelligence, to monitor and understand what's happening across the system. >>So I was talking with a developer the other day about a cloud app, pretty non-mission-critical. I asked him about using encryption and he said, "Yeah, we could, but we don't, because it slows us down a little bit." So I'm wondering how you deal with that trade-off, performance versus protection. How does Z deal with that? >>Sure. That's a great question.
And that actually goes back to what we did with z14, the generation before, and I think we've improved on that with z15; I'll get to that in a bit. One of the barriers we recognized is exactly what you said: the cost of doing encryption is prohibitive. So we have a cryptographic accelerator integrated into our microprocessor; each core is capable of encrypting up to 14 gigabytes of data per second. And if you multiply that by the number of cores in a fully configured z15... Matt, how many cores do we have in that? A hundred... >>190. >>190. So do the math: 190 times 14 gigabytes per second. It's an encryption powerhouse, and that can all be done synchronously with extremely low latency. So we have the horsepower to do encryption on a very broad scale with very, very low overhead, and that's what our clients are leveraging and taking advantage of. And with z15, which we announced and made available last year, we now have compression built into the microprocessor, so you can compress the data first and then encrypt it. There's a twofold benefit: first, I have less data to encrypt, so I've lowered my encryption overhead, and at the same time I've preserved my storage efficiency. >>You know, people talk about Z being open. It kind of all started back when you guys brought in Linux, and now, of course, it's much more than that. But I'm wondering how openness plays into this notion of cyber resiliency. In some respects they're counterpoised. How do you square that circle for me? >>Yeah, the way I look at it, when it comes to openness and digital transformation, it's about doing it without compromise.
That's the way I look at the Z platform, because you're right: we have the likes of OpenShift support on the Z platform, and you can use Ansible for automation. We're always looking to make sure that, from a management or development standpoint, you can use whichever tools, frameworks, and languages are appropriate on the platform, and integrate into a hybrid cloud wherever you want to go. When we look at what it really means to have mission-critical applications, that's the key reason banks, insurance companies, and others continue to trust Z as the home for their systems of record: they want the best of both worlds. They want the security, the resilience, and the scale of the platform, but at the same time they want the flexibility to use cloud native technologies and deploy them on our platform. And what Michael was talking about is the exciting thing for us: going one step further. If you do want your data to move around your hybrid cloud, for very good reasons in certain scenarios, having the capability to protect the data, not just encrypt it but manage the privacy of the data as it flows off Z, and take those characteristics into the hybrid cloud, is something a lot of our clients have been really excited to take advantage of. >>You're turning Matt into a security guy. You see that? >>Yeah. >>I think everybody's got to be a security person these days. I want to ask about zero trust. That term is thrown around a lot, and it can get kind of buzzwordy, but there's substance there. I want to ask you guys what zero trust means to you.
>>My view is that where we're at as an industry with zero trust is very similar to where we were with cloud a handful of years ago: if you asked 10 different people what cloud was, you'd get 10 different answers, and none of them were probably wrong. So I think we're in a very similar state in terms of our understanding and the market maturity around zero trust. But at its core, the idea is this: we've been focused on protecting our environments using a castle-and-moat approach, protecting the perimeter and then trusting everything inside that moat, if you will. Zero trust is a recognition that that's not sufficient. And if you look at that in the context of our evolving, changing environments and the move to hybrid multicloud, where the notion of a perimeter is gone, that strategy and approach to protection doesn't hold up. So we need to evolve it. We need to move from a notion of operational trust to a notion of technical trust, building more sophisticated mechanisms for authentication, understanding more broadly what's happening across the environment, and feeding that into decisions about who gets to access what data. >>Good. Matt, bring us home. This pandemic has really heightened our awareness of cyber resiliency and business continuity, and it has changed our mindset about, and our definition of, those two things. Give us your final thoughts on this topic. >>I think it's brought into sharp focus what it really means to have mission-critical applications right at the heart of your business.
You come to realize very quickly that if those services are not available to your clients, it can have such long-lasting implications. So I think people reviewing their strategy when it comes to their applications, infrastructure, and all of that in the context of business continuity are going to have a much sharper focus in the future on what it means when the lifeblood of their business is not able to operate and serve their clients. And more and more applications that maybe weren't considered mission critical in the past will be considered mission critical now, because it's not just the back-end services anymore. A lot of that, I think, is going to play out in the way people think about their business continuity strategy in the future. >>Yeah, you're right. Video conferencing has become mission critical, hasn't it? Guys, thanks so much for coming on the Cube again. Keep up the good work. I really appreciate your time and your insights. Always great talking Z. So thanks again. >>Thank you. >>All right. Thank you for watching, everybody. This is Dave Vellante for the Cube and our wall-to-wall coverage of the IBM Think 2020 digital event experience. Keep it right there; we'll be right back after this short break. >>Yeah, yeah, yeah.
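In the conversation above, Matt mentions solutions "designed for seven nines of availability." To make that figure concrete, the following sketch converts a count of nines into a yearly downtime budget. It assumes a 365.25-day year. The calculation is plain arithmetic and says nothing about how IBM or any SLA actually measures availability.

```python
# Downtime budget implied by "N nines" of availability.
# Assumption: a 365.25-day year; real SLAs may define the period differently.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def unavailability(nines: int) -> float:
    """Fraction of time a system may be down, e.g. 3 nines -> 0.001."""
    return 10.0 ** (-nines)


def max_downtime_seconds_per_year(nines: int) -> float:
    """Maximum downtime per year permitted at the given number of nines."""
    return unavailability(nines) * SECONDS_PER_YEAR


if __name__ == "__main__":
    for n in (3, 5, 7):
        print(f"{n} nines: {max_downtime_seconds_per_year(n):,.1f} seconds of downtime per year")
```

Under that assumption, seven nines leaves roughly 3.2 seconds of downtime per year. Five nines allows about 5.3 minutes.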
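Michael's encryption figures (up to 14 GB/s per core, 190 cores) can be checked with simple arithmetic. His point about compressing before encrypting can also be illustrated with standard-library tools. This sketch is hedged in three ways:

- The per-core rate and core count are the speaker's numbers, not measurements.
- The 2,660 GB/s result is a theoretical ceiling that assumes perfectly linear scaling across cores.
- `os.urandom` stands in for ciphertext, because well-encrypted output is statistically close to random and Python's standard library ships no block cipher.

The comparison shows why the order matters: once data is encrypted, compression no longer shrinks it.

```python
import os
import zlib

# Figures as quoted in the conversation; nothing here measures real hardware.
PER_CORE_GB_PER_S = 14   # "up to 14 gigabytes of data per second" per core
MAX_CORES = 190          # cores in a fully configured z15, as stated


def aggregate_ceiling_gb_per_s(per_core: float, cores: int) -> float:
    """Theoretical upper bound, assuming throughput scales linearly with cores."""
    return per_core * cores


def compressed_fraction(data: bytes) -> float:
    """Size after zlib compression, as a fraction of the original size."""
    return len(zlib.compress(data)) / len(data)


if __name__ == "__main__":
    print(f"ceiling: {aggregate_ceiling_gb_per_s(PER_CORE_GB_PER_S, MAX_CORES):,} GB/s")

    # Repetitive, record-like plaintext compresses well BEFORE encryption.
    plaintext = b"ACCOUNT=000123;BALANCE=100.00;CURRENCY=USD\n" * 2000
    # Random bytes stand in for ciphertext here; no real cipher is used.
    ciphertext_like = os.urandom(len(plaintext))

    print(f"plaintext compresses to {compressed_fraction(plaintext):.1%} of its size")
    print(f"ciphertext-like bytes compress to {compressed_fraction(ciphertext_like):.1%}")
```

The second ratio stays at or slightly above 100%. That is the reason to compress first, as Michael describes for z15.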
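Michael contrasts "castle and moat" perimeter trust with zero trust, in which each access decision draws on authentication and other signals rather than network location. The sketch below illustrates that contrast. The request fields (MFA status, device compliance) and the `ENTITLEMENTS` table are hypothetical signals chosen for this example. They are not IBM's zero-trust implementation or any product's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRequest:
    user: str
    resource: str
    mfa_verified: bool
    device_compliant: bool
    inside_network: bool  # where the request came from


# Hypothetical entitlements: which users may access which data.
ENTITLEMENTS = {
    "alice": {"payments-db"},
    "bob": {"reporting-db"},
}


def castle_and_moat_allows(req: AccessRequest) -> bool:
    """Perimeter model: anything inside the network is trusted."""
    return req.inside_network


def zero_trust_allows(req: AccessRequest) -> bool:
    """Every request is judged on its own signals; location grants nothing."""
    return (
        req.mfa_verified
        and req.device_compliant
        and req.resource in ENTITLEMENTS.get(req.user, set())
    )


if __name__ == "__main__":
    insider = AccessRequest("bob", "payments-db", mfa_verified=False,
                            device_compliant=True, inside_network=True)
    remote = AccessRequest("alice", "payments-db", mfa_verified=True,
                           device_compliant=True, inside_network=False)
    for req in (insider, remote):
        print(req.user, castle_and_moat_allows(req), zero_trust_allows(req))
```

The two models disagree in opposite directions. The perimeter check admits the unauthenticated insider and rejects the verified remote user, while the zero-trust check does the reverse.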

Published Date : May 5 2020

SUMMARY :

IBM Think, brought to you by IBM. Michael, good to see you again. It's good to be back. Good to see you. You've got to love it. I mean, we're really pleased with the contribution that Z continues of, you know, you talk about. I mean, you know, we kind of start in many ways with, like that, this definition on that which talks about the you know, the mainframe had, you know, such good security, and it was explained to me years ago? design and architect the solutions, you know, both at the right point in the stack and of missed the boat on business continuity; too narrow a focus. generations of the Z platform, you know, clients deeply partnered with us lot of people you know doesn't mean backup. of a broader, you know, layered cybersecurity strategy where you know, you know, non mission critical. that we recognized is exactly what you said is the You know, the cost of doing encryption 190 times, you know, It's kind of all started back when you guys brought in Linux. are appropriate on the platform and integrated to a hybrid cloud wherever you want to You might get certain You see that? You know, that term is thrown around a lot of, uh, you know, you can get kind of buzz, um And we need to have, you know, you know, move from the notion of, You know, have a much sharper focus in the future to really see, you know, what is what does it mean? thanks so much for coming on the Cube again. Thank you for watching.

ENTITIES

Entity	Category	Confidence
Michael	PERSON	0.99+
Michael Jordan	PERSON	0.99+
Matt	PERSON	0.99+
IBM	ORGANIZATION	0.99+
Tesco	ORGANIZATION	0.99+
Dave Vellante	PERSON	0.99+
Palo Alto	LOCATION	0.99+
Mike	PERSON	0.99+
Matt Whitbourne	PERSON	0.99+
last year	DATE	0.99+
22 aspects	QUANTITY	0.99+
Boston	LOCATION	0.99+
Z 14	COMMERCIAL_ITEM	0.99+
61%	QUANTITY	0.99+
one	QUANTITY	0.99+
first	QUANTITY	0.99+
190 times	QUANTITY	0.99+
Dell	ORGANIZATION	0.99+
15	QUANTITY	0.99+
Puerto Rico	LOCATION	0.99+
100	QUANTITY	0.98+
both	QUANTITY	0.98+
two things	QUANTITY	0.98+
each	QUANTITY	0.97+
seven nines	QUANTITY	0.97+
millions	QUANTITY	0.97+
Department of Treasury	ORGANIZATION	0.97+
two great guests	QUANTITY	0.95+
10 different people	QUANTITY	0.94+
both worlds	QUANTITY	0.94+
think 2020	EVENT	0.93+
Lennox	ORGANIZATION	0.92+
15	COMMERCIAL_ITEM	0.92+
ET	ORGANIZATION	0.92+
zero	QUANTITY	0.92+
one thing	QUANTITY	0.91+
years	DATE	0.91+
zeros	QUANTITY	0.9+
10 different answers	QUANTITY	0.89+
Z	TITLE	0.88+
14 gigabytes per second	QUANTITY	0.88+
twofold	QUANTITY	0.85+
one step	QUANTITY	0.84+
190	QUANTITY	0.84+
z15	COMMERCIAL_ITEM	0.84+
Z	PERSON	0.82+
zero trust	QUANTITY	0.81+
Cube Studios	ORGANIZATION	0.79+
IBM Z	ORGANIZATION	0.78+
Z 15	TITLE	0.78+
Security	ORGANIZATION	0.74+
Cube	COMMERCIAL_ITEM	0.72+
up to 14 gigabytes of data per second	QUANTITY	0.72+
many years ago	DATE	0.72+
pandemic	EVENT	0.63+
first quarter	QUANTITY	0.62+
twofold benefits	QUANTITY	0.61+
Think 2020	EVENT	0.57+
around	QUANTITY	0.53+
Think Digital 2020	ORGANIZATION	0.53+
Born	PERSON	0.49+
>90	QUANTITY	0.49+
Cube	ORGANIZATION	0.46+
Think	ORGANIZATION	0.46+
Providence	ORGANIZATION	0.45+
Z	COMMERCIAL_ITEM	0.36+

Data Science for All: It's a Whole New Game

>> There's a movement that's sweeping across businesses everywhere, here in this country and around the world. And it's all about data. Today businesses are being inundated with data, to the tune of over two and a half million gigabytes that'll be generated in the next 60 seconds alone. What do you do with all that data? To extract insights you typically turn to a data scientist. But not necessarily anymore, at least not exclusively. Today the ability to extract value from data is becoming a shared mission, a team effort that spans the organization, extending far more widely than ever before. Today, data science is being democratized. >> Data Science for All: It's a Whole New Game. >> Welcome everyone, I'm Katie Linendoll. I'm a technology expert and writer, and I love reporting on all things tech. My fascination with tech started very young. I began coding when I was 12, received my networking certs by 18, and earned a degree in IT and new media from Rochester Institute of Technology. So as you can tell, technology has always been a true passion of mine. Having grown up in the digital age, I love having a career that keeps me at the forefront of science and technology innovations. I spend as much time in the field being hands-on as I do on my laptop conducting in-depth research. Whether I'm diving underwater with NASA astronauts, witnessing the new ways in which mobile technology can help rebuild the Philippines' economy in the wake of super typhoons, or sharing a first look at the newest iPhones on The Today Show yesterday, I'm always on the hunt for the latest and greatest tech stories. And that's what brought me here. I'll be your host for the next hour as we explore the new phenomenon that is taking businesses around the world by storm, as data science continues to become democratized and extends beyond the domain of the data scientist, and why there's also a mandate for all of us to become data literate, now that data science for all drives our AI culture.
And we're going to take to the streets and go behind the scenes as we uncover the factors that are fueling this phenomenon and giving rise to a movement that is reshaping how businesses leverage data and putting organizations on the road to AI. So coming up, I'll be doing interviews with data scientists. We'll see real-world demos and take a look at how IBM is changing the game with an open data science platform. We'll also be joined by legendary statistician Nate Silver, founder and editor-in-chief of FiveThirtyEight, who will shed light on how a data-driven mindset is changing everything from business to our culture. We also have a few people joining us in our studio, so thank you guys for joining us. Come on, I can do better than that, right? Live studio audience, the fun stuff. And for all of you, during the program I want to remind you to join the conversation on social media using the hashtag DSforAll, for data science for all. Share your thoughts on what data science and AI mean to you and your business. And let's dive into a whole new game of data science. Now I'd like to welcome my co-host, General Manager of IBM Analytics, Rob Thomas. >> Hello, Katie. >> Come on guys. >> Yeah, seriously. >> No one's allowed to be quiet during this show, okay? >> Right. >> Or I'll start calling people out. So Rob, thank you so much. I think you know this conversation: we're calling it a data explosion, and it's happening right now. And it's nothing new. When you and I chatted about it, I know you've been talking about this for years. So you have to ask, is this old news at this point? >> Yeah, well, first of all, the data explosion is not coming, it's here. And everybody's in the middle of it right now. What is different is that the economics have changed, and the scale and complexity of the data that organizations are having to deal with have changed. And to this day, 80% of the data in the world still sits behind corporate firewalls. So that's becoming a problem.
It's becoming unmanageable. IT struggles to manage it. The business can't get everything they need. Consumers can't consume it when they want. So we have a challenge here. >> It's challenging, in a world of unmanageable, crazy complexity. If I'm sitting here as an IT manager of my business, I'm probably thinking to myself, this is incredibly frustrating. How in the world am I going to get control of all this data? And it's probably not just me thinking it; many individuals here as well. >> Yeah, indeed. Everybody's thinking, how am I going to put data to work in my organization in a way I haven't done before? Look, you've got to have the right expertise and the right tools. The other thing that's happening in the market right now is that clients are dealing with multicloud environments: data behind the firewall in private cloud, and in multiple public clouds. And they have to find a way: how am I going to pull meaning out of this data? That brings us to data science and AI. That's how you get there. >> I understand the data science part, but I think we're all starting to hear more about AI. And it's incredible how this buzzword is taking off. How do businesses adapt to this AI growth and boom and trend that's happening in the world right now? >> Well, let me define it this way. Data science is a discipline. Machine learning is one technique. And AI puts machine learning into practice and applies it to the business. So this is really about getting your business where it needs to go. And to get to an AI future, you have to lay a data foundation today. I love the phrase, "there's no AI without IA." That means you're not going to get to AI unless you have the right information architecture to start with. >> Can you elaborate, though, on how businesses can really adopt AI and get started? >> Look, I think there are four things you have to do if you're serious about AI. One is you need a strategy for data acquisition.
Two is you need a modern data architecture. Three is you need pervasive automation. And four is you've got to expand job roles in the organization. >> Data acquisition, the first pillar you just discussed. Can we start there and explain why it's so critical in this process? >> Yeah, so let's think about how data acquisition has evolved through the years. 15 years ago, data acquisition was about how do I get data in and out of my ERP system? And that was pretty much solved. Then the mobile revolution happened, and suddenly you've got structured and unstructured data, more than you've ever dealt with. And now you get to where we are today. You're talking terabytes, petabytes of data. >> [Katie] Yottabytes, I heard that word the other day. >> I heard that too. >> Didn't even know what it meant. >> You know how many zeros that is? >> I thought we were in Star Wars. >> Yeah, I think it's a lot of zeroes. >> Yodabytes, it's new. >> So it's becoming more and more complex in terms of how you acquire data. That's the new data landscape that every client is dealing with. And if you don't have a strategy for how you acquire and manage it, you're not going to get to that AI future. >> So, a natural segue: if you are one of these businesses, how do you build for the data landscape? >> Yeah, so the question I always hear from customers is, we need to evolve our data architecture to be ready for AI. And the way I think about that is it's really about moving from static data repositories to more of a fluid data layer. >> And we continue with the architecture. New data architecture is an interesting buzzword to hear, but it's also one of the four pillars. So if you could dive in there. >> Yeah, I mean it's a new twist on what I would call some core data science concepts. For example, you have to leverage tools with a modern, centralized data warehouse. But your data warehouse can't be stagnant, limited to just what's right there.
So you need a way to federate data across different environments. You need to be able to bring your analytics to the data, because it's most efficient that way. And ultimately, it's about building an optimized data platform that is designed for data science and AI, which means it has to be a lot more flexible than what clients have had in the past. >> All right. So we've laid out what you need. Now, for driving automation, where does the machine learning kick in? >> Machine learning is what gives you the ability to automate tasks. When I think about machine learning, it's about predicting and automating. And this will really change the roles of data professionals and IT professionals. For example, a data scientist cannot possibly know every algorithm or every model that they could use, so we can automate the process of algorithm selection. Another example is things like automated data matching, or metadata creation. Some of these things may not be exciting, but they're hugely practical. And so when you think about the real use cases that are driving return on investment today, it's things like that: automating the mundane tasks. >> Let's come back to something you mentioned earlier, because it's fascinating to be talking about this AI journey, but also significant are the new job roles. What are those other participants in the analytics pipeline? >> Yeah, I think we're just at the start of this idea of new job roles. We have data scientists. We have data engineers. Now you see machine learning engineers, application developers. What's really happening is that data scientists are no longer allowed to work in their own silo. And so the new job roles are about how everybody has data first in their mind. And then they're using tools to automate data science, to automate building machine learning into applications. So roles are going to change dramatically in organizations.
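Rob says a data scientist "cannot possibly know every algorithm," so "we can automate the process of algorithm selection." The core loop is small: fit several candidate models, score each on held-out data, and keep the best. The three candidates, the toy data, and the single train/validation split below are hypothetical choices made for illustration. Real AutoML tooling, including whatever IBM ships, searches far larger spaces with cross-validation and hyperparameter tuning, and this is not its method.

```python
from statistics import fmean
from typing import Callable, Dict, List, Tuple

Model = Callable[[float], float]


def fit_mean(xs: List[float], ys: List[float]) -> Model:
    """Baseline: always predict the training mean."""
    m = fmean(ys)
    return lambda x: m


def fit_linear(xs: List[float], ys: List[float]) -> Model:
    """Ordinary least squares with one input variable."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def fit_nearest_neighbor(xs: List[float], ys: List[float]) -> Model:
    """1-nearest-neighbour: copy the y of the closest training x."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


CANDIDATES = {"mean": fit_mean, "linear": fit_linear, "1-nn": fit_nearest_neighbor}


def mse(model: Model, xs: List[float], ys: List[float]) -> float:
    """Mean squared error of a fitted model on the given data."""
    return fmean((model(x) - y) ** 2 for x, y in zip(xs, ys))


def select_algorithm(train_x, train_y, val_x, val_y) -> Tuple[str, Dict[str, float]]:
    """Fit every candidate on training data and keep the lowest validation error."""
    scores = {name: mse(fit(train_x, train_y), val_x, val_y)
              for name, fit in CANDIDATES.items()}
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    train_x = [0, 2, 4, 6, 8, 10]
    train_y = [1.1, 4.9, 9.2, 12.8, 17.1, 20.9]  # roughly 2x + 1 with noise
    val_x = [1, 3, 5, 7, 9]
    val_y = [3, 7, 11, 15, 19]
    best, scores = select_algorithm(train_x, train_y, val_x, val_y)
    print(best, {k: round(v, 3) for k, v in scores.items()})
```

On this roughly linear data the selector picks `linear`. Adding a candidate means adding one entry to `CANDIDATES`, which is the sense in which the selection step is "automated."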
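Rob's advice to "bring your analytics to the data" can be shown with an in-memory SQLite database standing in for a warehouse. The same totals are computed twice. The first approach pulls every row to the client and aggregates there. The second pushes the aggregation down to the engine, so only the results come back. The `views` table and its columns are hypothetical. Because this runs in-process, the "transfer" is just the set of rows returned to Python; against a networked warehouse those rows would cross the network.

```python
import sqlite3


def build_demo_db(rows: int) -> sqlite3.Connection:
    """In-memory table of hypothetical viewing events (show, minutes watched)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE views (show TEXT, minutes INTEGER)")
    conn.executemany(
        "INSERT INTO views VALUES (?, ?)",
        ((f"show-{i % 5}", i % 60) for i in range(rows)),
    )
    return conn


def totals_by_moving_data(conn: sqlite3.Connection):
    """Data to the analytics: fetch every row, then aggregate in Python."""
    rows = conn.execute("SELECT show, minutes FROM views").fetchall()
    totals = {}
    for show, minutes in rows:
        totals[show] = totals.get(show, 0) + minutes
    return totals, len(rows)


def totals_by_pushdown(conn: sqlite3.Connection):
    """Analytics to the data: the engine aggregates; only results come back."""
    rows = conn.execute(
        "SELECT show, SUM(minutes) FROM views GROUP BY show"
    ).fetchall()
    return dict(rows), len(rows)


if __name__ == "__main__":
    conn = build_demo_db(10_000)
    moved, moved_rows = totals_by_moving_data(conn)
    pushed, pushed_rows = totals_by_pushdown(conn)
    assert moved == pushed  # same answer either way
    print(f"rows transferred: {moved_rows} (move data) vs {pushed_rows} (pushdown)")
```

Both approaches produce identical totals. The pushdown version returns 5 rows instead of 10,000, and that gap widens as the table grows.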
>> I think that's confusing, though, because we have several organizations asking: are these highly specialized roles, just for data science? Or are they applicable to everybody across the board? >> Yeah, and that's the big question, right? Because everybody's thinking, how will this apply? Do I want this to be just a small set of people in the organization that will do this? But our view is that data science has to be for everybody. It's about bringing data science to everybody as a shared mission across the organization. Everybody in the company has to be data literate and participate in this journey. >> So overall, it's a group effort, it has to be a common goal, and we all need to be data literate across the board. >> Absolutely. >> Done deal. But at the end of the day, it's not an easy task. >> It's not. It's not easy, but it's maybe not as big of a shift as you would think, because you have to put data in the hands of people who can do something with it. So it's very basic. Give access to data; data's often locked up in a lot of organizations today. Give people the right tools. Embrace the idea of choice or diversity in terms of those tools. That gets you started on this path. >> It's interesting to hear you say that essentially you need to train everyone across the board when it comes to data literacy. And I think people who are coming into the workforce don't necessarily have a background or a degree in data science. So how do you manage? >> Yeah, so in many cases that's true. I will tell you some universities are doing amazing work here. One example is the University of California, Berkeley. They offer a course for all majors, so no matter what you're majoring in, you have a course on foundations of data science. How do you bring data science to every role? So it's starting to happen. We at IBM provide data science courses through CognitiveClass.ai. It's for everybody. It's free.
And look, if you want to get your hands on code and just dive right in, you go to datascience.ibm.com. The key point is this, though: it's more about attitude than it is aptitude. I think anybody can figure this out. But it's about the attitude to say we're putting data first and we're going to figure out how to make this real in our organization. >> I also have to give a shout-out to my alma mater, because I have heard that they offer an MS in data analytics. They are always on the forefront of new technologies and new majors, and on trend. And I've heard that the job placement rate for people graduating with the MS is high. >> I'm sure it's very high. >> So go Tigers. All right, tangential. Let me get back to something else you touched on earlier, because you mentioned that a number of customers ask you, how in the world do I get started with AI? It's an overwhelming question. Where do you even begin? What do you tell them? >> Yeah, well, things are moving really fast. But the good thing is most organizations I see are already on the path, even if they don't know it. They might have a BI practice in place. They've got data warehouses. They've got data lakes. Let me give you an example: AMC Networks. They produce a lot of the shows that I'm sure you watch, Katie. >> [Katie] Yes, Breaking Bad, Walking Dead, any fans? >> [Rob] Yeah, we've got a few. >> [Katie] Well, you taught me something I didn't even know, because it's amazing how we have all these different industries, and yet media itself is impacted too. And this is a good example. >> Absolutely. So, AMC Networks, think about it. They've got ads to place. They want to track viewer behavior. What do people like? What do they dislike? So they have to optimize every aspect of their business, from marketing campaigns to promotions to scheduling to ads.
And their goal was to transform data into business insights and really take the burden off their IT team, which was heavily burdened by a huge increase in data. So their VP of BI took the approach of using machine learning to process large volumes of data. They used a platform designed for AI and data processing: the IBM analytics system, a data warehouse with data science tools built in. It has in-memory data processing. And just like that, they were ready for AI. And they're already seeing that impact in their business. >> Do you think a movement of that nature presses other media conglomerates and organizations to say, we need to be doing this too? >> I think it's inevitable. You're either going to be leading, or you'll be playing catch-up. And so, as we talk to clients, we think about how you start down this path now, even if you have to iterate over time, because otherwise you're going to wake up and you're going to be behind. >> One thing worth noting is we've talked about analytics to the data. It's analytics first to the data, not the other way around. >> Right. So look, as a practice, we say you want to bring analytics to where the data sits, because it's a lot more efficient that way. It gets you better outcomes in terms of how you train models. Other organizations will say, "Hey, move the data around," and everything becomes a big data-movement exercise. But once an organization has started down this path and is starting to get predictions, they want to do it where it's really easy. And that means analytics applied right where the data sits. >> It's worth talking about the role of the data scientist in all of this. It's been called the hot job of the decade, and Harvard Business Review even dubbed it the sexiest job of the 21st century. >> Yes. >> I want to see this on the cover of Vogue.
Like, I want to see the first data scientist, female preferred, on the cover of Vogue. That would be amazing. >> Perhaps you can. >> People agree. So what changes for them? We talk about data science for all. Is it really data science for everyone? And how does it change everything? >> Well, I think of it this way. AI gives software superpowers. It really does. It changes the nature of software. And at the center of that are data scientists. So, a data scientist has a set of powers that they've never had before in any organization. And that's why it's a hot profession. Now, on one hand, this has been around for a while. We've had actuaries. We've had statisticians that have really transformed industries. But there are a few things that are new now. We have new tools. New languages. Broader recognition of this need. And while it's important to recognize this critical skill set, you can't just limit it to a few people. This is about scaling it across the organization. And truly making it accessible to all. >> So then do we need more data scientists? Or is this something you train like you said, across the board? >> Well, I think you want to do a little bit of both. We want more. But we can also train more and make the ones we have more productive. The way I think about it is there are kind of two markets here. We call them clickers and coders. >> [Katie] I like that. That's good. >> So, let's talk about what that means. So clickers are basically people who want to use tools. Create models visually. It's drag and drop. Something that's very intuitive. Those are the clickers. Nothing wrong with that. It's been valuable for years. There's a new crop of data scientists. They want to code. They want to build with the latest open source tools. They want to write in Python or R. These are the coders. And both approaches are viable. Both approaches are critical.
Organizations have to have a way to meet the needs of both of those types. And there's not a lot available today that does that. >> Well, let's keep going on that. Because I hear you talking about the data scientist's role and how it's critical to success, but with the new tools, data science and analytics skills can extend beyond the domain of just the data scientist. >> That's right. So look, we're unifying coders and clickers into a single platform, which we call IBM Data Science Experience. And as the demand for data science expertise grows, so does the need for these kinds of tools, to bring them into the same environment. And my view is if you have the right platform, it enables the organization to collaborate. And suddenly you've changed the nature of data science from an individual sport to a team sport. >> So as somebody whose background is in IT, the question is really: is this an additional piece of what IT needs to do in 2017 and beyond? Or is it just another line item in the budget? >> So I'm afraid that some people might view it that way. As just another line item. But I would challenge that and say data science is going to reinvent IT. It's going to change the nature of IT. And every organization needs to think about what are the skills that are critical? How do we engage a broader team to do this? Because once they get there, this is the chance to reinvent how they're performing IT. >> [Katie] Challenging or not? >> Look, it's all a big challenge. Think about everything IT organizations have been through. Some of them were late to things like mobile, but then they caught up. Some were late to cloud, but then they caught up. I would just urge people, don't be late to data science. Use this as your chance to reinvent IT. Start with this notion of clickers and coders. This is a seminal moment, much like mobile and cloud were. So don't be late. >> And I think it's critical because it could be so costly to wait.
And Rob and I were even chatting earlier about how data analytics is moving into all different kinds of industries. And I can tell you I've even been personally affected by how important the analysis is, working in pediatric cancer for the last seven years. I've personally implemented virtual reality headsets in pediatric cancer hospitals across the country. And it's great. And it's working phenomenally. And the kids are amazed. And the staff is amazed. But phase two of this project is building metrics into the hardware that capture breathing and heart rate, so that we have data: proof that we can hand over to the hospitals to continue making this program a success. So just in-- >> That's a great example. >> An interesting example. >> Saving lives? >> Yes. >> That's also applying a lot of what we talked about. >> Exciting stuff in the world of data science. >> Yes. Look, I'd just add that this is an existential moment for every organization. Because what you do in this area is probably going to define how competitive you are going forward. And think about if you don't do something. What if one of your competitors goes and creates an application that's more engaging with clients? So my recommendation is start small. Experiment. Learn. Iterate on projects. Define the business outcomes. Then scale up. It's very doable. But you've got to take the first step. >> First step always critical. And now we're going to get to the fun hands-on part of our story. Because in just a moment we're going to take a closer look at what data science can deliver. And where organizations are trying to get to. All right. Thank you, Rob. And now we've been joined by Siva Anne, who is going to help us navigate this demo. First, welcome Siva. Give him a big round of applause. Yeah. All right, Rob, break down what we're going to be looking at. You take over this demo. >> All right. So this is going to be pretty interesting. So Siva is going to take us through.
So he's going to play the role of a financial adviser who wants to help better serve clients through recommendations. And I'm going to illustrate three things. One is how do you federate data from multiple data sources? Inside the firewall, outside the firewall. How do you apply machine learning to predict and to automate? And then how do you move analytics closer to your data? So, what you're seeing here is a custom application for an investment firm. So, Siva, our financial adviser, welcome. So you can see at the top, we've got market data. We pulled that from an external source. And then we've got Siva's calendar in the middle. He's got clients on the right side. So page down, what else do you see down there, Siva? >> [Siva] I can see the recent market news. And in here I can see that JP Morgan is calling for a US dollar rebound in the second half of the year. And I have an upcoming meeting with Leo Rakes. I can get-- >> [Rob] So let's go in there. Why don't you click on Leo Rakes. So, you're sitting at your desk, you're deciding how you're going to spend the day. You know you have a meeting with Leo. So you click on it. You immediately see, all right, so what do we know about him? We've got data governance implemented. So we know his age, we know his degree. We can see he's not that aggressive of a trader. Only six trades in the last few years. But then where it gets interesting is you go to the bottom. You start to see predicted industry affinity. Where did that come from? How do we have that? >> [Siva] So these green lines and red arrows here indicate the trending affinity of Leo Rakes for particular industry stocks. What we've done here is we've built machine learning models using the customer's demographic data, his stock portfolios, and browsing behavior to build a model which can predict his affinity for a particular industry. >> [Rob] Interesting. So, I like to think of this as what we call celebrity experiences.
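The transcript doesn't say which algorithm powers the affinity prediction. As a hedged illustration only, here is a plain-Python logistic regression trained by batch gradient descent, which is one common way to turn demographic, portfolio, and browsing features into an affinity probability. The three features (auto share of portfolio, auto share of browsing, age divided by 100) and the six training rows are invented for the example.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_affinity(w, b, x) -> float:
    """Probability that a customer has an affinity for the industry."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    m = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Gradient of log-loss w.r.t. the linear score is (predicted - actual)
            err = predict_affinity(w, b, xi) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


# Hypothetical features: [auto share of portfolio, auto share of browsing, age / 100]
X = [
    [0.30, 0.60, 0.40],
    [0.25, 0.50, 0.50],
    [0.40, 0.70, 0.30],
    [0.00, 0.05, 0.60],
    [0.05, 0.10, 0.70],
    [0.00, 0.00, 0.50],
]
y = [1, 1, 1, 0, 0, 0]  # 1 = showed affinity for auto stocks

w, b = train_logistic(X, y)
print(round(predict_affinity(w, b, [0.35, 0.65, 0.40]), 3))
```

A production model would add feature scaling, regularization, and a held-out evaluation set; the point here is only the shape of the idea: a score per customer per industry, refreshed as new behavior arrives.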
So how do you treat every customer like they're a celebrity? So to some extent, we're reading his mind. Because without asking him, we know that he's going to have an affinity for auto stocks. So we go down. Now we look at his portfolio. You can see okay, he's got some different holdings. He's got Amazon, Google, Apple, and then he's got RACE, which is the ticker for Ferrari. You can see that's done incredibly well. And so, as a financial adviser, you look at this and you say, all right, we know he loves auto stocks. Ferrari's done very well. Let's create a hedge. Like what kind of security would interest him as a hedge against his position in Ferrari? Could we go figure that out? >> [Siva] Yes. Given I know that he's got an affinity for auto stocks, and I also see that Ferrari has had some tremendous gains, I want to lock in these gains by hedging. And I want to do that by picking an auto stock which has a negative correlation with Ferrari. >> [Rob] So this is where we get to the idea of in-database analytics. Because you start clicking that and immediately we're getting instant answers of what's happening. So what did we find here? We're going to compare Ferrari and Honda. >> [Siva] I'm going to compare Ferrari with Honda. And what I see here instantly is that Honda has a negative correlation with Ferrari, which makes it a perfect fit for his stock portfolio, given he has an affinity for auto stocks and it correlates negatively with Ferrari. >> [Rob] These are very powerful tools in the hands of a financial adviser. You think about it. As a financial adviser, you wouldn't think about federating data, machine learning, pretty powerful. >> [Siva] Yes. So what we have seen here is that using the common SQL engine, we've been able to federate queries across multiple data sources: Db2 Warehouse on Cloud, IBM's Integrated Analytics System, and a Hortonworks-powered Hadoop platform for the news feeds.
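The hedge check in the demo comes down to finding a stock whose returns move against the position being hedged. A minimal sketch, assuming daily closing prices as input and Pearson correlation of daily returns as the measure (the transcript specifies neither): the price series are made up, and `AUTO_A` and `AUTO_B` are hypothetical tickers, not Honda data.

```python
import math


def daily_returns(prices):
    """Simple returns between consecutive closing prices."""
    return [(today - prev) / prev for prev, today in zip(prices, prices[1:])]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def best_hedge(position_prices, candidates):
    """Return (ticker, correlation) for the most negatively correlated candidate."""
    target = daily_returns(position_prices)
    scores = {t: pearson(target, daily_returns(p)) for t, p in candidates.items()}
    ticker = min(scores, key=scores.get)
    return ticker, scores[ticker]


# Made-up closing prices; not real market data.
race = [100.0, 102.0, 101.0, 104.0, 103.0, 106.0]
candidates = {
    "AUTO_A": [50.0, 49.0, 49.5, 48.0, 48.5, 47.0],  # tends to move against RACE
    "AUTO_B": [20.0, 20.4, 20.2, 20.8, 20.6, 21.2],  # moves with RACE
}
print(best_hedge(race, candidates))
```

Correlating returns rather than raw prices matters: two steadily rising stocks have highly correlated prices even if their day-to-day moves are unrelated. Running this pairwise across many tickers and factors is the kind of workload the demo pushes into the database.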
We've been able to use machine learning to derive innovative insights about his stock affinities, and drive the machine learning into the appliance, closer to where the data resides, to deliver high-performance analytics. >> [Rob] At scale? >> [Siva] We're able to run millions of these correlations across stocks, currency, and other factors. And even score hundreds of customers for their affinities on a daily basis. >> That's great. Siva, thank you for playing the role of financial adviser. So I just want to recap briefly, because this is really powerful technology that's really simple. So we federated, we aggregated multiple data sources from all over the web and internal systems, and public cloud systems. Machine learning models were built that predicted Leo's affinity for a certain industry. In this case, automotive. And then you see that when you deploy analytics next to your data, even a financial adviser, just with the click of a button, is getting instant answers so they can go be more productive in their next meeting. This whole idea of celebrity experiences for your customer, that's available for everybody, if you take advantage of these types of capabilities. Katie, I'll hand it back to you. >> Good stuff. Thank you, Rob. Thank you, Siva. Powerful demonstration of what we've been talking about all afternoon. And thank you again to Siva for helping us navigate. Shall we give him one more round of applause? We're going to be back in just a moment to look at how we operationalize all of this data. But first, here's a message from me. If you're a part of a line of business, your main fear is disruption. You know data is the new gold that can create huge amounts of value. So does your competition. And they may be beating you to it. You're convinced there are new business models and revenue sources hidden in all the data. You just need to figure out how to leverage it. But with the scarcity of data scientists, you really can't rely solely on them.
You may need more people throughout the organization that have the ability to extract value from data. And as a data science leader or data scientist, you have a lot of the same concerns. You spend way too much time looking for, prepping, and interpreting data and waiting for models to train. You know you need to operationalize the work you do to provide business value faster. What you want is an easier way to do data prep, and to rapidly build models that can be easily deployed, monitored and automatically updated. So whether you're a data scientist, data science leader, or in a line of business, what's the solution? What'll it take to transform the way you work? That's what we're going to explore next. All right, now it's time to delve deeper into the nuts and bolts. The nitty gritty of operationalizing data science and creating a data driven culture. How do you actually do that? Well, that's what these experts are here to share with us. I'm joined by Nir Kaldero, who's head of data science at Galvanize, which is an education and training organization. Tricia Wang, who is co-founder of Sudden Compass, a consultancy that helps companies understand people with data. And last, but certainly not least, Michael Li, founder and CEO of Data Incubator, which is a data science training company. All right guys. Shall we get right to it? >> All right. >> So the data explosion is happening right now. And we are seeing it across the board. I just shared an example of how it's impacting my philanthropic work in pediatric cancer. But you guys each have such unique roles in your business life. How are you seeing it just blow up in your fields? Nir, your thing? >> Yeah, for example, at Galvanize we train many Fortune 500 companies. And just the demand from companies that want us to help them go through this digital transformation is mind-blowing. That's a data point by itself. >> Okay.
Well, what we're seeing is that data science as a theme is actually for everyone now. But what's happening is that it's actually reaching non-technical people. And what we're seeing is that when non-technical people are implementing these tools or coming at these tools without a baseline of data literacy, they're oftentimes using them in ways that distance themselves from the customer. Because they're implementing data science tools without a clear purpose, without a clear problem. And so what we do at Sudden Compass is work with companies to help them embrace and understand the complexity of their customers. Because oftentimes they are misusing data science to try and flatten their understanding of the customer, as if you can just do more traditional marketing, where you're putting people into boxes. And I think the whole ROI of data is that you can now understand people's relationships at a much more complex level and at a greater scale than before. But we have to do this with basic data literacy. And this has to involve technical and non-technical people. >> Well, you can have all the data in the world, and I think it speaks to, if you're not doing the proper movement with it, forget it. It means nothing at the same time. >> No, absolutely. I mean, I think that when you look at the huge explosion in data, that comes with a huge explosion in data experts. Right, we call them data scientists, data analysts. And sometimes they're people who are very, very talented, like the people here. But sometimes you have people who are maybe re-branding themselves, right? Trying to move their title up one notch to try to attract that higher salary. And I think that that's one of the things that customers are coming to us for, right? They're saying, hey look, there are a lot of people that call themselves data scientists, but we can't really distinguish them.
So, we run a fellowship where we help companies hire from a really talented group of folks, who are truly data scientists and who know all those really important data science tools. And we also help companies internally: Fortune 500 companies who are looking to grow the data science practice that they have. And we help clients like McKinsey, BCG, and Bain train up their customers, their clients, and their workers to be more data talented, and to build up those data science capabilities. >> And Nir, this is something you work with a lot. A lot of Fortune 500 companies. And when we were speaking earlier, you were saying many of these companies can be in a panic. >> Yeah. >> Explain that. >> Yeah, so you know, not all Fortune 500 companies are fully data driven. And we know that the winners in this fourth industrial revolution, which I like to call the machine intelligence revolution, will be companies who navigate and transform their organization to unlock the power of data science and machine learning. And the companies that don't, or that don't utilize data science and predictive power well, will pretty much get shredded. So they are in a panic. >> Tricia, companies have to deal with data behind the firewall and in the new multi-cloud world. How do organizations start to become data driven right to the core? >> I think the most urgent question companies should be asking to become data driven is: how do I bring the complex reality that our customers are experiencing on the ground into a corporate office? Into the data models. That question is critical because that's how you actually prevent big data disasters. And that's how you leverage big data. Because when your data models are really far from your human models, that's when you're going to do things that are really far off, that are not going to feel right. That's when Tesco had their terrible big data disaster that they're still recovering from.
And so that's why I think it's really important to understand that when you implement big data, you have to further embrace thick data: the qualitative, the emotional stuff that is difficult to quantify. But then comes the difficult art and science that I think is the next level of data science. Which is getting non-technical and technical people together to ask how do we find those unknown nuggets of insight that are difficult to quantify? Then, how do we do the next step of figuring out how to mathematically scale those insights into a data model, so that it actually reflects human understanding? And then we can start making decisions at scale. But you have to have that first. >> That's absolutely right. And I think that when we think about what it means to be a data scientist, right? I always think about it in these sort of three pillars. You have the math side. You have to have that kind of stats, hardcore machine learning background. You have the programming side. You don't work with small amounts of data. You work with large amounts of data. You've got to be able to type the code to make those computers run. But then the last part is that human element. You have to understand the domain expertise. You have to understand what it is that I'm actually analyzing. What's the business proposition? And how are the clients, how are the users actually interacting with the system? That human element that you were talking about. And I think having somebody who understands all of those, and not just in isolation, but is able to marry that understanding across those different topics, that's what makes a data scientist. >> But I find that we don't have people with those skill sets. And right now the way I see teams being set up inside companies is that they're creating these isolated data unicorns. These data scientists that have graduated from your programs, which are great. But they don't involve the people who are the domain experts.
They don't involve the designers, the consumer insight people, the salespeople. The people who spend time with the customers day in and day out. Somehow they're left out of the room. They're consulted, but they're not stakeholders. >> Can I actually >> Yeah, yeah please. >> Can I actually give a quick example? So for example, we at Galvanize train the executives and the managers. And then the technical people, the data scientists and the analysts. But in order to actually see all of the ROI behind the data, you also have to have a creative, fluid conversation between non-technical and technical people. And this is a major trend now. And there's a major gap. And we need to increase awareness and create a new kind of environment where technical people also talk seamlessly with non-technical ones. >> [Tricia] We call-- >> That's one of the things that we see a lot. Is one of the trends in-- >> A major trend. >> data science training is it's not just for the data science technical experts. It's not just for one type of person. So a lot of the training we do is sort of data engineers, people who are more on the software engineering side, learning more about the stats and math. And then people who are sort of traditionally on the stats side learning more about the engineering. And then managers and people who are data analysts learning about both. >> Michael, I think you said something that was of interest too, because I think we can look at IBM Watson as an example, working in healthcare. The human component. Because oftentimes we talk about machine learning and AI, and data, and you get worried that you still need that human component. Especially in the world of healthcare. And I think that's a very strong point when it comes to the data analysis side. Is there any particular example you can speak to of that?
>> So I think that there was this really excellent paper a while ago about neural nets trained on textual data. So looking at sort of different corpuses. And they found that these models were highly, highly sexist. They would read these corpuses, and it's not because neural nets themselves are sexist. It's because they're reading the things that we write. And it turns out that we write kind of sexist things. And they would sort of find all these patterns in there that were sort of latent, that had a lot of sort of things that maybe we would cringe at if we sort of saw them. And I think that's one of the really important aspects of the human element, right? It's being able to come in and sort of say like, okay, I know what the biases of the system are, I know what the biases of the tools are. I need to figure out how to use that to make the tools, make the world a better place. And another area where this comes up all the time is lending, right? So the federal government has said, and we have a lot of clients in the financial services space, so they're constantly under these kinds of rules, that they can't engage in discriminatory lending practices based on a whole set of protected categories. Race, sex, gender, things like that. But it's very easy when you train a model on credit scores to pick that up. And then to have a model that's inadvertently sexist or racist. And that's where you need the human element to come back in and say okay, look, you're using, the classic example would be zip code, you're using zip code as a variable. But when you look at it, zip code is actually highly correlated with race. And you can't do that. So you may, by sort of following the math and being a little naive about the problem, inadvertently introduce something really horrible into a model, and that's where you need a human element to sort of step in and say, okay hold on. Slow things down. This isn't the right way to go.
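The zip-code example describes a proxy variable: a feature that is not itself protected but tracks a protected attribute closely. One simple way to surface such candidates for human review, sketched here as an assumption rather than anyone's stated practice, is to measure each feature's association with the protected attribute and flag strong ones. Categorical columns are replaced by their per-category protected-group share, which makes the resulting Pearson value equal to the correlation ratio (eta). The data, column names, and 0.5 threshold are hypothetical, and on small samples this encoding overstates association, so a flag is a prompt for the human review described above, not a verdict.

```python
import math
from collections import defaultdict


def pearson(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant column carries no association signal
    return cov / (sd_x * sd_y)


def group_rate_encode(categories, protected):
    """Replace each category with the share of protected-group members in it."""
    totals, counts = defaultdict(float), defaultdict(int)
    for c, p in zip(categories, protected):
        totals[c] += p
        counts[c] += 1
    return [totals[c] / counts[c] for c in categories]


def flag_proxies(features, protected, threshold=0.5):
    """Return {feature: correlation} for features whose |correlation| meets the threshold."""
    flagged = {}
    for name, values in features.items():
        if isinstance(values[0], str):
            # For a categorical column this equals the correlation ratio (eta)
            values = group_rate_encode(values, protected)
        r = pearson(values, protected)
        if abs(r) >= threshold:
            flagged[name] = r
    return flagged


# Hypothetical applicants: 1 = member of a protected group
protected = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "zip_code": ["11111", "11111", "11111", "22222", "33333", "33333", "33333", "22222"],
    "income_k": [40.0, 60.0, 55.0, 45.0, 50.0, 42.0, 58.0, 50.0],
}
print(flag_proxies(features, protected))
```

Dropping a flagged column is not a complete fix, since several weak proxies can combine into a strong one; that is exactly where the domain and social-science judgment discussed here comes in.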
>> And the people who have -- >> I feel like, I can feel her ready to respond. >> Yes, I'm ready. >> She's like let me have at it. >> And here it is. The people who are really great at providing that human intelligence are social scientists. We are trained to look for bias and to understand bias in data, whether it's quantitative or qualitative. And I really think that we're going to have fewer of these kinds of problems if we had more integrated teams. If it was a mandate from leadership to say no data science team should be without a social scientist, ethnographer, or qualitative researcher of some kind, to be able to help see these biases. >> The talent piece is actually the most crucial-- >> Yeah. >> one here. If you look at how to enable machine intelligence in an organization, there are three pillars that I have in my head: the culture, the talent, and the technology infrastructure. And I believe, and I saw in working very closely with Fortune 100 and 200 companies, that the talent piece is actually the most important, the most crucial, and the hardest to get. >> [Tricia] I totally agree. >> It's absolutely true. Yeah, no, I mean I think that's sort of how we came up with our business model. Companies were basically saying hey, I can't hire data scientists. And so we have a fellowship where we get 2,000 applicants each quarter. We take the top 2% and then we sort of train them up. And we work with hiring companies who then want to hire from that population. And so we're sort of helping them solve that problem. And the other half of it is really around training. Because with a lot of industries, especially if you're in a more regulated industry, there's a lot of nuance to what you're doing. And the fastest way to develop that data science or AI talent may not necessarily be to hire folks who are coming out of a PhD program.
It may be to take folks internally who have a lot of that domain knowledge that you have and get them trained up on those data science techniques. So we've had large insurance companies come to us and say hey look, we hire three or four folks from you a quarter. That doesn't move the needle for us. What we really need is to take the thousand actuaries and statisticians that we have and get all of them trained up to become data scientists and become data literate in this new open source world. >> [Katie] Go ahead. >> All right, ladies first. >> Go ahead. >> Are you sure? >> No please, fight first. >> Go ahead. >> Go ahead Nir. >> So this is actually a trend that we have been seeing in the past year or so: companies start to look at how to upskill and look for talent within the organization, so they can actually move people to become more literate and navigate them from analyst to data scientist, and from data scientist to machine learner. So this is actually a trend that has been happening for a year or so already. >> Yeah, but I also find that after they've gone through that training in getting people skilled up in data science, the next problem that I get is executives coming to say we've invested in all of this. We're still not moving the needle. We've already invested in the right tools. We've gotten the right skills. We have enough scale of people who have these skills. Why are we not moving the needle? And what I explain to them is look, you're still making decisions in the same way. And you're still not involving enough of the non-technical people. Especially from marketing, which is now, the CMOs are much more responsible for driving growth in their companies now. But oftentimes it's so hard to change the old way of marketing, which is still very segmentation-based. You know, demographic-variable based, and we're trying to move people to say no, you have to understand the complexity of customers and not put them in boxes.
>> And I think underlying a lot of this discussion is this question of culture, right? >> Yes. >> Absolutely. >> How do you build a data driven culture? And I think that that culture question comes up quite often, especially in large Fortune 500 enterprises, in that they're not very comfortable with, for example, open source architecture. Open source tools. And there is some sort of residual bias that that's somehow dangerous. A security vulnerability. And I think that that's part of the cultural challenge that they often have in terms of how do I build a more data driven organization? Well, a lot of the talent really wants to use these kinds of tools. And I mean, just to give you an example, we are partnering with one of the major cloud providers to help make open source tools more user friendly on their platform. So we're trying to help them attract the best technologists to use their platform, because they want and they understand the value of having that kind of open source technology work seamlessly on their platforms. So I think that just sort of goes to show you how important open source is in this movement. And how much large companies and Fortune 500 companies and a lot of the ones we work with have to embrace that. >> Yeah, and I'm seeing it in our work. Even when we're working with Fortune 500 companies, they've already gone through the first phase of data science work, which I explain was all about the tools and getting the right tools and architecture in place. And then companies started moving into getting the right skill set in place, getting the right talent. And what you're talking about with culture is really where I think we're talking about the third phase of data science, which is looking at communication of these technical frameworks so that we can get non-technical people really comfortable in the same room with data scientists.
That is going to be the phase; that's really where I see the pain point. And that's why at Sudden Compass, we're really dedicated to working with each other to figure out how we solve this problem now. >> And I think that communication between the technical stakeholders and management and leadership is a very critical piece of this. You can't have a successful data science organization without that. >> Absolutely. >> And I think that actually some of the most popular trainings we've had recently are from managers and executives who are looking to say, how do I become more data savvy? How do I figure out what this data science thing is, and how do I communicate with my data scientists? >> You guys made this way too easy. I was just going to get some popcorn and watch it play out. >> Nir, last 30 seconds. I want to leave you with an opportunity: anything you want to add to this conversation? >> I think one thing to conclude is that for companies that are not data driven, it's about time to hit refresh and figure out how they transition the organization to become data driven. To become agile and nimble so they can actually seize the opportunities from this important industrial revolution. Otherwise, unfortunately, they will have a hard time surviving. >> [Katie] All agreed? >> [Tricia] Absolutely, you're right. >> Michael, Trish, Nir, thank you so much. Fascinating discussion. And thank you guys again for joining us. We will be right back with another great demo, right after this. >> Thank you, Katie. >> Once again, thank you for an excellent discussion. Weren't they great, guys? And thank you to everyone who's tuning in on the live webcast. As you can hear, we have an amazing studio audience here. And we're going to keep things moving. I'm now joined by Daniel Hernandez and Siva Anne. And we're going to turn our attention to how you can deliver on what they're talking about using Data Science Experience to do data science faster. >> Thank you, Katie.
Siva and I are going to spend the next 10 minutes showing you how you can deliver on what they were saying using the IBM Data Science Experience to do data science faster. We'll demonstrate through new features we introduced this week how teams can work together more effectively across the entire analytics life cycle. How you can take advantage of any and all data, no matter where it is and what it is. How you can use your favorite open source tools. And finally, how you can build models anywhere and deploy them close to where your data is. Remember the financial adviser app Rob showed you? To build an app like that, we needed a team of data scientists, developers, data engineers, and IT staff to collaborate. We do this in the Data Science Experience through a concept we call projects. When I create a new project, I can now use the new GitHub integration feature. We're doing for data science what we've been doing for developers for years. Distributed teams can work together on analytics projects and take advantage of GitHub's version management and change management features. This is a huge deal. Let's explore the project we created for the financial adviser app. As you can see, our data engineer Joane, our developer Rob, and others are collaborating on this project. Joane got things started by bringing together the trusted data sources we need to build the app. Taking a closer look at the data, we see that our customer and profile data is stored on our recently announced IBM Integrated Analytics System, which runs safely behind our firewall. We also needed macroeconomic data, which she was able to get from the Federal Reserve. And she stored it in our Db2 Warehouse on Cloud. And finally, she selected stock news data from NASDAQ.com and landed that in a Hadoop cluster, which happens to be powered by Hortonworks.
We added a new feature to the Data Science Experience so that when it's installed with Hortonworks, it automatically uses the native security and governance controls within the cluster, so your data is always secure and safe. Now we want to show you the news data we stored in the Hortonworks cluster. This is the main administrative console. It's powered by an open source project called Ambari. And here's the news data. It's in Parquet files stored in HDFS, which is a distributed file system. To get the data from NASDAQ into our cluster, we used IBM's BigIntegrate and BigQuality to create automatic data pipelines that acquire, cleanse, and ingest that news data. Once the data's available, we use IBM's Big SQL to query that data using SQL statements that are much like the ones we would use for any relational data, including the data that we have in the Integrated Analytics System and Db2 Warehouse on Cloud. This, and the federation capabilities that Big SQL offers, dramatically simplify data acquisition. Now we want to show you how we support a brand new tool that we're excited about. Since we launched last summer, the Data Science Experience has supported Jupyter and R for data analysis and visualization. In this week's update, we deeply integrated another great open source project called Apache Zeppelin. It's known for having great visualization support and advanced collaboration features, and it is growing in popularity amongst the data science community. This is an example of Apache Zeppelin and the notebook we created with it to explore some of our data. Notice how wonderful and easy the data visualizations are. Now we want to walk you through the Jupyter notebook we created to explore our customers' preferences for stocks. We use notebooks to understand and explore data, and to identify the features that have some predictive power. Ultimately, we're trying to assess what is driving customer stock preference.
Here we did the analysis to identify the attributes of customers that are likely to purchase auto stocks. We used this understanding to build our machine learning model. For building machine learning models, we've always had tools integrated into the Data Science Experience. But sometimes you need to use tools you've already invested in, like our very own SPSS as well as SAS. Through a new import feature, you can easily import models created with those tools. This helps you avoid vendor lock-in and simplifies the development, training, deployment, and management of all your models. To build the models we used in the app, we could have written code, but we prefer a visual experience. We used our customer profile data in the Integrated Analytics System, used Auto Data Preparation to cleanse our data, chose the binary classification algorithms, and let the Data Science Experience evaluate logistic regression against a gradient boosted tree. It's doing the heavy work for us. As you can see here, the Data Science Experience generated performance metrics that show us that the gradient boosted tree is the best performing algorithm for the data we gave it. Once we save this model, it's automatically deployed and available for developers to use. Any application developer can take this endpoint and consume it like they would any other API inside the apps they build. We've made training and creating machine learning models super simple. But what about the operations? A lot of companies are struggling to ensure their model performance remains high over time. In our financial adviser app, we know that customer data changes constantly, so we need to always monitor model performance and ensure that our models are retrained as necessary. This is a dashboard that shows the performance of our models and lets our teams monitor and retrain those models so that they're always performing to our standards.
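The demo says only that the Data Science Experience evaluated logistic regression against a gradient boosted tree and reported performance metrics; it does not say which metric decided the winner. As a minimal sketch of that kind of model selection, assuming ROC AUC on a held-out set is the deciding metric, the Python below scores two candidates and keeps the better one. The labels and scores are made-up toy values, and `roc_auc` and `pick_best` are hypothetical helpers, not part of any IBM API.

```python
from typing import Dict, List


def roc_auc(labels: List[int], scores: List[float]) -> float:
    """AUC computed as the chance that a random positive is scored above a random negative.

    Ties count as half a win. This O(P * N) pairwise version is fine for small toy data.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def pick_best(labels: List[int], candidates: Dict[str, List[float]]) -> str:
    """Return the name of the candidate model with the highest held-out AUC."""
    aucs = {name: roc_auc(labels, scores) for name, scores in candidates.items()}
    return max(aucs, key=aucs.get)


# Made-up held-out labels (1 = customer bought auto stocks) and made-up model scores.
labels = [1, 0, 1, 1, 0, 0, 1, 0]
candidates = {
    "logistic_regression": [0.70, 0.60, 0.50, 0.80, 0.40, 0.30, 0.35, 0.45],
    "gradient_boosted_tree": [0.90, 0.20, 0.80, 0.70, 0.30, 0.10, 0.60, 0.40],
}
for name, scores in candidates.items():
    print(f"{name}: AUC = {roc_auc(labels, scores):.2f}")
print("selected:", pick_best(labels, candidates))
```

A pairwise AUC like this is quadratic in the data size; on real data a rank-based implementation, such as scikit-learn's `roc_auc_score`, would be the usual choice.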
So far we've been showing you the Data Science Experience available behind the firewall that we're using to build and train models. Through a new publish feature, you can build models and deploy them anywhere: in another environment, private, public, or anywhere else, with just a few clicks. So here we're publishing our model to the Watson Machine Learning service. It happens to be in the IBM Cloud, and it's also deeply integrated with our Data Science Experience. After publishing and switching to the Watson Machine Learning service, you can see that the stock affinity model we just published is there and ready for use. So this is incredibly important. I just want to say it again. The Data Science Experience allows you to train models behind your own firewall, take advantage of your proprietary and sensitive data, and then deploy those models wherever you want with ease. So to summarize what we just showed you: First, IBM's Data Science Experience supports all teams. You saw how our data engineer populated our project with trusted data sets. Our data scientists developed, trained, and tested a machine learning model. Our developers used APIs to integrate machine learning into their apps. And you saw how IT can use our Integrated Model Management dashboard to monitor and manage model performance. Second, we support all data: on premises, in the cloud, structured, unstructured, inside your firewall, and outside of it. We help you bring analytics and governance to where your data is. Third, we support all tools. The data science tools that you depend on are readily available and deeply integrated. This includes capabilities from great partners like Hortonworks, and powerful tools like our very own IBM SPSS. And fourth, and finally, we support all deployments. You can build your models anywhere and deploy them right next to where your data is, whether that's in the public cloud, private cloud, or even on the world's most reliable transaction platform, IBM Z.
So see for yourself. Go to the Data Science Experience website and take us for a spin. And if you happen to be ready right now, our recently created Data Science Elite Team can help you get started and run experiments alongside you at no charge. Thank you very much. >> Thank you very much Daniel. It seems like a great time to get started. And thanks to Siva for taking us through it. Rob and I will be back in just a moment to add some perspective, right after this. All right, once again I'm joined by Rob Thomas. And Rob, obviously we got a lot of information here. >> Yes, we've covered a lot of ground. >> This is intense. You've got to break it down for me, because I think we should zoom out and see the big picture. What can better data science deliver to a business? Why is this so important? I mean, we've heard it through and through. >> Yeah, well, I heard it a couple of times. But it starts with this: businesses have to embrace a data driven culture. And it is a change. And we need to make data accessible with the right tools in a collaborative culture, because we've got diverse skill sets in every organization. But data driven companies succeed when data science tools are in the hands of everyone. And I think that's a new thought. I think most companies think, just get your data scientists some tools and you'll be fine. This is about tools in the hands of everyone. I think the panel did a great job of describing how we get to data science for all: building a data culture and making it a part of your everyday operations. And the highlights of what Daniel just showed us, those are some pretty cool features for how organizations can get to this. First, you can see how IBM's Data Science Experience supports all teams. You saw data analysts, data scientists, application developers, and IT staff all working together. Second, you saw how we support all tools, and your choice of tools. So the most popular data science libraries are integrated into one platform.
And we saw some new capabilities that help companies avoid lock-in, where you can import existing models created in specialist tools like SPSS or others, and then deploy them and manage them inside the Data Science Experience. That's pretty interesting. And lastly, you see we continue to build on this best of open tools, partnering with companies like H2O, Hortonworks, and others. Third, you can see how you can use all data, no matter where it lives. That's a key challenge every organization's going to face: private, public, federating all data sources. We announced a new integration with the Hortonworks Data Platform where we deploy machine learning models where your data resides. That's been a key theme: analytics where the data is. And lastly, supporting all types of deployments. Deploy them in your Hadoop cluster. Deploy them in your Integrated Analytics System. Or deploy them on Z, just to name a few. A lot of different options here. But look, don't believe anything I say. Go try it for yourself. The Data Science Experience, anybody can use it. Go to datascience.ibm.com, and look, if you want to start right now, we just created a team that we call Data Science Elite. These are the best data scientists in the world, who will come sit down with you, co-create solutions and models, and prove out a proof of concept. >> Good stuff. Thank you Rob. So you might be asking, what does an organization that embraces data science for all look like? And how could it transform your role? I'm going to head back to the office and check it out. Let's start with the perspective of the line of business. What's changed? Well, now you're starting to explore new business models. You've uncovered opportunities for new revenue sources in all that hidden data. And being disrupted is no longer keeping you up at night. As a data science leader, you're beginning to collaborate with the line of business to better understand the objectives and translate them into the models that are being built.
Your data scientists are also starting to collaborate with the less technical team members and analysts who are working closest to the business problem. And as a data scientist, you stop feeling like you're falling behind. Open source tools are keeping you current. You're also starting to operationalize the work that you do. And you get to do more of what you love: explore data, build models, put your models into production, and create business impact. All in all, it's not a bad scenario. Thanks. All right. We are back, and coming up next, oh, this is a special time right now, because we've got a great guest speaker. New York Magazine called him the spreadsheet psychic and number crunching prodigy who went from correctly forecasting baseball games to correctly forecasting presidential elections. He even invented a proprietary algorithm called PECOTA for predicting the future performance of baseball players and teams. And his New York Times bestselling book, The Signal and the Noise, was named by Amazon.com as the number one best nonfiction book of 2012. He's currently the Editor in Chief of the award winning website FiveThirtyEight and appears on ESPN as an on-air commentator. Big round of applause. My pleasure to welcome Nate Silver. >> Thank you. We met backstage. >> Yes. >> It feels weird to re-shake your hand, but you know, for the audience. >> I had to give the intense firm grip. >> Definitely. >> The ninja grip. So you and I have crossed paths kind of digitally in the past, which is really interesting: I started my career at ESPN. I started as a production assistant, then later went on air for sports technology. And I'll go to you to talk about sports because-- >> Yeah. >> Wow, has ESPN upped their game in terms of understanding the importance of data and analytics, and what it brings. Not just to MLB, but across the board. >> No, it's really infused into the way they present the broadcast. You'll have win probability on the bottom line.
And they'll incorporate FiveThirtyEight metrics into how they cover college football, for example. So, ESPN ... Sports is maybe the perfect, if you're a data scientist, like the perfect kind of test case. And the reason is that sports consist of problems that have rules and have structure. And when problems have rules and structure, then it's a lot easier to work with them. So it's a great way to kind of improve your skills as a data scientist. Of course, there are also important real world problems that are more open ended, and those present different types of challenges. But it's such a natural fit. The teams. Think about the teams playing the World Series tonight. The Dodgers and the Astros are both very data driven, especially Houston. The Golden State Warriors, the NBA champions, are extremely data driven. The New England Patriots: relative to other NFL teams, it's shifted a little bit, since the NFL bar is lower. But the Patriots are certainly very analytical in how they make decisions. So you can't talk about sports without talking about analytics. >> And I was going to save the baseball question for later, because we are moments away from game seven. >> Yeah. >> Is everyone else watching game seven? It's been an incredible series. Probably one of the best of all time. >> Yeah, I mean-- >> You have a prediction here? >> You can mention that too. So I don't have a prediction. FiveThirtyEight has the Dodgers with a 60% chance of winning. >> [Katie] LA fans. >> So you have two teams that are about equal. But the Dodgers' pitching staff is in better shape at the moment, at the end of a seven game series. And they're at home. >> But the statistics behind the two teams are pretty incredible. >> Yeah. It's like the first World Series in, I think, 56 years or something where you have two 100-win teams facing one another. There has been a lot of parity in baseball for a lot of years. Not that many offensive overall juggernauts.
But this year, and last year with the Cubs and the Indians too, really. But this year, you have really spectacular teams in the World Series. It kind of is a showcase of modern baseball. Lots of home runs. Lots of strikeouts. >> [Katie] Lots of extra innings. >> Lots of extra innings. Good defense. Lots of pitching changes. So if you love the modern baseball game, it's been about the best example that you've had. If you like a little bit more contact and fewer strikeouts, maybe not so much. But it's been a spectacular and very exciting World Series. >> It's amazing to talk about. MLB is huge with analysis. I mean, hands down. But across the board, can you provide a few examples? Because there are so many teams with front offices putting such heavy intensity on the analysis side, and on where the teams are going. And if you could provide any specific examples of teams that have really blown your mind, especially over the last year or two, because every year it gets more exciting, if you will. >> I mean, so a big thing in baseball is defensive shifts. So if you watch tonight, you'll probably see a couple of plays where, if you're used to watching baseball, a guy makes really solid contact, and there's a fielder there that you don't think should be there. But that's really very data driven, where you analyze where this guy hits the ball. That part's not so hard. But there's also game theory involved, because you have to adjust for the fact that he knows where you're positioning the defenders. He's therefore trying to make adjustments to his own swing, and so that's been a major innovation in how baseball is played. You know, how bullpens are used too. Teams have realized, pretty much across all sports, the importance of rest and of fatigue. And that you can be the best pitcher in the world, but guess what? After four or five innings, you're probably not as good as a guy who has a fresh arm, necessarily.
So I mean, it really is like, these are not subtle things anymore. It's not just, oh, on-base percentage is valuable. It really affects kind of every strategic decision in baseball. In the NBA, if you watch an NBA game tonight, see how many three point shots are taken. That's in part because of data, and teams realizing, hey, three points is worth more than two, and once you're more than about five feet from the basket, the shooting percentage gets really flat. And so it's revolutionary, right? There are teams that will shoot almost half their shots from three point range nowadays. Larry Bird, who wound up being one of the greatest three point shooters of all time, took only eight three pointers his first year in the NBA. It's quite noticeable if you watch baseball or basketball in particular. >> Not to focus too much on sports, but one final question. In Major League Soccer, and now in the NFL, we have the analysis and the wearables, where they can now show on screen, if they wanted to, heart rate and breathing and how much exertion. How much data is too much data? And when does it ruin the sport? >> So, I don't think, I mean, again, it goes sport by sport a little bit. I think in basketball you actually have a more exciting game. I think the game is more open now. You have more three pointers. You have guys getting higher assist totals. But you know, I don't know. I'm not one of those people who thinks, look, if you love baseball or basketball, and you go in to work for the Astros, the Yankees, or the Knicks, they probably need some help, right? You really have to be passionate about that sport, because it's all based on what questions am I asking as a fan, or I guess as an employee of the team, or a player watching the game. And I don't think there's really any substitute for the insight and intuition that a curious human has to kind of ask the right questions.
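Nate's three-point argument is expected-value arithmetic: points per attempt equal the shot's value times the probability it goes in, so a three beats a two whenever its make rate exceeds two-thirds of the two's. The sketch below assumes hypothetical shooting percentages (40% on long twos, 36% on threes) purely for illustration; they are not league figures.

```python
def points_per_shot(shot_value: int, make_pct: float) -> float:
    """Expected points from one attempt: the shot's value times the chance it goes in."""
    return shot_value * make_pct


def breakeven_three_pct(two_pct: float) -> float:
    """Three-point percentage at which a three matches a two in expected points.

    Solves 3 * p3 == 2 * p2 for p3.
    """
    return 2.0 * two_pct / 3.0


# Hypothetical percentages for illustration only; not real league data.
long_two_pct = 0.40
three_pct = 0.36

long_two = points_per_shot(2, long_two_pct)
three = points_per_shot(3, three_pct)
print(f"long two: {long_two:.2f} points per shot")
print(f"three:    {three:.2f} points per shot")
print(f"a three wins whenever it goes in more than {breakeven_three_pct(long_two_pct):.1%} of the time")
```

Because the make rate flattens with distance beyond the paint, as Nate describes, stepping back behind the line costs little accuracy while adding 50% to the shot's value.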
So we can talk at great length about what tools you then apply when you have those questions, but that still comes from people. I don't think machine learning could help with what questions I want to ask of the data. It might help you get the answers. >> If you have a midfielder in a soccer game, though, only exerting 80%, and you're seeing that on a screen as a fan, you're saying, could that person get fired at the end of the day? One day, with the data? >> So we've found, in soccer in particular, that some of the better players are actually more still. Leo Messi, maybe the best player in the world, doesn't move as much as other soccer players do. And the reason is that A) he kind of knows how to position himself in the first place, and B) he realizes that if you make a run and you're out of position, that's quite fatiguing. And soccer in particular, like basketball, is an incredibly fatiguing sport. And so sometimes it's the guys who conserve their energy who are more valuable. That kind of old school mentality, that you have to hustle at every moment, is not helpful to the team if you're hustling on an irrelevant play and therefore, on a critical play, can't get back on defense, for example. >> Sports, but also data, is moving exponentially, as we've been discussing today: tech, healthcare, every different industry. Is there any particular one that's a favorite of yours to cover? And I imagine they're all different as well. >> I mean, I do like sports. We cover a lot of politics too, which is different. I mean, in politics I think people aren't intuitively as data driven as they might be in sports, for example. It's impressive to follow the breakthroughs in artificial intelligence. It started out just as kind of playing games, playing chess and poker and Go and things like that. But you really have seen a lot of breakthroughs in the last couple of years. But yeah, it's kind of infused into everything really.
>> You're known for your work in politics, though, especially presidential campaigns. >> Yeah. >> This year, in particular, was it insanely challenging? What was the most notable thing that came out of any of your predictions? >> I mean, in some ways, looking at the polling was the easiest lens to look at it through. So I think there's kind of a myth that last year's result was a big shock, and it wasn't really. If you did the modeling in the right way, then you realized that, number one, polls have a margin of error. And so when a candidate has a three point lead, that's not particularly safe. Number two, the outcomes in different states are correlated. Meaning that it's not that much of a surprise that Clinton lost Wisconsin and Michigan and Pennsylvania and Ohio. You know, I'm from Michigan. I have friends from all those states. Kind of the same types of people in those states. Those outcomes are all correlated. So what people thought was a big upset for the polls was, I think, an example of data science done carefully and correctly, where you understand probabilities and understand correlations. Our model gave Trump a 30% chance of winning. Other models gave him a 1% chance. And so that was interesting in that it showed, number one, that modeling strategies and skill do matter quite a lot. When you have someone saying 30% versus 1%, I mean, that's a very, very big spread. And number two, that these aren't solved problems, necessarily. Although again, the problem with elections is that you only have one election every four years. So even if I'm very confident that I have a better model, one year of data doesn't really prove very much. Even five or 10 years doesn't really prove very much. And so you have to be aware of the limitations that are to some extent intrinsic to elections, when you only get one new training example every four years. There's not really any way around that. There are ways to be more robust to sparse data environments.
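The two points about polling error (a three point lead isn't safe, and state outcomes are correlated) can be illustrated with a small Monte Carlo simulation. This is a toy sketch, not FiveThirtyEight's model: it assumes the same three point lead in each of four states (the Wisconsin, Michigan, Pennsylvania, Ohio example), a hypothetical shared national polling error with a 3-point standard deviation, and independent state-level errors with a 2-point standard deviation. It compares that against a version in which each state's error is independent but has the same total spread.

```python
import math
import random


def normal_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def p_lose_all(lead: float, national_sd: float, state_sd: float,
               n_states: int, correlated: bool,
               n_sims: int = 100_000, seed: int = 2016) -> float:
    """Monte Carlo estimate of P(the poll leader loses every one of n_states states).

    Each state's actual margin is lead + polling error. With correlated=True the error is
    a shared national miss plus an independent state miss; otherwise each state draws an
    independent error with the same total standard deviation.
    """
    rng = random.Random(seed)
    total_sd = math.hypot(national_sd, state_sd)
    lost_all = 0
    for _ in range(n_sims):
        if correlated:
            shared = rng.gauss(0.0, national_sd)  # one national miss hits every state
            margins = [lead + shared + rng.gauss(0.0, state_sd) for _ in range(n_states)]
        else:
            margins = [lead + rng.gauss(0.0, total_sd) for _ in range(n_states)]
        if all(m <= 0 for m in margins):
            lost_all += 1
    return lost_all / n_sims


LEAD = 3.0         # "a three point lead", applied to every state (simplifying assumption)
NATIONAL_SD = 3.0  # hypothetical shared polling error
STATE_SD = 2.0     # hypothetical state-specific polling error
N_STATES = 4       # Wisconsin, Michigan, Pennsylvania, Ohio

p_one = normal_cdf(-LEAD / math.hypot(NATIONAL_SD, STATE_SD))
independent = p_lose_all(LEAD, NATIONAL_SD, STATE_SD, N_STATES, correlated=False)
correlated = p_lose_all(LEAD, NATIONAL_SD, STATE_SD, N_STATES, correlated=True)
print(f"P(lose any one state):            {p_one:.3f}")
print(f"P(lose all four), independent:    {independent:.4f}")
print(f"P(lose all four), shared error:   {correlated:.4f}")
```

Under these assumptions each state alone is lost roughly one time in five, and treating the four states as independent makes a clean sweep look like a fraction-of-a-percent event; once the national error is shared, the same sweep becomes many times more likely, which is the sense in which a correlated polling miss is not a shock.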
But if you're identifying different types of business problems to solve, figuring out what's a solvable problem where I can add value with data science is a really key part of what you're doing. >> You're such a leader in this space, in data and analysis. It would be interesting to kind of peel back the curtain and understand how you operate, but also how large your team is, how you're putting together information, and how quickly you're putting it out. Because I think in this right-now world where everybody wants things instantly-- >> Yeah. >> There's also, you want to be first in the world of journalism. But you don't want to be inaccurate, because that's your credibility. >> We talked about this before, right? I think on average, speed is a little bit overrated in journalism. >> [Katie] I think it's a big problem in journalism. >> Yeah. >> Especially in the tech world. You have to be first. You have to be first. And it's just pumping out, pumping out. And there's got to be more time spent on stories, if I can speak subjectively. >> Yeah, for sure. But at the same time, we are reacting to the news. And so we have people that come in; we hire most of our people, actually, from journalism. >> [Katie] How many people do you have on your team? >> About 35. But if you get someone who comes in from an academic track, for example, they might be surprised at how fast journalism is. Even though we might be slower than the average website, if there's a tragic event in New York, are there things we have to say about that? A candidate drops out of the presidential race; are there things we have to say about that? In periods ranging from minutes to days, as opposed to weeks to months to years in the academic world. The corporate world moves faster. What is a little different about journalism is that you are expected to have more precision, where people notice when you make a mistake. In corporations, you have maybe less transparency.
If you make 10 investments and seven of them turn out well, then you'll get a lot of profit from that, right? In journalism, it's a little different. If you make 10 predictions or say 10 things, and seven of them are very accurate and three of them aren't, you'll still get criticized a lot for the three. That's just kind of the way journalism is. And so the combination of not having that much tolerance for mistakes, but also needing to be fast, is tricky. And I criticize other journalists sometimes, including for not being data driven enough, but the best excuse any journalist has is: this is happening really fast, and it's my job to figure out in real time what's going on and provide useful information to the readers. And that's really difficult, especially in a world where, literally, I'll probably get off the stage and check my phone, and who knows what President Trump will have tweeted or what things will have happened. But it really is kind of 24/7. >> Well, because it's 24/7 with FiveThirtyEight, one of the most well known sites for data, are you feeling micromanagey with your people? Because you do have to hit this balance. You can't have something come out four or five days later. >> Yeah, I'm not -- >> Are you overseeing everything? >> I'm not by nature a micromanager. And so you try to hire well. You try to let people make mistakes. And the flip side of this is that if a news organization never had any mistakes, never had any corrections, that's raw, right? You have to have some tolerance for error, because you are trying to decide things in real time and figure things out. I think transparency's a big part of that. Say here's what we think, and here's why we think it. If we have a model, we say it's not just the final number; here's a lot of detail about how that's calculated. In some cases we release the code and the raw data. Sometimes we don't, because there's a proprietary advantage.
But quite often we're saying, we want you to trust us, and it's so important that you trust us, so here's the model. Go play around with it yourself. Here's the data. And that's also, I think, an important value. >> That speaks to open source, and your perspective on that in general. >> Yeah, I mean, look, I'm a big fan of open source. I worry that sometimes the trends are a little bit away from open source. But by the way, one thing that happens when you share your data, or at least share your thinking in lieu of the data, and you can definitely do both, is that readers will catch embarrassing mistakes that you made. By the way, even having open sourceness within your team, I mean, we have editors and copy editors who often save you from really embarrassing mistakes. And by the way, it's not necessarily people who have training in data science. I would guess that of our 35 people, maybe only five to 10 have a kind of formal background in what you would call data science. >> [Katie] I think that speaks to the theme here. >> Yeah. >> [Katie] That everybody's kind of got to be data literate. >> But yeah, it is like you have a good intuition. You have a good BS detector, basically. And you have a good intuition for, hey, this looks a little bit out of line to me. And sometimes that can be based on domain knowledge, right? One of our copy editors is a big college football fan. We had an algorithm we released that tries to predict what the human selection committee will do, and she was like, why is LSU rated so high? Because I know that LSU sucks this year. And we looked at it, and she was right. There was a bug where it had forgotten to account for their last game, where they lost to Troy or something, and so -- >> That also speaks to the human element as well. >> It does.
In general as a rule, if you're designing a kind of regression based model, and it's different in machine learning, where you kind of build in more tolerance for error, but if you're trying to do something more precise, then so much of it is just debugging. It's saying, that looks wrong to me, and I'm going to investigate that. And sometimes it's not wrong. Sometimes your model actually has an insight that you didn't have yourself. But fairly often, it is. And I think what you learn is, hey, if there's something that bothers me, I want to go investigate that now and debug that now. Because the last thing you want is for the answer you're putting out there in the world to all of a sudden hinge on a mistake that you made. Because if you have, so to speak, 1,000 lines of code and they all do something different, you never know when you'll hit a weird edge case where this one decision you made winds up being the difference between a good forecast and a bad one, between a defensible position and an indefensible one. So we definitely are quite diligent and careful. But it's also knowing, hey, where is an approximation good enough, and where do I need more precision? Because you could also drive yourself crazy in the other direction, where, you know, it doesn't matter if the answer is 91.2 versus 90. And so you can go 91.2, three, four, and it's A) false precision and B) not a good use of your time. So that's where I do still spend a lot of time: thinking about which problems are "solvable" or approachable with data and which ones aren't. And when they're not, by the way, you're still allowed to report on them. We are a news organization, so we do traditional reporting as well. And then kind of figuring out when you need precision versus when being pointed in the right direction is good enough.
>> I would love to get inside your brain and see how you operate on just an everyday walking-to-Walgreens moment. It's like, oh, if I cross the street in .2-- >> It's not, I mean-- >> Is it maddening in there? >> No, not really. I mean, I'm like-- >> This is an honest question. >> If I'm looking for airfares, I'm a little more careful. But no, part of it's like you don't want to waste time on unimportant decisions, right? Sometimes, if I can't decide what to eat at a restaurant, I'll flip a coin. If the chicken and the pasta both sound really good-- >> That's not high tech, Nate. We want better. >> But that's the point, right? It's like both the chicken and the pasta are going to be really darn good, right? So I'm not going to waste my time trying to figure it out. I'm just going to have an arbitrary way to decide. >> Seriously, in business, how have organizations evolved with this data boom over the last three to five years? How are you seeing it from a consultant's point of view? Do you think it's an exciting time? Do you think it's a you-must-act-now time? >> I mean, we do know that you definitely see a lot of talent among the younger generation now. So FiveThirtyEight has been at ESPN for four years now. And man, the quality of the interns we get has improved so much in four years. The quality of the young hires that we make straight out of college has improved so much in four years. So you definitely do see a younger generation for which this is just part of their bloodstream and part of their DNA. And also in particular fields that we're interested in. We're interested in people who have both a data and a journalism background. We're interested in people who have a visualization and a coding background. A lot of what we do is very much interactive graphics and so forth. And so we do see those skill sets coming into play a lot more.
And so the shortage of talent, which I think had frankly been a problem for a long time, I'm optimistic about, based on the young people in our office. It's a little anecdotal, but you can tell that there are so many more programs teaching students the right set of skills, skills that maybe weren't taught as much a few years ago. >> But you're seeing these big organizations, ESPN as a perfect example, moving more towards data and analytics than ever before. >> Yeah. >> You would say that's obviously true. >> Oh, for sure. >> If you're not moving in that direction, you're going to fall behind quickly. >> Yeah, and the thing is, if you read my book, or I guess people have a copy of the book, in some ways it's saying, hey, there are a lot of ways to screw up when you're using data. And we've built bad models. We've had models that were bad and got good results, good models that got bad results, and everything else. But the point is that the reason to be out in front of the problem is so you give yourself more runway to make errors and mistakes, and to learn what works and what doesn't and which people to put on the problem. I sometimes do worry that a company says, oh, we need data. And everyone kind of agrees on that now. We need data science. Then they have some big test case, and they have a failure. And maybe they have a failure because they didn't really know how to use it well enough. But you learn from that and iterate on that. And so by the time that you're on the third generation of a problem that you're trying to solve, and you're watching everyone else make the mistake that you made five years ago, I mean, that's really powerful. So getting invested in it now, both in the technology and on the human capital side, is important. >> Final question for you as we run out of time. 2018 and beyond, what is your biggest project in terms of data gathering that you're working on? >> There's a midterm election coming up.
That's a big thing for us. We're also doing a lot of work with NBA data. So for four years now, the NBA has been collecting player tracking data. They have 3D cameras in every arena, so they can actually quantify, for example, how fast a fast break is, or literally where a player is and where the ball is, for every NBA game over the past four or five years. And there hasn't really been an overall metric of player value that's taken advantage of that. The teams do it, but in the NBA, the teams are a little bit ahead of journalists and analysts. So we're trying to build a truly next-generation stat. It's a lot of data. These days I oversee things more than I once did, so you're parsing through many, many, many lines of code. But yeah, we hope to have that out at some point in the next few months. >> Anything you've personally been passionate about that you've wanted to work on and kind of solve? >> I mean, the NBA thing. I am a pretty big basketball fan. >> You can do better than that. Come on, I want something really personal, where you're like, I've got to crunch the numbers. >> You know, we tried to figure out where the best burrito in America was a few years ago. >> I'm going to end it there. >> Okay. >> Nate, thank you so much for joining us. It's been an absolute pleasure. Thank you. >> Cool, thank you. >> I thought we were going to chat World Series, you know. Burritos, important. I want to thank everybody here in our audience. Let's give him a big round of applause. >> [Nate] Thank you everyone. >> Perfect way to end the day. And for a replay of today's program, just head on over to ibm.com/dsforall. I'm Katie Linendoll. And this has been Data Science for All: It's a Whole New Game. Hi guys, I just want to quickly give you a few heads-ups as you're exiting. Downstairs right now there's going to be a meet and greet with Nate.
And we're going to be doing that with clients and customers who are interested. So I would recommend that before the game starts and you lose Nate, you head on downstairs. Also, the gallery is open until eight p.m. with demos and activations. And tomorrow, make sure to come back too, because we have exciting stuff. I'll be joining you as your host, and we're kicking off at nine a.m. So bye everybody, thank you so much. >> [Announcer] Ladies and gentlemen, thank you for attending this evening's webcast. If you are not attending the Cloud and Cognitive Summit tomorrow, we ask that you recycle your name badge at the registration desk. Thank you. Also, please note there are two exits at the back of the room, one on either side. Have a good evening. Ladies and gentlemen, the meet and greet will be on stage. Thank you.

Published Date : Nov 1 2017

SUMMARY :

Today the ability to extract value from data is becoming a shared mission. And for all of you during the program, I want to remind you to join that conversation on And when you and I chatted about it. And the scale and complexity of the data that organizations are having to deal with has It's challenging in the world of unmanageable. And they have to find a way. AI. And it's incredible that this buzz word is happening. And to get to an AI future, you have to lay a data foundation today. And four is you got to expand job roles in the organization. First pillar in this you just discussed. And now you get to where we are today. And if you don't have a strategy for how you acquire that and manage it, you're not going And the way I think about that is it's really about moving from static data repositories And we continue with the architecture. So you need a way to federate data across different environments. So we've laid out what you need for driving automation. And so when you think about the real use cases that are driving return on investment today, Let's go ahead and come back to something that you mentioned earlier because it's fascinating And so the new job roles is about how does everybody have data first in their mind? Everybody in the company has to be data literate. So overall, group effort, has to be a common goal, and we all need to be data literate But at the end of the day, it's kind of not an easy task. It's not easy but it's maybe not as big of a shift as you would think. It's interesting to hear you say essentially you need to train everyone though across the And look, if you want to get your hands on code and just dive right in, you go to datascience.ibm.com. And I've heard that the placement behind those jobs, people graduating with the MS is high. Let me get back to something else you touched on earlier because you mentioned that a number They produce a lot of the shows that I'm sure you watch Katie. And this is a good example. 
So they have to optimize every aspect of their business from marketing campaigns to promotions And so, as we talk to clients we think about how do you start down this path now, even It's analytics first to the data, not the other way around. We as a practice, we say you want to bring data to where the data sits. And Harvard Business Review even dubbed it the sexiest job of the 21st century. Female preferred, on the cover of Vogue. And how does it change everything? And while it's important to recognize this critical skill set, you can't just limit it And we call it clickers and coders. [Katie] I like that. And there's not a lot of things available today that do that. Because I hear you talking about the data scientist's role and how it's critical to success, And my view is if you have the right platform, it enables the organization to collaborate. And every organization needs to think about what are the skills that are critical? Use this as your chance to reinvent IT. And I can tell you even personally being affected by how important the analysis is in working And think about if you don't do something. And now we're going to get to the fun hands-on part of our story. And then how do you move analytics closer to your data? And in here I can see that JP Morgan is calling for a US dollar rebound in the second half But then where it gets interesting is you go to the bottom. data, his stock portfolios, and browsing behavior to build a model which can predict his affinity And so, as a financial adviser, you look at this and you say, all right, we know he loves And I want to do that by picking an auto stock which has got negative correlation with Ferrari. Cause you start clicking that and immediately we're getting instant answers of what's happening.
And what I see here instantly is that Honda has got a negative correlation with Ferrari, As a financial adviser, you wouldn't think about federating data, machine learning, pretty And drive the machine learning into the appliance. And even score hundreds of customers for their affinities on a daily basis. And then you see when you deploy analytics next to your data, even a financial adviser, And as a data science leader or data scientist, you have a lot of the same concerns. But you guys each have so many unique roles in your business life. And just by looking at the demand of companies that want us to help them go through this And I think the whole ROI of data is that you can now understand people's relationships Well you can have all the data in the world, and I think it speaks to, if you're not doing And I think that that's one of the things that customers are coming to us for, right? And Nir, this is something you work with a lot. And the companies that are not like that. Tricia, companies have to deal with data behind the firewall and in the new multi-cloud And so that's why I think it's really important to understand that when you implement big And how are the clients, how are the users actually interacting with the system? And right now the way I see teams being set up inside companies is that they're creating But in order to actually see all of the ROI behind the data, you also have to have a creative That's one of the things that we see a lot. So a lot of the training we do is sort of data engineers. And I think that's a very strong point when it comes to the data analysis side. And that's where you need the human element to come back in and say okay, look, you're And the people who are really great at providing that human intelligence are social scientists. the talent piece is actually the most important and crucial, and hard to get. It may be to take folks internally who have a lot of that domain knowledge that you have And from data scientist to machine learner.
And what I explain to them is look, you're still making decisions in the same way. And I mean, just to give you an example, we are partnering with one of the major cloud And what you're talking about with culture is really where I think we're talking about And I think that communication between the technical stakeholders and management You guys made this way too easy. I want to leave you with an opportunity to, anything you want to add to this conversation? I think one thing to conclude is to say that companies that are not data driven is And thank you guys again for joining us. And we're going to turn our attention to how you can deliver on what they're talking about And finally how you could build models anywhere and employ them close to where your data is. And thanks to Siva for taking us through it. You got to break it down for me cause I think we zoom out and see the big picture. And we saw some new capabilities that help companies avoid lock-in, where you can import And as a data scientist, you stop feeling like you're falling behind. We met backstage. And I go to you to talk about sports because-- And what it brings. And the reason being that sports consists of problems that have rules. And I was going to save the baseball question for later. Probably one of the best of all time. FiveThirtyEight has the Dodgers with a 60% chance of winning. So you have two teams that are about equal. It's like the first World Series in I think 56 years or something where you have two 100 And that you can be the best pitcher in the world, but guess what? And when does it ruin the sport? So we can talk at great length about what tools do you then apply when you have those And the reason being that A) he kind of knows how to position himself in the first place. And I imagine they're all different as well. But you really have seen a lot of breakthroughs in the last couple of years. You're known for your work in politics though. 
What was the most notable thing that came out of any of your predictions? And so, being aware of the limitations to some extent intrinsically in elections when It would be interesting to kind of peek behind the curtain, understand how you operate but But you don't want to be inaccurate because that's your credibility. I think on average, speed is a little bit overrated in journalism. And there's got to be more time spent on stories if I can speak subjectively. And so we have people that come in, we hire most of our people actually from journalism. And so the kind of combination of needing, not having that much tolerance for mistakes, Because you do have to hit this balance. And so you try to hire well. And your perspective on that in general. But by the way, one thing that happens when you share your data or you share your thinking And you have a good intuition for hey, this looks a little bit out of line to me. And I think kind of what you learn is like, hey if there's something that bothers me, It's like oh, if I cross the street in .2-- I mean, I'm like-- But no, part of it's like you don't want to waste time on unimportant decisions, right? We want better. It's like both the chicken and the pasta are going to be really darn good, right? Seriously, in business, how have organizations in the last three to five years And man, the quality of the interns we get has improved so much in four years. But you're seeing these big organizations, ESPN as a perfect example, moving more towards But the point is that the reason to be out in front of the problem is so you give yourself Final question for you as we run out of time. And so you're parsing through many, many, many lines of code. You can do better than that. You know, we tried to figure out where the best burrito in America was a few years Nate, thank you so much for joining us. I thought we were going to chat World Series, you know. And also the gallery is open until eight p.m. with demos and activations.
If you are not attending all cloud and cognitive summit tomorrow, we ask that you recycle your

ENTITIES

Entity	Category	Confidence
Tricia Wang	PERSON	0.99+
Katie	PERSON	0.99+
Katie Linendoll	PERSON	0.99+
Rob	PERSON	0.99+
Google	ORGANIZATION	0.99+
Joane	PERSON	0.99+
Daniel	PERSON	0.99+
Michael Li	PERSON	0.99+
Nate Silver	PERSON	0.99+
Apple	ORGANIZATION	0.99+
Hortonworks	ORGANIZATION	0.99+
Trump	PERSON	0.99+
Nate	PERSON	0.99+
Honda	ORGANIZATION	0.99+
Siva	PERSON	0.99+
McKinsey	ORGANIZATION	0.99+
Amazon	ORGANIZATION	0.99+
Larry Bird	PERSON	0.99+
2017	DATE	0.99+
Rob Thomas	PERSON	0.99+
Michigan	LOCATION	0.99+
Yankees	ORGANIZATION	0.99+
New York	LOCATION	0.99+
Clinton	PERSON	0.99+
IBM	ORGANIZATION	0.99+
Tesco	ORGANIZATION	0.99+
Michael	PERSON	0.99+
America	LOCATION	0.99+
Leo	PERSON	0.99+
four years	QUANTITY	0.99+
five	QUANTITY	0.99+
30%	QUANTITY	0.99+
Astros	ORGANIZATION	0.99+
Trish	PERSON	0.99+
Sudden Compass	ORGANIZATION	0.99+
Leo Messi	PERSON	0.99+
two teams	QUANTITY	0.99+
1,000 lines	QUANTITY	0.99+
one year	QUANTITY	0.99+
10 investments	QUANTITY	0.99+
NASDAQ	ORGANIZATION	0.99+
The Signal and the Noise	TITLE	0.99+
Tricia	PERSON	0.99+
Nir Kaldero	PERSON	0.99+
80%	QUANTITY	0.99+
BCG	ORGANIZATION	0.99+
Daniel Hernandez	PERSON	0.99+
ESPN	ORGANIZATION	0.99+
H2O	ORGANIZATION	0.99+
Ferrari	ORGANIZATION	0.99+
last year	DATE	0.99+
18	QUANTITY	0.99+
three	QUANTITY	0.99+
Data Incubator	ORGANIZATION	0.99+
Patriots	ORGANIZATION	0.99+

Tricia Wang, Sudden Compass | IBM Data Science For All

>> Narrator: Live from New York City, it's theCUBE, covering IBM Data Science For All, brought to you by IBM. >> Welcome back here on theCUBE. We are live in New York, continuing our coverage here of Data Science for All, where all things happen. Big things are happening. In fact, there's a huge event tonight I'm going to tell you about a little bit later on, and Tricia Wang, who is our next guest, is a part of that panel discussion, which you'll want to tune in for live on ibmgo.com at 6 o'clock. But more on that a little bit later on. I'm John Walls, here along with Dave Vellante, and Tricia Wang now joins us. A first ever for us. How are you doing? >> Good. >> A global tech ethnographer. >> You said it correctly, yay! >> I learned a long time ago: when you're not sure, slow down. >> A plus already. >> Slow down and breathe. >> Slow down. >> You did a good job. Want to do it one more time? >> A global tech ethnographer. >> Tricia: Good job. >> Studying ethnography and putting ethnography into practice. How about that? >> Really great. >> That's taking on the challenge stretch. >> Now say it 10 times fast. >> How about when we're done? Also co-founder of Sudden Compass. So first off, let's tell our viewers a little bit about Sudden Compass. Then I want to get into ethnography and how that relates to tech. So let's start with Sudden Compass and the origins there. >> So Sudden Compass, we're a consulting firm based in New York City, and we help our partners embrace and understand the complexity of their customers. So wherever there's data and wherever there's people, we are there to help them make sure that they can understand their customers at the end of the day. And customers are really the most unpredictable, the most unknown, and the most difficult thing to quantify for any business.
We see a lot of our partners really investing in big data and data science tools, and they're hiring the most amazing data scientists, but we saw them still struggling to make the right decisions, they still weren't getting their ROI, and they certainly weren't growing their customer base. And what we are helping them do is to say, "Look, you can't just rely only on data science. "You can't put it all into only the tool. "You have to think about how to operationalize that "and build a culture around it "and get the right skillsets in place, "and incorporate what we call the thick data, "which is the stuff that's very difficult to quantify, "the unknown, "and then you can figure out "how to best mathematically scale your data models "when they're actually based on real human behavior, "which is what the practice of ethnography is there for: "to help you understand what humans actually do, "what is unquantifiable. "And then once you find out those unquantifiable bits "you then have the art and science of figuring out "how to scale it into a data model." >> Yeah, see, that's what I find fascinating about this: you've got hard and fast data, right, objective, black and white, very clear, and then you've got people, you know? We all react differently. We have different influences, and different biases, and prejudices, and all that stuff, aptitudes. So you are meshing this art and science. >> Tricia: Absolutely. >> And what is that telling you then about how best to serve your clients and how to use data (mumbles)? >> Well, we tell our clients that because people have biases, and people are not objective, and there are emotions, that all ends up in the data set.
To think that your data set, your quantitative data set, is free of biases and has somehow been scrubbed of emotion is a total fallacy, and it's something that needs to be corrected, because that means decision makers are making decisions based off of numbers, thinking that they're objective, when in fact they contain all the biases of the very complexity of the humans that they're serving. So, there is an art and science of making sure that when you capture that complexity ... We're saying, "Don't scrub it away." Traditional marketing wants to say, "Put your customers in boxes. "Put them in segments. "Use demographic variables like education, income. "Then you can just put everyone in a box, "figure out where you want to target, "figure out the right channels, "and you buy against that and you reach them." That's not how it works anymore. Customers now are moving faster than corporations. The new networked customer of today has multiple identities and is better understood in relationship to other people. And we're not saying get rid of the data science. We're saying absolutely have it. You need to have scale. What is thick data going to offer you? Not scale, but it will offer you depth. So, that's why you need to combine both to be able to make effective decisions. >> So, I presume you work with a lot of big consumer brands. Is that a safe assumption? >> Absolutely. >> Okay. So, we work with a lot of big tech brands, like IBM and others, and they tend to move at the speed of the CIO, which tends to be really slow and really risk averse, and they're afraid to over-rotate and get out over their skis. What do you tell folks like that? Is it a mistake to be so cautious in this digital age? >> Well, I think the new CIO is on the cutting edge. I was just at the Constellation Research Annual Conference in Half Moon Bay at-- >> Our friend Ray Wang. >> Yeah, Ray Wang.
And I just spoke about this at their Constellation Connected Enterprise, where they had, I would have to say, the most amazing, forward-thinking collection of CIOs, CTOs, and CDOs all in one room. And the conversation there was like, "We cannot afford to be slow anymore. "We have to be on the edge "of helping our companies push the ground." So, investing in tools is not enough. It is no longer enough to be the buyer, and to just have a relationship with your vendor and assume that they will help you deliver all the understanding. So, CIOs and CTOs need to ensure that their teams are diverse, multi-functional, and totally integrated and embedded into the business. And I don't mean just involving a business analyst as if that's cutting edge. I'm saying, "No, you need to make sure that every team "has qualitative people, "and that they're embedded and working closely together." The problem is we don't teach these skills. We're not graduating data scientists or ethnographers who even want to talk to each other. In fact, each side thinks the other side is useless. We're saying, "No, "we need to be able to have these skills "being taught within companies." And you don't need to hire a PhD data scientist or a PhD ethnographer. What we're saying is that these skills can be taught. We need to teach people to be data literate. You've hired the right experts, you have bought the right tools, but we now need to make sure that we're creating data literacy among decision makers so that we can turn this data into insights and then into action. >> Let's peel that back a little bit. Data literate, you're talking about creativity, visualization, combining different perspectives? Where should the educational focus be? >> The educational focus should be on, one, storytelling. Right now, you cannot just assume that you can have a decision maker make a decision based on a number or some long PowerPoint report. We have to teach people how to tell compelling stories with data.
And when I say data, I'm talking about it needing both the human component and the numbers. And so one of the things that I saw, and this is really close to my heart, was when I was at Nokia. I remember I spent a decade understanding China. I really understood China. And I finally had the insight where I was like, "Look, after spending 10 years there, "following 100 to 200 families around, "I had the insight back in 2009 that look, "your company is about to go out of business because "people don't want to buy your feature phones anymore. "They're going to want to buy smartphones." But I only had qualitative data, and I needed to work alongside the business analysts and the data scientists. I needed access to their data sets, but I needed us to play together and to be on a team together so that I could scale my insights into quantitative models. And to your question, "What does that look like?" That looks like sitting on a team, having a mandate to say, "You have to play together, "and be able to tell an effective story "to the management and to leadership." But back then they were saying, "No, "we don't even consider your data set "to be worthwhile to even look at." >> We love our candy bar phone, right? It's a killer. >> Tricia: And we love our numbers. We love our surveys that tell us-- >> Market share was great. >> Market share is great. We've done all of the analysis. >> Forget the razor. >> Exactly. I'm like, "Look, of course your market share was great, "because your surveys were optimized "for your existing business model." So, big data is great if you want to optimize your supply chain, or other systems that are very contained and quantifiable; that's more or less fine. You can get optimization. You can get that one, two, five percent. But if you really want to grow your company and you want to ensure its longevity, you cannot just rely on your quantitative data to tell you how to do that.
You actually need thick data for discovery, because you need to find the unknown. >> One of the things you talk about as your passion is understanding how human perspectives shape the technology we build and how we use it. >> Tricia: Yes, you're speaking my language. >> Okay, so when you think about the development of the iPhone, it wasn't a bunch of surveys that led Steve Jobs to develop the iPhone. I guess the question is, does technology lead and shape human perspectives, or do human perspectives shape technology? >> Well, it's a dialectical relationship. It's like a hamburger ... Does the burger shape the bun or does the bun shape the burger? You would never think of asking someone who loves a hamburger that question, because they both shape each other. >> Okay. (laughing) >> So, it's symbiotic here, totally symbiotic. >> Surprise answer. You weren't expecting that. >> No, but it is kind of ... Okay, so you're saying it's not a chicken and egg, it's both. >> Absolutely. And the best companies are attuned to both. The best companies know that. The most powerful companies of the 21st century are obsessed with their customers, and they're going to do a great job at leveraging human models to be scaled into data models, and that gap is going to be very, very narrow. You get big data. We're going to see more AI or ML disasters when companies' data models are really far from their actual human models. That's how we get disasters like Tesco or Target, or even when Google misidentified black people as gorillas. It's because their model of their data was so far from the understanding of humans. And the best companies of the future are going to know how to close that gap, and that means they will have the thick data and big data closely integrated. >> Who's doing that today? It seems like there are no ethics in AI. People are aggressively pursuing AI for profit and not really thinking about the human impacts and the societal impacts. >> Let's look at IBM. They're doing it.
I would say that some of the most innovative projects are happening at IBM with Watson, where people are using AI to solve meaningful social problems. I don't think that has to be-- >> Like IBM For Social Good. >> Exactly, but it's also, it's not just experimental. I think IBM is doing really great stuff using Watson to identify skin cancer, or looking at the ways that people are using AI to understand eye diseases, things that you can do at scale. But businesses are also figuring out how to use AI for actually doing better things. I think some of the most interesting ... We're going to see more examples of people using AI for solving meaningful social problems and making a profit at the same time. I think one really great example is WorkIt. They're using AI. They're actually working with Watson. Watson is who they hired to create their engine, where union workers can ask Watson questions that they may not want to ask or that may be too costly to ask. So you can be like, "If I want to take one day off, "will this affect my contract or my job?" That's a very meaningful social problem that unions are now working with, and I think that's a really great example of how Watson is really pushing the edge to solve meaningful social problems at the same time. >> I worry sometimes that that's like the little device that you put in your car for the insurance company to see how you drive. >> How do you brake? How do you drive? >> Do people trust feeding that data to Watson, or are they afraid Big Brother is watching? >> That's why we always have to have human intelligence working with machine intelligence. This idea of AI versus humans is a false binary, and I don't even know why we're engaging in those kinds of questions. We're not, clearly, but there are people who are talking about it as if it's one or the other, and I find it to be a total waste of time.
It's like, clearly the best AI systems will be integrated with human intelligence, and we need humans training the data with machine learning systems. >> Alright, I'll play the yeah-but. >> You're going to play the what? >> Yeah, but! >> Yeah, but! (crosstalk) >> Machines are replacing humans in cognitive functions. You walk into an airport and there are kiosks. People are losing jobs. >> Right, no, that's real. >> So okay, so that's real. >> That is real. >> You agree with that. >> Job loss is real and job replacement is real. >> And I presume you agree that education is at least a part of the answer, and training people differently than-- >> Tricia: Absolutely. >> Just straight reading, writing, and arithmetic, but thoughts on that? >> Well, what I mean is that, yes, AI is replacing jobs, but we're treating AI as some kind of rogue machine that is operating on its own without human guidance, and that's not happening, that's not happening right now, and that's not happening in application. And what is more meaningful to talk about is how do we make sure that humans are more involved with the machines, that we always have a human in the loop, and that they're always making sure that they're training in a way where it's bringing up these very important ethical questions that you just raised. >> Right, well, and of course a lot of AI people would say AI is about prediction and then automation. So think about some of the brands that you serve, consult with. Don't they want the machines to make certain decisions for them so that they can affect an outcome? >> I think that people want machines to surface things that are very difficult for humans to do. So if a machine can efficiently surface, here is a pattern that's going on, then that is very helpful. I think we have companies that are saying, "We can automate your decisions," but when you actually look at what they can automate, it's in very contained, quantifiable systems.
It's systems around their supply chain or logistics. But you really do not want your machine automating any decision when it really affects people, in particular your customers. >> Okay, so maybe changing the air pressure somewhere on a widget, that's fine, but not-- >> Right, but you still need someone checking that, because will that air pressure create some unintended consequences later on? There's always some kind of human oversight. >> So I was looking at your website, and I always look for, I'm intrigued by, interesting, curious thoughts. >> Tricia: Okay, I have a crazy website. >> No, it's very good, but back in your favorite quotes: "Rather have a question I can't answer "than an answer I can't question." So, how do you bring that kind of no-fear-of-failure thinking to the boardroom, to people who have to make big leaps and big decisions and enter this digital transformative world? >> I think that a lot of companies are so fearful of what's going to happen next, and that fear can oftentimes corner them into asking small questions and acting small, where they're just asking, how do we optimize something? That's really essentially what they're asking. "How do we optimize X? "How do we optimize this business?" What they're not really asking are the hard questions, the right questions, the discovery-level questions that are very difficult to answer, that no big data set can answer. And those are questions ... The questions about the unknown are the most difficult, but that's where you're going to get growth, because when something is unknown, that means either you have not quantified it yet or you haven't found the relationship yet in your data set, and that's your competitive advantage. And that's where the boardroom really needs to set the mandate to say, "Look, I don't want you guys only answering "downstream, company-centric questions like, "'How do we optimize XYZ?'" which is still important to answer.
We're saying you absolutely need to pay attention to that, but you also need to ask upstream, very customer-centric questions. And that's very difficult, because all day you're operating inside a company. You have to then step outside your own shoes and leave the building and see the world from a customer's perspective, or even from a non-existing customer's perspective, which is even more difficult. >> The whole know-your-customer meme has taken off in a big way right now, but I do feel like the pendulum is swinging. Well, I'm sanguine toward AI. It seems to me that ... It used to be that brands had all the power. They had all the knowledge, they knew the pricing, and the consumers knew nothing. The Internet changed all that. I feel like digital transformation and all this AI is an attempt to create that asymmetry again, back in favor of the brand. I see people getting very aggressive toward, certainly you see this with Amazon. Amazon, I think, knows more about me than I know about myself. Should we be concerned about that, and who protects the consumer, or do the benefits maybe outweigh the risks there? >> I think that's such an important question you're asking, and it's totally important. A really great TED talk just went up by Zeynep Tufekci, where she talks about how the most brilliant data scientists, the most brilliant minds of our day, are working on ad tech platforms that are now being created to essentially do what Kenyatta Jeez calls advertising terrorism, which is that all of this data is being collected so that advertisers have this information about us that could be used to create future forms of surveillance. And that's why we need organizations to ask the kind of questions that you did. So two organizations that I think are doing a really great job, and worth looking at: first, Data & Society. Founder is Danah Boyd. Based in New York City. This is where I'm an affiliate.
And they have all these programs that really look at digital privacy, identity, and the ramifications of all these things we're looking at with AI systems. Really great set of researchers. And then Vint Cerf (mumbles) co-founded People-Centered Internet. And I think this is another organization that we really should be looking at. It's based on the West Coast, where they're also asking similar questions, like: instead of just looking at the Internet as a one-to-one model, what is the Internet doing for communities, and how do we make sure we leverage the role of communities to protect what the original founders of the Internet created? >> Right, Danah Boyd, CUBE alum. Shout out to Jeff Hammerbacher, founder of Cloudera, the originator of the line that the greatest minds of my generation are trying to get people to click on ads. He quit Cloudera and is now working at Mount Sinai as an MD, amazing, trying to solve cancer. >> John: A lot of CUBE alums out there. >> Yeah. >> And now we have another one. >> Woo-hoo! >> Tricia, thank you for being with us. >> You're welcome. >> Fascinating stuff. >> Thanks for being on. >> It really is. >> Great questions. >> Nice to really just change the lens a little bit, look through it a different way. Tricia, by the way, is part of a panel tonight with Michael Li and Nir Kaldero, who we had earlier on theCUBE, 6 o'clock to 7:15 live on ibmgo.com. Nate Silver is also joining the conversation, so be sure to tune in for that live tonight at 6 o'clock. Back with more of theCUBE right after this. (techno music)

Published Date : Nov 1 2017

SUMMARY :

brought to you by IBM. I'm going to tell you about a little bit later on, Want to do it one more time? and putting ethnography into practice. the challenge stretch. and how that relates to tech. and the most difficult to quantify thing for any business. and different biases, and prejudices, and all that stuff, and it's something that needs to be corrected, So, I presume you work with a lot of big consumer brands. and they tend to move at the speed of the CIO, I was just at Constellation Research Annual Conference and assume that they will help you deliver Where should the educational focus be? and to be on a team together We love our candy bar phone, right? We love our surveys that tell us-- We've done all of the analysis. You can get that one to two to five percent. One of the things you talk about your passion that led Steve Jobs to develop the iPhone. or does the bun shape the burger? Okay. You weren't expecting that. but it is kind of ... and that gap is going to be very, very narrow. and the societal impacts. I don't think that has to be-- and making a profit at the same time. that you put in your car for the insurance company and I find it to be a total waste of time. You walk into an airport and there are kiosks. but thoughts on that. that are very important that you just raised. So think about some of the brands that you serve, But, you really do not want your machine Right, but you still need someone checking that, and I always look for, to the boardroom, and see the world from a customer's perspective and the consumers knew nothing. that I think are doing a really great job to look at Shout out to Jeff Hammerbacher, Nice to really just change the lens a little bit,

ENTITIES

Entity	Category	Confidence
Diane Greene	PERSON	0.99+
Eric Herzog	PERSON	0.99+
James Kobielus	PERSON	0.99+
Jeff Hammerbacher	PERSON	0.99+
Diane	PERSON	0.99+
IBM	ORGANIZATION	0.99+
Mark Albertson	PERSON	0.99+
Microsoft	ORGANIZATION	0.99+
Amazon	ORGANIZATION	0.99+
Rebecca Knight	PERSON	0.99+
Jennifer	PERSON	0.99+
Colin	PERSON	0.99+
Dave Vellante	PERSON	0.99+
Cisco	ORGANIZATION	0.99+
Rob Hof	PERSON	0.99+
Uber	ORGANIZATION	0.99+
Tricia Wang	PERSON	0.99+
Facebook	ORGANIZATION	0.99+
Singapore	LOCATION	0.99+
James Scott	PERSON	0.99+
Scott	PERSON	0.99+
Ray Wang	PERSON	0.99+
Dell	ORGANIZATION	0.99+
Brian Walden	PERSON	0.99+
Andy Jassy	PERSON	0.99+
Verizon	ORGANIZATION	0.99+
Jeff Bezos	PERSON	0.99+
Rachel Tobik	PERSON	0.99+
Alphabet	ORGANIZATION	0.99+
Zeynep Tufekci	PERSON	0.99+
Tricia	PERSON	0.99+
Stu	PERSON	0.99+
Tom Barton	PERSON	0.99+
Google	ORGANIZATION	0.99+
Sandra Rivera	PERSON	0.99+
John	PERSON	0.99+
Qualcomm	ORGANIZATION	0.99+
Ginni Rometty	PERSON	0.99+
France	LOCATION	0.99+
Jennifer Lin	PERSON	0.99+
Steve Jobs	PERSON	0.99+
Seattle	LOCATION	0.99+
Brian	PERSON	0.99+
Nokia	ORGANIZATION	0.99+
Europe	LOCATION	0.99+
Peter Burris	PERSON	0.99+
Scott Raynovich	PERSON	0.99+
Radisys	ORGANIZATION	0.99+
HP	ORGANIZATION	0.99+
Dave	PERSON	0.99+
Eric	PERSON	0.99+
Amanda Silver	PERSON	0.99+

Aaron Kalb, Alation | BigData NYC 2017

>> Announcer: Live from midtown Manhattan, it's the Cube. Covering Big Data New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. >> Welcome back everyone, we are here live in New York City, in Manhattan, for BigData NYC, our event we've been doing for five years in conjunction with Strata Data, which was formerly Strata Hadoop, which was formerly Strata Conference, formerly Hadoop World. We've been covering the big data space going on ten years now. This is the Cube. I'm here with Aaron Kalb, who's Head of Product and co-founder at Alation. Welcome to the Cube. >> Aaron Kalb: Thank you so much for having me. >> Great to have you on. So, co-founder and head of product: I love these conversations, because you're also a co-founder, so it's your company, you've got a lot of equity interest in it, but as head of product you also get to have the 20-mile stare on what the future looks like, while inventing it today and bringing it to market. So you guys have an interesting take on the collaboration of data. Talk about what that means. What's the motivation behind that positioning, what's the core thesis around Alation? >> Totally. So the thing we've observed is that a lot of people working in the data space are concerned about the data itself: how can we make it cheaper to store, faster to process? And we're really concerned with the human side of it. Data's only valuable if it's used by people. How do we help people find the data, understand the data, trust in the data? That involves a mix of algorithmic approaches and also human collaboration, both human to human and human to computer, to get that all organized. >> John Furrier: It's interesting, you have a symbolic systems background from Stanford, worked at Apple, involved in Siri, all this kind of futuristic stuff. You can't go a day without hearing about Alexa, voice activation, you've got Siri. AI is taking a really big part of this.
Obviously there's all of the hype right now, but what it means is that software is going to play a key role as an interface. And this symbolic systems background almost brings on this neural network kind of vibe, where objects, data, play a critical role. >> Oh, absolutely, yeah. In the early days, when we were co-founding the company, we talked about: what is Siri for the enterprise? Right, I was, you know, very excited to work on Siri, and it's really a kind of fun gimmick, and it's really useful when you're in the car or your hands are covered in cookie dough. But if you could answer questions like what was revenue last quarter in the UK, and get the right answer fast, and have that dialogue, oh, do you mean fiscal quarter or calendar quarter, do you mean UK including Ireland, or whatever it is, that would really enable better decisions and a better outcome. >> I was worried that Siri might do something here. Hey Siri, oh there it is, okay be careful, I don't want it to answer and take over my job. >> (laughs) >> Automation will take away the job, maybe Siri will be doing interviews. Okay, let's take a step back. You guys are doing well as a start-up, you've got some great funding, great investors. How are you guys doing on the product? Give us a quick highlight on where you guys are. Obviously this is BigData NYC, a lot going on, it's Manhattan, you've got financial services, big industry here. You've got the Strata Data event, which is the classic Hadoop industry that's morphed into data, which is really overlapping with cloud, IoT, application development, all kind of coming together. How do you guys fit into that world? >> Yeah, absolutely. So the idea of the data lake is kind of interesting. Psychologically it's sort of a hoarder mentality: oh, everything I've ever had I want to keep in the attic, because I might need it one day.
It's a great opportunity to evolve these new streams of data, with IoT and whatnot, but just because you can get to it physically doesn't mean it's easy to find the thing you want, the needle in that big haystack, and to distinguish, from among all the different assets that are available, which one is actually trustworthy for your need. So we find that all these trends make a catalog, one that organizes that information and gets you what you want, all the more valuable. >> This has come up a lot. I want to get into the integration piece and how you're dealing with your partnerships, but data lake integration has been huge, and having the catalog has been the buzz. Foundationally, if you will, people are saying the catalog is important. Why is it important to do the catalog work up front with a lot of the data strategies? >> It's a great question. So we see data cataloging as step zero. Before you can prep the data in a tool like Trifacta, Paxata, or Kylo. Before you can visualize it in a tool like Tableau or MicroStrategy. Before you can do some sort of cool prediction of what's going to happen in the future with a data science engine, before any of that. These are all garbage-in, garbage-out processes. The step zero is: find the relevant data. Understand it so you can get it in the right format. Trust that it's good, and then you can do whatever comes next. >> And governance has become a key thing here. We've heard of the regulations, GDPR outside of the United States, but that's also going to have an arm's-length reach over into the United States. So there are these little decisions, and there's going to be an Equifax someday out there; another one's probably going to come around the corner. How does the policy injection change the catalog equation? A lot of people are building machine learning algorithms on top of catalogs, and they're worried they might have to rewrite everything.
How do you balance the trade-off between good catalog design and flexibility on the algorithm side? >> Totally, yes, it's a complicated thing with governance and consumption, right? There are people who are concerned with keeping the data safe, and there are people concerned with turning that data into real value, and these can seem to be at odds. What we find is that a catalog is actually a foundation for both, and they are not as opposed as they seem. What Alation fundamentally does is make a map of where the data is and who's using what data, when, and how. And that can actually be helpful if your goal is to say, let's follow in the footsteps of the best analyst and generate more insights, or if you want to say, hey, this data is being used a lot, let's make sure it's being used correctly. >> And by the right people. >> And by the right people, exactly. >> At Equifax they were fishing that pond dry months and months before it actually happened. With good tools like this they might have seen it, right? Am I getting it right? >> That's exactly right: how can you observe what's going on to make sure it's compliant, that the answers are correct, and that it's happening quickly and driving results? >> So in a way you're taking the collective intelligence of the user behavior and using that to understand what to do with the data modeling? >> That's exactly right. We want to make each person in your organization as knowledgeable as all of their peers combined. >> So the benefit then for the customer would be: if you see something that's developing, you can double down on it. And if the users are using a lot of data, then you can provision more technology, more software. >> Absolutely, absolutely. It's sort of like when I was going to Stanford, there was a place where the grass was all dead, because people were riding their bikes diagonally across it. And then somebody smart was like, we're going to put a real gravel path there.
So the infrastructure should follow the usage, instead of being something you try to enforce on people. >> It's a classic design meme that goes around: good design is here, the more effective design is the path. >> Exactly. >> So let's get into the integration. One of the hot topics here this year, besides cloud and AI, with cloud really being more the driver, the tailwind for the growth, and AI being more the futuristic headroom, is integration. You guys have some partnerships that you announced around integration. What are some of the key ones, and why are they important? >> Absolutely. So there have been attempts in the past to centralize all the data in one place, have one warehouse or one lake, have one BI tool. And those generally fail, for different reasons; different teams pick different stacks that work for them. What we think is important is a single source of reference: one hub with spokes out to all those different points. If you think about it, it's like Google: it's one index of the whole web, even though the web is distributed all over the place. To make that happen, it's very important that we have partnerships to get data in from various sources. So we have partnerships with database vendors, with Cloudera and Hortonworks, with different BI tools. What's new are a few things. One is with Cloudera Navigator: they have great technical metadata around security and lineage over HDFS, and that's a way to bolster our catalog to go even deeper into what's happening in the files before things get surfaced higher up, in places where we have a deeper offering today. >> So it's almost a connector to them in a way; you kind of share data. >> That's exactly right. We have a lot of different connectors; this is one new one that we have. Another, go ahead. >> I was going to say, go ahead, continue.
>> I was just going to say another place that is exciting is data prep tools. So Trifacta and Paxata are both places where you can find and understand data in Alation and then begin to manipulate it in those tools. We announced with Paxata yesterday the ability to click to profile, so if you want to actually see what's in some raw compressed Avro file, you can see that in one click. >> It's interesting, Paxata has really been almost lapping Trifacta, which was the leader in my mind, but now you've got like a NASCAR race going on between the two firms, because data wrangling is a huge issue. Data prep is where everyone is stuck right now; they just want to do the data science. It's interesting. >> They are both amazing companies and I'm happy to partner with both. And actually Trifacta and Alation have a lot of joint customers we're psyched to work with as well. I think what's interesting is that data prep, and this is beginning to happen with analyst definitions of that field, isn't just preparing the data to be used, getting it cleaned and shaped; it's also preparing the humans to use the data, giving them the confidence, the tools, the knowledge to know how to manipulate it. >> And it's great progress. So the question I wanted to ask is about the other big trend here. It's kind of a subtext in this show, not really front and center, but we've been seeing it emerge as a concept, as we see in the cloud world: on premise vs. cloud. On premise, a lot of people are bringing the DevOps model in and saying, I may move to the cloud for bursting and some native applications, but at the end of the day there is a lot of work going on on premise. A lot of companies are kind of cleaning house, retooling, replatforming, resetting, whatever you want to call it. They are getting their house in order to do on-prem cloud ops, meaning a business model of cloud operations on site.
A lot of people are doing that, and it will impact the story; it's going to impact some of the server modeling. That's a hot trend. How do you guys deal with the on-premise cloud dynamic? >> Totally. So we just want to do what's right for the customer, so we deploy both on prem and in the cloud, and then, from wherever the Alation server is, it will point to usually a mix of sources: some that are in the cloud, like Redshift or S3, often with Amazon today, and also sources that are on prem. I do think I'm seeing a trend more and more toward the cloud, and we have people migrating from HDFS to S3; that's one thing we hear a lot about at Strata, with its sort of Hadoop interest. But I think what's happening is people are realizing, as each Equifax in turn happens, that this old wild west model of, oh, you surround your bank with people on horseback and it's physically in one place... With data it isn't like that. Most people are saying I'd rather have the A+ teams at Salesforce or Amazon or Google be responsible for my security than the people I can get over in the Midwest. >> And the Paxata guys have loved the term data democracy, because that is really democratization, making the data free but also having the governance. So tell me about data lake governance, because I've never loved the term data lake, I think it's more of a data ocean, but now you see data lake, data lake, data lake. Are they just silos of data lakes happening now? Are people trying to connect them? That's been a key trend here. How do you handle the governance across multiple data lakes? >> That's right. So the key is to have that single source of reference, so that regardless of which lake or warehouse, or little siloed SQL Server somewhere, you can search in a single portal and find that thing no matter where it is. >> John: Can you guys do that?
>> We can do that, yeah. I think the metaphor for people who haven't seen it really is Google. If you think about it, you don't even know what physical server a webpage is hosted from. >> Data lakes should just be invisible. >> Exactly. >> So you're interfacing with multiple data lakes; that's a value proposition for you. >> That's right, so it could be on prem or in the cloud, multi-cloud. >> Can you share an example of a customer that uses that, and kind of how it's laid out? >> Absolutely. So one great example of an interesting data environment is eBay. They have the biggest Teradata warehouse in the world. They also have, I believe, two huge data lakes; they have Hive on top of that, and Presto is used to sort of virtualize it across a mixture of Teradata and Hive, and then there are direct Presto queries. It gets very complicated. And they are a very data-driven organization, so they have people who are product owners, in jobs where data isn't in their job title, and they know how to look at Excel and look at numbers and make choices, but they aren't real data people. Alation provides that accessibility so that they can understand it. >> We used to call the Hadoop world the car show for the data world, where for a long time it was about the engine, what was doing what, and then it became what's the car, and now how's it drive. We're seeing that same evolution now, where all that stuff has to get done under the hood. >> Aaron: Exactly. >> But there are still people who care about that, right? They are the mechanics, they are the plumbers, whatever you want to call them, but then the data scientists are the guys really driving things, and now potentially end users, and even applications, bots, or whatnot. It seems to evolve; that's where we're kind of seeing the show change a little bit, and that's kind of where you see some of the AI things.
I want to get your thoughts on how you and your guys are using AI, how you see AI, if it's AI at all or just machine learning as a baby step into AI. We all know what AI could be, but it's really just machine learning now. How do you guys use quote-unquote AI, and how has it evolved? >> It's a really insightful question and a great metaphor that I love. If you think about it, it used to be how do you build the car, and now I can drive the car even though I couldn't build it or even fix it, and soon I don't even have to drive the car; the car will just drive me, and all I have to know is where I want to go. That's sort of the progression that we see as well. There's a lot of talk about deep learning, all these different approaches, and it's super interesting and exciting. But I think even more interesting than the algorithms are the applications. And so for us it's like, today, how do we get that turn-by-turn direction where we say turn left at the light if you want to get there? And eventually, you know, maybe the computer can do it for you. The thing that is also interesting is that to make these algorithms work, no matter how good your algorithm is, it's all based on the quality of your training data. >> John: Which is historical data. In essence, the more historical data you have... you need that to train on. >> Exactly right, and we call this behavior IO: how do we look at all the prior human behavior to drive better behavior in the future? And I think the key for us is we don't want to have a bunch of unpaid-- >> John: You can actually get that URL, behavioral IO. >> We should do it before it's too late. (Both laugh) >> We're live right now, go register that, Patrick. >> Yeah, so the goal is we don't want to have a bunch of unpaid interns trying to manually tag things; that's error-prone and that's slow.
I look at things like Luis von Ahn over at CMU; he does a thing where, as you're writing in a CAPTCHA to get an email account, you're also helping Google recognize a hard-to-read address or a piece of text from books. >> John: If you shoot the arrow forward, if you just take this kind of forward, you almost think augmented reality is a precursor to what we might see for what you're talking about, and ultimately VR. Are you seeing some of the use cases for virtual reality be very enterprise oriented, or even end consumer? I mean, Tom Brady, the best quarterback of all time, uses virtual reality to play the offense virtually before every game; he's a power user. In pharma you see them using virtual reality to do data mining without being in the lab, so lab tests. So you're seeing augmentation coming in to this turn-by-turn direction analogy. >> It's exactly that. I think it's the other half of it. So we use AI, we use techniques to get great data from people, and then we do extra work watching their behavior to learn what's right and to figure out if there are recommendations. But then you serve those recommendations; if it's Google Glass, it appears right there in your field of view. We just have to figure out how we make sure that, in the moment you're making a dashboard or you're making a choice, you have that information right on hand. >> So since you're a technical geek, and a lot of folks would love to talk about this, I'll ask you a tough question, because this is something everyone is chasing as the holy grail. How do you get the right piece of data to the right place at the right time, given that you have all these legacy silos, latencies, and network issues as well? So you've got a data warehouse, you've got stuff in cold storage, and I've got an app and I'm doing something; there could be any number of points of data in the world that could be, in milliseconds potentially, on my phone or on my device, my Internet of Things wearable. How do you make that happen?
Because that's the struggle, while at the same time keeping all the compliance and all the overhead involved. Is it more compute, is it an architectural challenge? How do you view that, because this is the big challenge of our time. >> Yeah, again, I actually think it's more a human challenge than a technology challenge. It is true that there is data all over the place kind of gathering dust, but again, if you think about Google, there are billions of web pages, and I only care about the one I'm about to use. So for us it's really about being in that moment of writing a query or building a chart: how do we say, in that moment, hey, you're using an out-of-date definition of profit? Or hey, the database you chose to use, the one thing you chose out of the millions, is actually broken and stale. And we have interventions to do that, with our partners and through our own first-party apps, that actually change how decisions get made at companies. >> So to make that happen, if I imagine it, you'd need access to the data, and then write software that is contextually aware to then run, compute, in the context of the user interaction. >> It's exactly right. Back to the turn-by-turn directions concept, you have to know both where you're trying to go and where you are. And so for us, that can be: when I'm writing a SQL statement, after a JOIN we can suggest the table most commonly joined with that one, but also overlay onto that the fact that the most commonly joined table was deprecated by a data steward or data curator. So that's the moment that we can change the behavior from bad to good. >> So, a chief data officer out there: we've got to wrap up, but I wanted to ask one final question. There's a chief data officer out there who might be empowered, or who might be just a CFO assistant that's managing compliance; either way, someone's going to be empowered in an organization to drive data science and data value forward, because there is so much proof that data science works.
From military to play, you're seeing examples where being data-driven actually has benefits. So everyone is trying to get there. How do you explain the vision of Alation to that prospect? Because they have so much to select from, there's so much noise. We call it the tool shed out there: there's like a zillion tools, there's like a zillion platforms, some tools are trying to turn into something else, a hammer is trying to be a lawnmower. So they've got to be careful about who they select. So what's the vision of Alation for that chief data officer, or that person in charge of analytics, to scale operational analytics? >> Absolutely. So we say to the CDO, we have a shared vision for this place where your company is making decisions based on data, instead of based on gut or on expensive consultants months too late. And the way we get there, the reason Alation adds value, is that we're sort of the last tool you have to buy. Because with this lake mentality, you've got your tool shed with all the tools, you've got your library with all the books, but they're just in a pile on the floor. If you had a tool that had everything organized, so you just said, hey robot, I need a hammer and this size nail and this textbook on this set of information, and it could just come to you, and it would be correct and it would be quick, then you could actually get value out of all the expense you've already put into this infrastructure. That's especially true on the lake. >> And also tools describe the way the work's done, so in that model tools can be in the tool shed and no one needs to know they're in there. >> Aaron: Exactly. >> You guys can help scale that. Well, congratulations. And just how far along are you guys in terms of number of employees? How many customers do you have?
If you can share that; I don't know if that's confidential or whatnot. >> Absolutely. So we're small but growing very fast, planning to double in the next year, and in terms of customers, we've got 85 customers, including some really big names. I mentioned eBay, Pfizer, Safeway Albertsons, Tesco, Meijer. >> And what are they saying to you guys? Why are they buying, why are they happy? >> They share that same vision of a more data-driven enterprise, where humans are empowered to find, understand, and trust data to make more informed choices for the business, and that's why they come and come back. >> And that's the product roadmap, the ethos, for you guys? That's the guiding principle? >> Yeah, the ultimate goal is to empower humans with information. >> Alright Aaron, thanks for coming on the Cube. Aaron Kalb, co-founder and head of product for Alation, here in New York City for BigData NYC and also Strata Data. I'm John Furrier, thanks for watching. We'll be right back with more after this short break.

Published Date : Sep 28 2017

SUMMARY :

Brought to you by This is the Cube. Great to have you on, so co-founder head of product, Totally so the thing we've observed is a lot Obviously all of the hype right now, and get the right answer fast, and have that dialogue, I don't want it to answer and take over my job. How are you guys doing on the product? doesn't mean it's easy to find the thing you want, and having the catalog has come up with, has been the buzz. Understand it so you can get it in the right format. and flexibility on the algorithm side? and make more insights generated or if you want to say, Am I getting it right? That's exactly right, how can you observe what's going on We want to make each person in your organization So the benefit then for the customer would be So the infrastructure should follow the usage, Good design is here, the more effective design is the path. You guys have some partnerships that you announced it's one index of the whole web So it's almost a connector to them in a way, this is one new one that we have. the ability to click to profile, going on between the two firms, It isn't just preparing the data to be used, but at the end of the day there is a lot of work for the customer, so we deploy both on prem and in the cloud because that is really democratization, making the data free That's right so the key is to have that single source really is Google, if you think about it, So your interfacing with multiple data lakes, on prem or in the cloud, multi-cloud. They have the biggest teradata warehouse in the world. the car show for the data world, where for a long time and that's kind of where you see some of the AI things. and now I can drive the car even though I couldn't build it Historical data in essence the more historical data you have to drive better behavior in the future. 
Yeah so the goal is and ultimately VR are you seeing some of the use cases but then you serve those recommendations, and all the overhead involved, is it more compute, the one thing you chose out of the millions So to make that happen, if I imagine it, back to the turn by turn directions concept you have to know How do you explain the vision of Alation to that prospect? And the way we get there, no one needs to know it's in there. If you can share that, I don't know if that's confidential planning to double in the next year, for the business, and that's why they come and come back. Yeah the ultimate goal is Alright Aaron thanks for coming on the Cube.

ENTITIES

Entity	Category	Confidence
Luis von Ahn	PERSON	0.99+
eBay	ORGANIZATION	0.99+
Aaron Kalb	PERSON	0.99+
Pfizer	ORGANIZATION	0.99+
John	PERSON	0.99+
Aaron	PERSON	0.99+
Tesco	ORGANIZATION	0.99+
John Furrier	PERSON	0.99+
Safeway Albertsons	ORGANIZATION	0.99+
Siri	TITLE	0.99+
Google	ORGANIZATION	0.99+
Amazon	ORGANIZATION	0.99+
New York City	LOCATION	0.99+
UK	LOCATION	0.99+
20 mile	QUANTITY	0.99+
Hortonworks	ORGANIZATION	0.99+
BigData	ORGANIZATION	0.99+
five years	QUANTITY	0.99+
Equifax	ORGANIZATION	0.99+
two firms	QUANTITY	0.99+
Apple	ORGANIZATION	0.99+
Meijer	ORGANIZATION	0.99+
ten years	QUANTITY	0.99+
Cloudera	ORGANIZATION	0.99+
Trifacta	ORGANIZATION	0.99+
85 customers	QUANTITY	0.99+
Alation	ORGANIZATION	0.99+
Patrick	PERSON	0.99+
both	QUANTITY	0.99+
Strata Data	ORGANIZATION	0.99+
millions	QUANTITY	0.99+
United States	LOCATION	0.99+
Paxata	ORGANIZATION	0.99+
SiliconANGLE Media	ORGANIZATION	0.99+
Excel	TITLE	0.99+
Manhattan	LOCATION	0.99+
last quarter	DATE	0.99+
Ireland	LOCATION	0.99+
GDPR	TITLE	0.99+
Tom Brady	PERSON	0.99+
each person	QUANTITY	0.99+
Salesforce	ORGANIZATION	0.98+
next year	DATE	0.98+
NYC	LOCATION	0.98+
one	QUANTITY	0.98+
this year	DATE	0.98+
yesterday	DATE	0.98+
today	DATE	0.97+
one lake	QUANTITY	0.97+
Nascar	ORGANIZATION	0.97+
one warehouse	QUANTITY	0.97+
Strata Data	EVENT	0.96+
Tableau	TITLE	0.96+
One	QUANTITY	0.96+
Both laugh	QUANTITY	0.96+
billions of web pages	QUANTITY	0.96+
single portal	QUANTITY	0.95+

Satyen Sangani, Alation | SAP Sapphire Now 2017

>> Narrator: It's theCUBE covering Sapphire Now 2017, brought to you by SAP Cloud Platform and HANA Enterprise Cloud. >> Welcome back everyone to our special Sapphire Now 2017 coverage in our Palo Alto studios. We have folks on the ground in Orlando. It's the third day of Sapphire Now, and we're bringing our friends and experts into our new 4500 square foot studio, where we're starting to get our action going and covering events wherever they are from here. If we can't get there, we'll do it from here in Palo Alto. Our next guest is Satyen Sangani, CEO of Alation, a hot start-up funded by Custom Adventures, Catalyst Data Collective, and I think Andreessen Horowitz is also an investor? >> Satyen: That's right. >> Satyen, welcome to theCUBE conversation here. >> Thank you for having me. >> So we are doing this special coverage, and I wanted to bring you in and discuss Sapphire Now in the context of the biggest waves hitting the industry. Wave one is cloud. We've known that for a while, and people are surfing that one. Then the data wave is coming fast, and I think this is a completely different animal, in the sense that it's going to look different but be just as big. Your business is in the data business. You help companies figure this out. Give us the update. First, take a minute to talk about Alation for the folks who aren't following you: what do you guys do? And then let's talk about data. >> Yeah. So for those of you that don't know about Alation, it's basically a data catalog. You know, if you think about all of the databases that exist in the enterprise, stuff on-prem, stuff in the cloud, all the BI tools like Tableau and MicroStrategy, and Business Objects.
You've got a lot of data that sits inside the enterprise today in a wide variety of legacy and modern tools, and what Alation does is create a catalog, crawling all of those systems like Google crawls the web, and effectively looking at all the logs inside of those systems to understand how the data is interrelated, and we create this data social graph, and it kind of looks >> John: It's a metadata catalog? >> We don't use the word metadata, because metadata is the word that people use when, you know, that's Johnny back in the corner office, right? And people don't want to talk about metadata. If you're a business person and you think about metadata, you're like, I don't, not my thing. >> So you guys are democratizing what data means to an organization? >> That's right. We just like to talk about context. We basically say, look, in the same way that when you're eating your food you need organic labeling to understand whether or not it's good or bad, we have, on some level, a provenance problem, a trust problem, with data in the enterprise, and you need a layer of trust, understanding, and context. >> So are you guys a SaaS solution, or are you a software subscription? >> We are both. Most of this is actually on-prem, because most of the people that have the problem that Alation solves are very big, complicated institutions, or institutions with a lot of data or a lot of people trying to analyze it, but we do also have a SaaS offering, and actually that's how we intersect with SAP Altiscale, so we have a cloud-based offering that we work with. >> Tell me about your relationship with SAP, because you kind of backdoored in through an acquisition. Quickly note that, and then we'll get into the conversation.
>> Yeah, that's right. So Altiscale sits at the intersection of big data and the cloud; they do big data in the cloud. SAP acquired them last year, and what we do is provide a front-end capability for people to access that data in the cloud, so that as analysts want to analyze that data, and as data governance folks want to manage that data, we provide them with a single catalog to do that. >> So talk about the dynamics in the industry, because with SAP, clearly the big news there is Leonardo. They're trying to create this framework. We just announced an alpha, because everyone's got these names of dead creative geniuses. (Satyen laughs) We'll just introduce our Nostradamus product, since they have Leonardo and, >> That's right. >> Salesforce has got Einstein, and IBM's got Watson, and Informatica has got Claire, so we thought maybe we'd just get our own version. But anyway, everyone's got some sort of bot or AI program. >> Yep. >> I mean, I get that, but the reality is, the trend is, they're trying to create a tool chest of platforms, re-platforming around tooling >> Satyen: Yeah. >> To make things easier. >> Satyen: Yeah. >> You have a lot of work in this area, through Alation, trying to make things easier. >> Satyen: Yeah. >> And they've also got the cloud: on-premise, HANA Enterprise Cloud, SAP Cloud Platform, meaning developers. So the convergence between developers, cloud, and data is happening. What's your take on that strategy? Do you think SAP's got a good move by going multi-cloud, or should they be taking a different approach?
>> Well, I think they have to. I mean, I think the economics in cloud, and the unmanageability, you know, really human economics, and being able to have more and more being managed by third-party providers like AWS, and how they scale, and the capability to manage at scale; you just really can't compete if you're SAP, and you can't compete if your customers are buying and assembling the toolkits on-premise, so they've got to go there, and I think every IT provider has to >> John: Got to go to the cloud, you mean? >> They've got to go to the cloud. I think there's no question about it. You know, I think that's, at this point, a foregone conclusion in the world of enterprise IT. >> John: Yeah, it's pretty obvious. I mean, hybrid cloud is happening, and that's really a gateway to multi-cloud, the submission is when I build Norton, a guest in latency multi-cloud issues there, but the reality is not every workload's gone there yet, and a lot of analytics is going on in the cloud. >> Satyen: Yeah. >> DevTest, okay, check the box on DevTest. >> Satyen: That's right. >> Analytics is a whole new ballgame right now in terms of the state of the art. Your thoughts on the trends in how companies are using the cloud for analytics, and the things that are challenges and opportunities? >> Yeah, I think the analytics story in the cloud is a little bit earlier. I think for transaction processing, the new applications, the new architectures, and new integrations, certainly if you're going to build a new project, you're going to do that in the cloud. But I think for analytics in a stack, first of all there's data gravity, right? You know, there's a lot of gravity to that data, and to moving it all into the cloud. So if your transaction processing and your behavioral apps are in the cloud, then it makes sense to keep the data in AWS, or in the cloud.
Conversely, you know, if it's not, then you're not going to take a whole bunch of data that sits on-prem and move it whole hog all the way to the cloud just because, right? That's super expensive. >> Yeah. >> You've got legacy. >> A lot of risk too, and a lot of governance and a lot of compliance stuff as well. >> That's exactly right. I mean, if you're trying to comply with Basel II or GDPR, and you want to manage all that privacy information, how are you going to do that if you're going to move your data at the same time? >> John: Yeah. >> And so it's a tough >> John: Great point. >> It's a tough move. I think from our perspective, and I think this is really important, we sort of say, look, in a world where data is going to be on-prem, in the cloud, in BI tools, in databases and NoSQL databases, on Hadoop, you're going to have data everywhere, and in that world, where data is going to be in multiple locations and multiple technologies, you've got to figure out a way to manage it. >> Yeah. I mean, data sprawl is all over the place; it's a big problem. Oh, and by the way, that's a good thing: storage is getting cheaper and cheaper, data lakes are popping up, but now you have data lakes, you have data everywhere. >> Satyen: That's right. >> How are you looking at that problem as a start-up, how are customers dealing with that, and is this a real issue, or is it still too early to talk about data sprawl?
>> It's a real issue. I mean, we liken it to the advent of the Internet in the time of traditional media, right? So you had traditional media; there were single, sort of authoritative sources. We all watched maybe CNN, maybe CBS, we had the nightly news, we had Newsweek, and we got our information. Then the Internet comes along, and anybody can blog about anything, right? And so the cost of creating information is now this much lower. Anybody can create any reality, anybody can store data anywhere, right? And so now you've got a world where, with Tableau, with Hadoop, with Redshift, you can build any stack you want at any cost, and so now what do you do? Because everybody's creating their own thing, every dev is doing their own thing, everybody's got new databases, new applications; you know, software is eating the world, right? >> And data is eating software. >> And data is eating software, and so now you've got this problem where you're like, look, I've got all this stuff, and I don't know what's fake news, what's real, what's alternative fact, what doesn't make any sense, and so you've got a signal-to-noise problem, and I think in that world you've got to figure out how to get to the truth, right? >> John: Yeah. And what's the answer to that in your mind? Not that you have the answer; if you did, we'd be solving it better. >> Yeah. >> But directionally, where's the vector going in your mind? I talked to Paul Martino about this at Bullpen Capital. He's a total analytics geek, and he doesn't think big data can solve that yet, but they've started to see some science around trying to solve these problems with data. What's your vision on this?
>> Satyen: Yeah, you know, I think that every developer is going to start building applications based on data. I think that every business person is going to have an analytical role in their job, because if they're not dealing with the world with certainty, and they're not using all the evidence at their disposal, they're not making the best decisions. And obviously there are going to be more and more analysts, and so, you know, at some level everybody is an analyst. >> I wrote a post in 2008, when my old blog was hosted on WordPress, before I started SiliconANGLE: data is the new developer kit. >> That's right. >> And I saw that early, and it wasn't as clear then as it is now, at least to us, because we're in the middle of this industry, but it's now part of the software fabric. It's like a library: as a developer you'd call a library of code to come in and be part of your program >> Yeah >> A building-blocks approach, Lego blocks, but now data as Lego blocks completely changes the game if you think of it that way. Where are we on that notion of really using data as a development component? I mean, it seems to be early. I haven't seen any proof points that say, well, that company's actually using the data programmatically with software. >> Satyen: Yeah. Well, I mean, look, I think there are features in almost every software application, whether it's, you know, 27% of the people clicked on this button into this particular thing. I mean, that's a data-based application, right? And so I think there is this notion that we talk a lot about, which is data literacy, right? And so that's kind of a weird thing, so what does that exactly mean?
Well, data is just information, like a news article is information, and you've got to decide whether it's good or it's bad, and whether you can come to a conclusion or whether you can't. Just as if you're using an API from a third-party developer, you need documentation; you need context about that data, and people have to be intelligent about how they use it. >> And literacy also makes it addressable. >> That's right. >> If you have knowledge about data, at some point it's named and addressed at some point in a network. >> Satyen: Yeah. >> Especially data in motion. I mean, data lakes I get, data at rest, but when we start getting into data in motion, real-time data, every piece of data counts, right? >> That's exactly right. And so now you've got to teach people how to use this stuff. You've got to give them the right data, you've got to make that discoverable, you've got to make that information usable, you've got to get people to know who the experts are on the data so they can ask questions. You know, these are tougher problems, especially as you get more and more systems. >> All right, as a start-up, you're a growing start-up, you guys are lean and mean, doing well. You have to go compete in this war. There are a lot of, you know, a lot of big whales in there. I mean, you've got Oracle, SAP, IBM, they're all trying to transform. Everybody is transforming: all the incumbent winners, potential buyers of your company, or companies you're potentially displacing, as a young CEO, and, you know, eating their lunch. You have to go compete in a big game. How are you guys looking at that competition? I see your focus, so I know a little bit about your plan, but take us through the mindset of a start-up CEO that has to go into this world. You guys have to be good. I mean, this is a big wave. >> Yeah.
Nobody buys from a start-up, and a start-up could even be a company of fewer than 100-200 people. I mean, nobody's buying from a company unless there's a 10x return in value relative to the next best option, and so in that world, how do you build 10x value? Well, one, you've got to have great technology; that's the starting point. But the other thing is you've got to have deep focus on your customers, right? And so I think from our perspective, we build focus by just saying, look, nobody understands data in your company, and by and large you've got to make money by understanding this data. As you do the digital transformation stuff, a big part of that is differentiating and making better products and optimizing based upon understanding your data, because that helps you and your business make better decisions, >> John: Yeah. >> And so what we're going to do is help you understand that data better and faster than any other company can. >> You really got to pick your shots, but if I hear you right, as a start-up you've got to hit the beachhead segment you want to own. >> Satyen: That's right. >> And own it. >> Satyen: That's exactly right. >> No other decision, just get it, and then maybe get to a bigger scope later, and sequence around, and grow it that way. >> Satyen: You can't solve 10 problems >> Can't be groping for a beachhead; if you don't know what you want, you're never going to get it. >> That's right. You can't solve 10 problems unless you solve one, right? And so, you know, I think we're at a phase where we've proven that we can scalably solve one. We've got customers like, you know, Pfizer and Intuit and Citrix and Tesco and Tesla and eBay and Munich Reinsurance, and so these are all amazing brands that are traditionally difficult to sell into, but, you know, I think from our perspective it's really about focus and just helping customers that are making that digital analytical transformation.
Do it faster, and do it by enabling their people. >> There's a lot going on this week for events. We had Informatica World this week, we've got VeeamON, we had Google I/O, we had Sapphire, and a variety of other events going on, but I want to ask you more of an entrepreneurial industry question, which is: if we're going through the so-called digital transformation, that means a new modern era, with the old one being transformed, yet I go to every event, and everyone's number one at something. Like, I was just at Informatica; they're number one in six quadrants. Michael Dell: we're number one in four categories. Mark Hurd at the press meeting said they're number one in all categories. There's that Ross Perot quote about how you can be number one depending on how you slice the market; that seems to be in play. My point is, I get a little bit, you know, weirded out by that, but that's okay. You know, I guess theCUBE's number one in overall live videos produced at enterprise events, so we're number one at something, but my point is. >> Satyen: You really are. >> My point is, in a new transformation, what is the new scoreboard going to look like? Because a lot of the things you're talking about are horizontally integrated, there are new use cases developing, a new environment is coming online, so if someone wanted to actually try to keep score of who's number one and who's winning, besides customer wins, because that's clearly the one you can point to and say, hey, they're winning customers, customer growth is good; outside of customer growth, what do you think will be the key requirements to get some sort of metric on who's really doing well versus the others? I mean, we're not there yet with >> Yeah, it's a tough problem. I mean, you know, it used to be the world was that nobody gets fired for choosing IBM. >> John: Yeah.
>> Right, and I think that that brand credibility worked in a world where you could be conservative, right? In this world, I think looking for those measures is going to be really tough, and I think on some level that quest to find what is number one, or who is the best, is actually a sort of fool's errand. If that's what you're looking for, if you're looking for, you know, what's the best answer for me based upon social signal, it's kind of like, you know, I'm going to go do what the popular kids do in high school. I mean, that could lead you to a path, but it doesn't lead to the one that's going to actually get you satisfaction. And so on some level I think that customers, like you said, are the best signal, you know, always. >> John: Yeah, I mean it's hard. It's a rhetorical question; we ask it because, you know, we're trying to separate fact from fashion, what's fashionable. >> Satyen: Yeah. >> That's different. I mean, talk about a really clear metric: in the old days, market share was one. Actually, IDC used to track who had market share, and they would say, based upon the number of product shipments, this is the market share winner, right? Yeah, that's pretty clean. I mean, that's fairly clean, so what would it be now? Number of instances? I mean, it's so hard to figure out. Anyway, I digress. >> No, I think that's right. I mean, I think it's really tough, and I think it's customer stories that sort of map to your case. >> Yeah. It all comes back down to customer wins, how many customers you have was the >> Yeah, and how much value they are getting out of your stuff. >> Yeah. That 10x value, and I think that's the multiplier minimum, if not more, with cloud and the scale that's happening, you agree? >> Satyen: Yeah. >> It's going to get better. Okay, thanks for coming on theCUBE. We have Satyen Sangani, CEO and co-founder of Alation, a great start-up.
Follow them on Twitter. These guys have got some really good focus on learning about your data, because once you understand data hygiene, you start to think about ethics and all the cool stuff happening with data. Thanks so much for coming on theCUBE. More coverage of Sapphire after this short break. (techno music)

Published Date : May 19 2017

SUMMARY :

brought to you by SAP Cloud Platform and I think Andreessen Horowitz is also an investor? and I wanted to bring you in and discuss So for those of you that don't know about what Alation is, that people use when you know that's So you guys are democratizing and you need a layer of you know trust, and so we have a cloud-based offering because you kind of backdoored in through an acquisition, and then they do big data in the cloud and IBM's got Watson, You have a lot of work in this area, through Alation, and data are happening. you know I think that's at this point, a lot of analytics going on in the cloud. and things that are challenges and opportunities. you know there's a lot of gravity to that data, and a lot of compliance stuff as well. and you know you want to and multiple technologies you got to figure out but you have data lakes, not that you have the answer, but they started to see some science data is the new developer kit. the game on things if you think of it that way. and you got to decide whether it's good or it's bad, And literacy also makes it, If you have knowledge about data, I mean data lakes I get, you know these are tougher problems, I mean you got Oracle, SAP, IBM, and so in that world how do you build 10x value? is help you understand that data better and faster the beachhead segment you want to own. and then maybe get to a bigger scope later, if you don't know what you want, and so you know I think we're at a phase you know I guess theCUBE's number one in overall I mean you know you know, I mean it's so hard to figure out anyway, I mean I think I think it's really tough, how many customers you have was the Yeah and how much value they are getting and I think that's the multiplier minimum, and all the cool stuff happening with data.

ENTITIES

Entity	Category	Confidence
Michael Dell	PERSON	0.99+
Oracle	ORGANIZATION	0.99+
IBM	ORGANIZATION	0.99+
Paul Martino	PERSON	0.99+
John	PERSON	0.99+
Pfizer	ORGANIZATION	0.99+
Ross Perot	PERSON	0.99+
Mark Hurd	PERSON	0.99+
Palo Alto	LOCATION	0.99+
2008	DATE	0.99+
27%	QUANTITY	0.99+
Satyen	PERSON	0.99+
Satyen Sangani	PERSON	0.99+
10 problems	QUANTITY	0.99+
Orlando	LOCATION	0.99+
Catalyst Data Collective	ORGANIZATION	0.99+
CBS	ORGANIZATION	0.99+
Tesla	ORGANIZATION	0.99+
4500 square foot	QUANTITY	0.99+
SAP	ORGANIZATION	0.99+
CNN	ORGANIZATION	0.99+
last year	DATE	0.99+
AWS	ORGANIZATION	0.99+
Tesco	ORGANIZATION	0.99+
Basel II	TITLE	0.99+
eBay	ORGANIZATION	0.99+
Alation	ORGANIZATION	0.99+
10x	QUANTITY	0.99+
Custom Adventures	ORGANIZATION	0.99+
six squadrons	QUANTITY	0.99+
this week	DATE	0.99+
Andreessen Horowitz	ORGANIZATION	0.99+
both	QUANTITY	0.99+
Tableau	TITLE	0.98+
Informatica	ORGANIZATION	0.98+
GDPR	TITLE	0.98+
2017	DATE	0.97+
MicroStrategy	TITLE	0.97+
Intuit	ORGANIZATION	0.97+
first	QUANTITY	0.97+
third day	QUANTITY	0.96+
Norton	ORGANIZATION	0.96+
Jada	PERSON	0.96+
Johnny	PERSON	0.96+
Sapphire	ORGANIZATION	0.95+
Twitter	ORGANIZATION	0.94+
Munich Reinsurance	ORGANIZATION	0.94+
HANA Enterprise Cloud	TITLE	0.94+
less than a 100-200 people	QUANTITY	0.94+
single	QUANTITY	0.94+
Claire	PERSON	0.93+
Business Objects	TITLE	0.93+
big	EVENT	0.93+
one	QUANTITY	0.93+
Google	ORGANIZATION	0.92+
Leonardo	ORGANIZATION	0.91+
IDC	ORGANIZATION	0.9+
DevTest	TITLE	0.9+
Alation	PERSON	0.89+
Cloud Platform	TITLE	0.89+
Einstein	PERSON	0.88+
four	QUANTITY	0.87+

Stephanie McReynolds, Alation & Lee Paries, Think Big Analytics - #BigDataSV - #theCUBE

>> Voiceover: San Jose, California, it's theCUBE, covering Big Data Silicon Valley 2017. (techno music) >> Hey, welcome back everyone. We're live in Silicon Valley for Big Data SV. This is theCUBE's coverage in conjunction with Strata + Hadoop. I'm John Furrier with George Gilbert of Wikibon. Two great guests: we have Stephanie McReynolds, Vice President at startup Alation, and Lee Paries, who is the VP of Think Big Analytics. Thanks for coming back. You've both been on theCUBE before, and Think Big has been on many times. Good to see you. What's new, what are you guys up to? >> Yeah, excited to be here, and to be here with Lee. Lee and I have a personal relationship that goes back quite a ways in the industry. What we're talking about today is the integration between Kylo, which was recently announced as an open source project from Think Big, and Alation's capability to sit on top of Kylo, together increasing the velocity of data lake initiatives, kind of going from zero to 60 in a pretty short amount of time, to get both technical value from Kylo and business value from Alation. >> So talk about Alation's traction, because you guys have been an interesting startup with a lot of great press. George is a big fan. He's going to jump in with some questions, but there's some good product-market fit. What's the update? What's the status on traction in terms of the company and customers and whatnot? >> Yeah, we've been growing pretty rapidly for a startup. We've doubled our production customer count since the last time we talked. Some great brand names: Munich Reinsurance was talking about their implementation this morning. They have 600 users of Alation in their organization. We've entered Europe, not only with Munich Reinsurance, but Tesco is now a large account of ours in Europe.
And here in the States we've seen broad adoption across a wide range of industries, everyone from Pfizer in the healthcare space to eBay, which has been our longest-standing customer. They have about 1,000 weekly users on Alation. So not only a great increase in the number of logos, but also organic growth internally at many of these companies across data scientists, data analysts, and business analysts, a wide range of users of the product. >> It's been interesting. What I like about your approach, and we talked to Think Big about it before: every guest that's come in so far in this same area is talking about metadata layers, and so this is interesting. There's metadata and data addressability, if you will, for lack of a better description, but it also has to be human-usable and integrated into human processes, whether it's virtualization or any kind of real-time app or anything. So you're seeing this convergence where I need to get the data into an app, whether it's IoT data or something else, really, really fast, so the discovery piece is now the interesting layer. How competitive is it, and what are the different solutions that you guys see in this market? >> Yeah, I think it's interesting, because metadata has kind of had a revival, right? Everyone is talking about the importance of metadata and open integration with metadata. I think our angle as Alation is that having open transfer of technical metadata is very important for the foundation of analytics, but what really brings that technical metadata to life is also understanding the business context of what's happening technically in the system. What's the business context of the data? What's the behavioral context of how that data has been used that might inform me as an analyst? >> And what's your unique approach to that? Because that's like the Holy Grail. It's like translating geek metadata, indexing stuff, into usable business outcomes.
It's been a cliche for years, you know. >> The approach is really based on machine learning and AI technology to make recommendations to business users about what might be interesting to them. We're at a state in the market where there is so much data available that you can access, either in Hadoop as a data lake or in a data warehouse in a database like Teradata, that today what you need as state of the art is a system that starts to recommend what might be interesting data for you to use as a data scientist or an analyst, and not just what data you could use, but how accurate that data is and how trustworthy it is. I think there's a whole nother theme of governance rising that's tied to that metadata discussion, which is that it's not enough to just shove bits and bytes between different systems anymore. You really need to understand how this data has been manipulated and used, and how that influences my security considerations, my privacy considerations, and the value I'm going to be able to get out of that data set. >> What's your take on this, 'cause you guys have a relationship. How is Think Big doing? Then talk about the partnership you guys have with Alation. >> Sure. When you look at what we've done specifically with an open source project, it's the first one that Teradata has fully sponsored and released, under Apache 2.0, called Kylo. It's really about the enablement of the full data lake platform and the full framework, everything from ingest, to securing it, to governing it, and part of that process is collecting the basic technical and business metadata, so later you can hand it over to the user so they can sample and profile the data, they can find it, they can search it in a Google-like manner, and then you can enable the organization with that data.
So when you look at it from the standpoint of partnering together, it's really about collecting that data specifically within Hadoop to enable it, with the ability to then hand it off to a more enterprise-wide solution like Alation through API connections, and then they enrich it the way they go about it, with social collaboration and the business, to extend it from there. >> So that's the accelerant, then. You're accelerating the open source project through this new partnership with Alation. So you're still going to rock and roll with the open source. >> Very much going to rock and roll with the open source. It's really been based on five years of Think Big's work in the marketplace, over about 150 data lakes, the IP we've built around that to do things repeatedly and consistently, and then releasing that after the last two years of dedicated development based on Apache Spark and NiFi to stand that up. >> Great work, by the way. Open source continues to be more relevant. But I've got to get your perspective on a meme that's been floating around on day one here, and maybe it's because of the election, but someone said, "We got to drain the data swamp and make data great again." And not a play on Trump, but the data lake is going through a transition, saying, "Okay, we've got data lakes," but this year there's been a focus on making that much more active and cleaner, and making sure it doesn't become a swamp, if you will. So there's been a focus on taking data lake content and getting it into real time, and IoT has kind of been a forcing function, I think. Do you guys have a perspective on where data lakes are going? It's certainly been a trending conversation here at the show. >> Yeah, I think IoT has been part of draining that data swamp, but I think now you also have a mass of business analysts that are starting to get access to that data in the lake.
These Hadoop implementations are maturing to the stage where you have-- >> John: Value coming out of it. >> Yeah, and people are trying to wring value out of that lake, and sometimes finding that it is harder than they expected, because the data hasn't been pre-prepared for them. The old world, where IT would pre-prepare the data and I got a single metric, or a couple of metrics to choose from, is now turned on its head. People are taking a more exploratory, discovery-oriented approach to navigating through their data, and finding that the nuances of data really matter when trying to evolve an insight. So the literacy in these organizations, and their awareness of some of the challenges of a lake, are coming to the forefront, and I think that's a healthy conversation for us all to have. If you're going to have a data-driven organization, you have to really understand the nuances of your data to know where to apply it appropriately to decision making. >> So (mumbles), actually going back quite a few years, when he started at Microsoft, said Internet software has changed the paradigm so much that we have this new set of actions: discover, learn, try, buy, recommend. And it sounds like, as a consumer of data in a data lake, we've added or prepended this discovery step, whereas in a well-curated data warehouse it was learn: you had your X dimensions that were curated and refined, and you don't have that as much with the data lake. I guess I'm wondering, as we were talking about with the last team from AtScale, moving OLAP to be something you consume on a data lake the way you consume it on a data warehouse, is it almost like Alation and a smart catalog are as much a requirement as a visualization tool is by itself on a data warehouse?
>> I think what we're seeing is this notion of data needing to be curated, and including many brains and many different perspectives in that curation process is something that's defining the future of analytics and how people use technical metadata. And what does it mean for the devops organization to get involved in draining that swamp? That means not only looking at the elements of the data that are coming in from a technical perspective, but then collaborating with the business to curate the value on top of that data. >> So in other words it's not just to help the user, the business analyst, navigate, but it's also to help the operational folks do a better job of curating once they find out who's using the data and how. >> That's right. They kind of need to know how this data is going to be used in the organization. The volumes are so high that they couldn't possibly curate every bit and byte that is stored in the data lake. So looking at how different individuals and groups in the organization are trying to access that data gives an early signal of where we should be spending more or less time processing this data and helping the organization really get to their end goals of usage. >> Lee, I want to ask you a question. In your blog post, which was pointed out to me earlier, you guys quote a Gartner stat that's pretty doom and gloom: "70% of Hadoop deployments in 2017 "will fail to deliver their estimated cost savings "or their predicted revenue." And then it says, "That's a dim view, "but not shared by the Kylo community." How are you guys going to make the Kylo data lake software work well? What are your thoughts on that? Because I think that's the number one question, again, that I highlighted earlier: okay, I don't want a swamp. So that's the fear, whether they get one or not, so they worry about data cleansing and all these things.
So what's Kylo doing that's going to accelerate, or lower that number of failures in the data lake world? >> Yeah, sure. So again, a lot of it's through the experience of going out there and seeing what's done. A lot of people have been doing a lot of different things within their data lakes, but when you go in there, there are certain things they're not doing, and then when you're doing them, it's about doing them over and over consistently and continually improving upon that. That's what Kylo is: it's really a framework that we keep adding to, and as the community grows and other projects come in that can enhance it, we bring the value. But a lot of times when we go in, it's basically that end users can't get to the data, either because they're not allowed to, since maybe it's not secured and reliable enough to turn over to them and let them drive with it, or because they don't know the data is there, which goes back to basically collecting the basic metadata and data (mumbles) to know it's there to leverage it. So a lot of times it's going back and leveraging what we have to build that solid foundation, so IT and operations feel they can hand that over in a template format and business users can get to the data and start acting off of that. >> You just lost your mic there, but Stephanie, I've got to ask you a question. Just as a point of clarification, are you guys supporting Kylo? Is that the relationship, or how does that work? >> So we're integrated with Kylo. Kylo will ingest data into the lake, manage that data lake from a security perspective by giving folks permissions, and enable some wrangling on that data, and what Alation then receives from Kylo is the technical metadata that's being created along that entire path. >> So you're certified with Kylo? How does that all work from the customer standpoint? >> It's very much an integration partnership that we've been working on together.
>> So from a customer standpoint it's clean, and you then provide the benefits on the other side? >> Correct. >> Yeah, absolutely. We've been working with data lake implementations for some time, really since our founding, and I think this is an extension of our philosophy that data lakes are going to play an important role, complementing databases, analytics tools, and business intelligence tools in the analytics environment, and that open source is part of the future of how folks are building these environments. So we're excited to support the Kylo initiative. We've had a longstanding relationship with Teradata as a partner, so it's a great way to work together. >> Thanks for coming on theCUBE. Really appreciate it, and thank... What do you guys think of the show so far? What's the current vibe of the show? >> Oh, it's been good so far. I mean, it's one day into it, but very good vibe so far. Different topics and different things-- >> AI, machine learning. You couldn't be happier with that machine learning-- >> Great to see machine learning taking the forefront, people really digging into the details around what it means when you apply it. >> Stephanie, thanks for coming on theCUBE, really appreciate it. More CUBE coverage after the show break. Live from Silicon Valley, I'm John Furrier with George Gilbert. We'll be right back after this short break. (techno music)

Published Date : Mar 15 2017

SUMMARY :

(techno music) What's new, what are you guys up to? and to gather to increase He's going to jump in with some questions, And here in the States we've seen broad adoption that you guys see in this market? Everyone is talking about the importance in metadata Because that's like the Holy Grail. is the system to start to recommend to you Then talk about the partnership you guys have with Alation. and the business to extend it from there. So that's the accelerant then. and NiFi to stand that up. and maybe it's because of the election, to the stage where you have-- and finding that the nuances of data really matter to be something you consume on a data lake and many different perspectives in that curation process but it's also to help the operational folks and helping the organization really get in the data lake world? and data (mumbles) to know it's there to leverage it. but Stephanie, I got to ask you a question. and what Alation is receiving then from Kylo that we've been working on together. that the data lakes are going to play an important role What's the current vibe of the show? Oh, it's been good so far. You couldn't be happier with that machine learning-- people really digging into the details More CUBE coverage after the show break.

ENTITIES

Entity	Category	Confidence
Stephanie McReynolds	PERSON	0.99+
George Gilbert	PERSON	0.99+
Europe	LOCATION	0.99+
Stephanie	PERSON	0.99+
Lee	PERSON	0.99+
Tesco	ORGANIZATION	0.99+
Lee Paries	PERSON	0.99+
George	PERSON	0.99+
Trump	PERSON	0.99+
2017	DATE	0.99+
John	PERSON	0.99+
Pfizer	ORGANIZATION	0.99+
five years	QUANTITY	0.99+
Microsoft	ORGANIZATION	0.99+
Think Big	ORGANIZATION	0.99+
John Furrier	PERSON	0.99+
70%	QUANTITY	0.99+
San Jose, California	LOCATION	0.99+
Alation	ORGANIZATION	0.99+
Teradata	ORGANIZATION	0.99+
Think Big Analytics	ORGANIZATION	0.99+
Silicon Valley	LOCATION	0.99+
Gartner	ORGANIZATION	0.99+
zero	QUANTITY	0.99+
Kylo	ORGANIZATION	0.99+
60	QUANTITY	0.99+
600 users	QUANTITY	0.98+
AtScale	ORGANIZATION	0.98+
eBay	ORGANIZATION	0.98+
Google	ORGANIZATION	0.98+
today	DATE	0.98+
first one	QUANTITY	0.98+
Hadoop	TITLE	0.98+
Both	QUANTITY	0.98+
both	QUANTITY	0.97+
Two great guests	QUANTITY	0.97+
this year	DATE	0.97+
about 1,000 weekly users	QUANTITY	0.97+
one day	QUANTITY	0.95+
single metric	QUANTITY	0.95+
Apache Spark	ORGANIZATION	0.94+
Kylo	TITLE	0.93+
Wikibon	ORGANIZATION	0.93+
NiFi	ORGANIZATION	0.92+
about 150 data lakes	QUANTITY	0.92+
Apache 2.0	TITLE	0.89+
this morning	DATE	0.88+
couple	QUANTITY	0.86+
Big Data Silicon Valley 2017	EVENT	0.84+
day one	QUANTITY	0.83+
Vice President	PERSON	0.81+
Strata	TITLE	0.77+
Kylo	PERSON	0.77+
#theCUBE	ORGANIZATION	0.76+
Big Data	ORGANIZATION	0.75+
last two years	DATE	0.71+
one	QUANTITY	0.7+
Munich Reinsurance	ORGANIZATION	0.62+
CUBE	ORGANIZATION	0.52+
