Sam Lightstone, IBM - Chief Data Scientist, USA - #theCUBE

Jeff Frick: Hey, welcome back, everybody. Jeff Frick here with theCUBE. We're at the Chief Data Scientist, USA conference in downtown San Francisco, and we're really excited to have a representative from IBM, Sam Lightstone, Distinguished Engineer from IBM, join us. Sam, great to see you.

Sam Lightstone: Thank you very much. Pleasure to be here.

Jeff Frick: Absolutely. So we cover a ton of IBM events. We were at World of Watson, lots of developer conferences, the big event in New York earlier this year around Strata. So, you know, we're big fans of all the things that IBM is doing, and of Rob Thomas and the Spark group. I could go on and on, but we won't go there. We'll talk about what you were talking about earlier today, when you kind of let the cat out of the bag, which is always exciting. Breaking news, or [unclear], I don't know exactly how we would describe it, but you talked about something new: IBM Data Confluence. Could you share what that's all about?

Sam Lightstone: Yeah. So it's a whole new idea, a whole new paradigm that we're incubating right now inside of IBM. It's not yet available, but we're hoping to start trials in the January-ish timeframe. It comes from a realization that so much data is about to come upon us from distributed data sources. You know, everybody's got not only a cell phone, but increasingly data is coming from the Internet of Things. You're going to have data coming from your car, data coming from your glasses, from smart meters on your house, and it's a deluge of data. The way that people like to do data science on this data today is they pull the data from these devices and put it into a central repository, which is a perfectly legitimate strategy, but it means that you're creating copies of the data, and there's a certain complexity in dragging that data through the internet into some central repository. So the idea that we had with Data Confluence is to leave the data where it is and allow all these different data sources, if you can imagine cars, cell phones, or smart meters on buildings, to find one another and collaborate on data science problems like a computational mesh, so that we can bring hundreds, thousands, millions of microprocessors to bear on the data where it lives, without moving it around. Our theory is that not only is that simpler for everyone, because the data doesn't have to move around, but we can actually bring more computation to bear, because every one of those data sources has compute and has persistence, and you can multiply the opportunities.

Jeff Frick: Right. And you took a chance: you ran a live demo, which is always risky business at anything. But there were some really interesting concepts that you highlighted: this kind of organically forming, adapting constellation of these sources. In the example you used, they were solar panels. But for them to do this kind of automatically, if you will, as opposed to someone going in and scripting and building the structure, because tomorrow, as you demonstrated in your demo, you might want to add more and more. So those dynamic functions are pretty interesting.

Sam Lightstone: Yeah, and it's a very powerful concept and a very necessary concept. The reason it's so necessary is these devices could be anywhere, right? You could have most of your devices in New York but a few of them in the Yukon or Alaska or something, and you don't want them to all be equally connected. So it's important to create this network that is sort of geospatially aware and connectivity aware, not just sort of hard-coded. So one aspect of that is to be sensitive to network latency and topology. That's one reason why it has to be automatic. The other reason it has to be automatic is if you really want this to scale to thousands of devices, you can't have some programmer trying to figure out who connects to what. It's just too hard. So making it really adaptive and automatic is super important. Another thing that's really important for the Internet of Things, depending on the circumstance: if you can imagine cell phones, for example, you can have a network of thousands or millions of phones, but at any point in time some of those phones are going to be turned off. So the network has to be adaptive to the possibility that devices go offline, either intentionally, like a phone, or perhaps unintentionally because they break. You know, if you have a device on a smart meter, it may simply break, and then that particular device is offline for a period of time. So the network has to be resilient to that, and that's part of what we've been building, in particular using technology that we incubated in our UK labs in Hursley. So it's been a great collaboration across IBM. This is not just one set of people in one lab, but actually a corporate collaboration. And really our goal is to make this, as you say, automatic, but I would say beyond automatic, to make it resilient. It's got to be resilient and fault tolerant, because the complexities that we could be dealing with are just too large for a human being to deal with.

Jeff Frick: Right. And clearly distributed, right? That's the big thing. You guys are leveraging IBM Bluemix cloud. All this stuff doesn't happen without cloud capabilities. In the demo you did here, the data center was in San Jose and the actual data elements were in Toronto. Amazon and Microsoft and Google always get talked about a lot within the cloud space, but really IBM is making major plays, and if not in that top three, it's certainly right there in the fourth position as a leader in cloud. And then there's what this cloud enables, with the whole cognitive push, you know, that's a priority for Ginni and the team, to really bring more intelligence.

Sam Lightstone: That's exactly right. And with Data Confluence, what we're hoping is not only to tap into data science on distributed systems for IoT, and also for enterprise use cases as well, but really to take it to the next level of hybrid cloud, because these data sources could be in the cloud and they could be on-premises. They could be anywhere in the world, and you can mix and match, and that's really a very powerful capability for our customers. Many companies are now struggling as their data is now part cloud and part on-premises.

Jeff Frick: Right. And the compute as well, right? You could shift compute from the edge to the cloud in a dynamic fashion, based on what the optimal solution is, or, as you said, sometimes the edge is offline and you can't do it there.

Sam Lightstone: That's exactly right.

Jeff Frick: So, kind of a cool story: you said this came out of something called Blue Unicorn. What is Blue Unicorn?

Sam Lightstone: Fantastic. So Blue Unicorn was an initiative that a few of us got together on inside of IBM. You probably know some of these folks: Rob Thomas, who I think you've interviewed, [name unclear], and myself. The three of us got together and we said, you know, we want to find a more effective way to tap into the creative juices of our staff. We've got some of the greatest minds in the world working at IBM. We hire brilliant people, PhDs and master's graduates from the top schools all over the world, and all too often we hire these people and we tell them what they should be working on. Wouldn't it be better if we could find a repeatable process for them to come to us and say, here's the next big innovation that IBM should have? Blue Unicorn came out of that desire to tap into and nurture this creative passion of our staff, and it was really designed almost like an internal VC initiative. People would come to us with proposals. We started out with hundreds and vetted them down to dozens, and that down to just a small few that we would fund. The ones that we funded would go through periodic reviews, until eventually we ended up with a very small set that are still being incubated, and Data Confluence happens to have been one of those projects.

Jeff Frick: Awesome. So it's different than kind of the 10% thing. This is actually almost like an internal VC: you put your proposal together, you pitch it, you get funded, and then you go do that with your team.

Sam Lightstone: Right. One thing I would say is, as we were setting up, we were trying to find ways to make it work, make it efficient. One of the best filtering factors that we came up with is that people had to show us running code before it was funded.

Jeff Frick: Right, right.

Sam Lightstone: And that was amazing, because that meant people had to work nights and weekends. They had to have that level of passion and commitment for their idea to get to that level of vetting. That was incredible. That definitely filtered the people who were super passionate about what they were doing from the people who just said, yeah, I'd like to tinker. And that was tremendous.

Jeff Frick: OK. And then you're here at the show. It's a small show, a tight group, kind of multi-industry. Any good takeaways or surprises from the last couple of days here at the Chief Data Scientist, USA show?

Sam Lightstone: You know, it's been an amazing conference, actually, with some great speakers and some great insights. I think one of the most useful insights for me was this: I was curious to hear from this audience what duration of data is important to them. Do they need to see data from the last hour, the last month, the last year, the last 10 years? And of course it does vary from problem to problem, but many people said, you know, for the work that I do, I need about three months to build a model, and then once I have a model I'm really looking at the last two to four weeks of data to gain data science insight. That was a very important point for me, especially as we continue our work on analytics and data science at IBM. It's very important for us to understand the range of data that people are using.

Jeff Frick: Shorter than you assumed?

Sam Lightstone: Yeah, it's shorter, because I know certainly in the data warehousing space that I've been working in for a lot of my career, people do data analytics on six months, a year, or three years. So this definitely is somewhat of a shift, and it tells us something about our society: that things are moving faster, and data that's older than six months is usually not as interesting anymore.

Jeff Frick: Yeah, it really shows the dynamic, real-time nature. Analyzing just the old stuff is interesting, but not nearly as interesting as being on top of what's happening now, with Spark Streaming and some of these other things. It's funny, Beth Comstock kicked off the GE Minds + Machines event a couple of days ago, and she said we even walk faster in cities. They've done studies. So everything is continuing to speed up. All right, so a year from now, you're back here. What are we going to be talking about?

Sam Lightstone: Wow, OK. Well, you know, we just launched a few months, or a few weeks ago actually, the Watson Data Platform, a huge event for us. It really is for us the data foundation of all the cognitive computing that IBM is coming out with. It's going to bring together data science and data storage and collaboration amongst analysts and data scientists, all in one platform for all your data needs. I'm hoping that a year from now I'm going to speak to you about how Data Confluence is a core part of that platform, and we're going to be running analytics on millions of devices all over the world.

Jeff Frick: All right, Sam. Well, thanks for taking a few minutes. I know you've got to go catch an airplane. Thanks for stopping by and sharing your insight.

Sam Lightstone: Thank you.

Jeff Frick: All right. Sam Lightstone, I'm Jeff Frick. You're watching theCUBE. Thanks for watching.
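
Illustration (not from the interview): the interview describes leaving data on the devices, computing where the data lives, and tolerating devices that drop offline. A minimal Python sketch of that general pattern follows. Everything in it is an assumption made for illustration: the `Device` class, `local_summary`, `mesh_mean`, and the panel names are hypothetical, and none of it reflects how IBM Data Confluence is actually implemented, which the interview does not detail.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Device:
    """Hypothetical edge device that keeps its raw readings local."""
    name: str
    readings: List[float] = field(default_factory=list)
    online: bool = True

    def local_summary(self) -> Optional[Tuple[int, float]]:
        """Compute (count, sum) on the device; return None if it is offline."""
        if not self.online:
            return None
        return len(self.readings), sum(self.readings)


def mesh_mean(devices: List[Device]) -> Tuple[Optional[float], int]:
    """Combine per-device summaries into one global mean.

    Only the small (count, sum) pairs are collected, never the raw readings,
    and offline devices are skipped instead of failing the whole query.
    Returns (mean, number of devices that responded).
    """
    total_count, total_sum, responded = 0, 0.0, 0
    for device in devices:
        summary = device.local_summary()
        if summary is None:
            continue  # device is offline; carry on with the rest
        count, partial_sum = summary
        total_count += count
        total_sum += partial_sum
        responded += 1
    mean = total_sum / total_count if total_count else None
    return mean, responded


# Hypothetical solar panels; the one in the Yukon is currently unreachable.
panels = [
    Device("panel-toronto-1", [4.0, 6.0]),
    Device("panel-toronto-2", [8.0]),
    Device("panel-yukon-1", [100.0], online=False),
]
mean, responded = mesh_mean(panels)
print(mean, responded)
```

A real mesh would also need peer discovery and latency-aware routing, which this sketch leaves out; it only shows why shipping small partial results can replace copying raw data to a central repository.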

Published Date : Nov 18 2016


ENTITIES

Entity	Category	Confidence
Amazon	ORGANIZATION	0.99+
Google	ORGANIZATION	0.99+
Jeff Rick	PERSON	0.99+
New York	LOCATION	0.99+
Rob Thomas	PERSON	0.99+
Toronto	LOCATION	0.99+
IBM	ORGANIZATION	0.99+
Microsoft	ORGANIZATION	0.99+
Karachay Leah	PERSON	0.99+
Alaska	LOCATION	0.99+
San Jose	LOCATION	0.99+
Sam Lightstone	PERSON	0.99+
Jeff Creek	PERSON	0.99+
three years	QUANTITY	0.99+
Beth Comstock	PERSON	0.99+
hundreds	QUANTITY	0.99+
thousands	QUANTITY	0.99+
three	QUANTITY	0.99+
January	DATE	0.99+
millions of devices	QUANTITY	0.99+
thousands of devices	QUANTITY	0.99+
10%	QUANTITY	0.99+
Ginni	PERSON	0.99+
Yukon	LOCATION	0.99+
last year	DATE	0.98+
Hursley	LOCATION	0.98+
dozens	QUANTITY	0.98+
SPARC	ORGANIZATION	0.98+
UK	LOCATION	0.98+
four weeks	QUANTITY	0.98+
Sam lights	PERSON	0.97+
tomorrow	DATE	0.97+
Sam	PERSON	0.97+
earlier this year	DATE	0.97+
about three months	QUANTITY	0.97+
one reason	QUANTITY	0.97+
older than six months	QUANTITY	0.97+
last month	DATE	0.97+
today	DATE	0.96+
six months a year	QUANTITY	0.96+
fourth position	QUANTITY	0.96+
iBM	ORGANIZATION	0.95+
blue unicorn	TITLE	0.94+
hundreds thousands millions of microprocessors	QUANTITY	0.92+
blue unicorn	TITLE	0.92+
one	QUANTITY	0.92+
one lab	QUANTITY	0.91+
earlier today	DATE	0.9+
a few weeks ago	DATE	0.9+
a couple days ago	DATE	0.85+
chief data science	ORGANIZATION	0.84+
one aspect	QUANTITY	0.83+
millions of phones	QUANTITY	0.82+
downtown San Francisco	LOCATION	0.82+
top three	QUANTITY	0.82+
USA	LOCATION	0.81+
one of those projects	QUANTITY	0.78+
last 10 years	DATE	0.77+
a year from	DATE	0.77+
one set of people	QUANTITY	0.74+
a few months	DATE	0.72+
last couple days	DATE	0.7+
Chief Data Scientist	PERSON	0.68+
last	QUANTITY	0.66+
last hour	DATE	0.63+
two	QUANTITY	0.63+
lots of developer	QUANTITY	0.61+
bluemix	COMMERCIAL_ITEM	0.56+
Watson	EVENT	0.51+
Watson	ORGANIZATION	0.46+
strata	ORGANIZATION	0.4+
blue	TITLE	0.38+
