
Search Results for Hillary Mason:

Democratizing AI & Advanced Analytics with Dataiku x Snowflake | Snowflake Data Cloud Summit


 

>> My name is Dave Vellante. And with me are two world-class technologists, visionaries and entrepreneurs. Benoit Dageville, who co-founded Snowflake and is now the President of the Product Division, and Florian Douetteau, the Co-founder and CEO of Dataiku. Gentlemen, welcome to theCUBE, two first timers, love it. >> Yup, great to be here. >> Now Florian, you and Benoit have a number of customers in common, and I've said many times on theCUBE that the first era of cloud was really about infrastructure, making it more agile, taking out costs. And the next generation of innovation is really coming from the application of machine intelligence to data, with the cloud as the scale platform. So is that premise relevant to you, do you buy that? And why do you think Snowflake and Dataiku make a good match for customers? >> I think it's because our values are aligned. Today, with the growing complexity our customers face, it's about closing the gap: we need to commoditize the access to data, the access to technology. It's not only about data. Data is important, but it's also about the impact of data. How can you make the best out of data as fast as possible, as easily as possible, within an organization? And another value is about the openness of the platform, building a future together. Having a platform that is not just about the platform itself, but also about the ecosystem of partners around it, bringing the level of accessibility and flexibility you need for the 10 years ahead. >> Yeah, so that's key, that it's not just data. It's turning data into insights. Now Benoit, you came out of the world of very powerful, but highly complex databases. And we all know that you and the Snowflake team get very high marks for really radically simplifying customers' lives. But can you talk specifically about the types of challenges that your customers are using Snowflake to solve? >> Yeah, so the challenge before Snowflake, I would say, was really to put all the data in one place, and run all the computes, all the workloads that you wanted to run against that data. And of course existing legacy platforms were not able to support that level of concurrency, many workloads, we talk about machine learning, data science, data engineering, data warehouse, big data workloads, all running in one place didn't make sense at all. And therefore what customers did was create silos, silos of data everywhere, with different systems, each having a subset of the data. And of course then you cannot analyze this data in one place. So Snowflake, we really solved that problem by creating a single architecture where you can put all the data into the cloud. So it's really cloud native. We really thought about how to solve that problem, how to leverage the cloud, and the elasticity of the cloud, to really put all the data in one place. But at the same time, not run all workloads in the same place. So each workload that runs in Snowflake has its dedicated compute resources to run. And that makes it agile, right? Florian talked about data scientists having to run analysis, so they need a lot of compute resources, but only for a few hours. And with Snowflake, they can run these new workloads, add a workload to the system, get the compute resources that they need to run this workload. And then when it's over, they can shut down their system, it will automatically shut down. Therefore they do not pay for the resources that they don't use.
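As an aside on the technical point Benoit is making, here is a minimal Python sketch of the pattern he describes: a dedicated virtual warehouse per workload that suspends itself when idle, so compute is only billed while the analysis runs. It assumes the snowflake-connector-python package; the account, credentials, warehouse, and table names are placeholders, not anything from the interview.

import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",        # placeholder
    user="data_scientist",       # placeholder
    password="***",              # placeholder
)
cur = conn.cursor()

# Dedicated compute for an ad-hoc data science job, sized independently of the
# warehouses that serve BI or data engineering workloads.
cur.execute("""
    CREATE WAREHOUSE IF NOT EXISTS ds_adhoc_wh
      WITH WAREHOUSE_SIZE = 'LARGE'
           AUTO_SUSPEND   = 60     -- suspend after 60 idle seconds
           AUTO_RESUME    = TRUE
""")
cur.execute("USE WAREHOUSE ds_adhoc_wh")

# Run the analysis; once the warehouse sits idle it suspends automatically
# and stops consuming compute credits.
cur.execute("SELECT COUNT(*) FROM analytics.public.events")   # placeholder table
print(cur.fetchone())

cur.close()
conn.close()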
So it's a very agile system, where you can do this analysis when you need, and you have all the power to run all these workloads at the same time. >> Well, it's profound what you guys built. I mean, of course everybody's trying to copy it now, and I remember the notion of bringing compute to the data, in the Hadoop days. And I think that, as I say, everybody is sort of following suit now, or trying to. Florian, I got to say the first data scientist I ever interviewed on theCUBE was the amazing Hillary Mason, right after she started at Bitly, and she made data science sound so compelling, but data science is hard. So same question for you, what do you see as the biggest challenges that customers are facing with data science? >> The biggest challenge, from my perspective, is that once you solve the issue of the data silo with Snowflake, you don't want to bring in another silo, which would be a silo of skills. And essentially there is the talent gap, between the talent available in the market, or your ability to actually find, recruit, and train data scientists, and what needs to be done. And so you actually need to simplify the access to technologies, so that every organization can make it, whatever the talent, by bridging that gap. And to get there, there's a need to actually break up the silos. Having a collaborative approach, where technology and business work together, and actually all put their hands into those data projects together. >> It makes sense. Florian, let's stay with you for a minute, if I can. Your observation space is pretty, pretty global. And so you have a unique perspective on how companies around the world might be using data and data science. Are you seeing any trends, maybe differences between regions, or maybe within different industries? What are you seeing? >> Yeah, definitely I do see trends that are not geographic so much, but much more in terms of maturity of certain industries and certain sectors. Certain industries invested a lot in terms of data, data access, the ability to store data, as well as experience, and now reach a level of maturity where they can invest more and get to the next steps. And it's really relying on the ability of certain leaders, certain organizations, actually, to have built these long-term data strategies a few years ago, and they are now reaping the benefits. >> A decade ago, Florian, Hal Varian famously said that the sexy job in the next 10 years would be statisticians. And then everybody sort of changed that to data scientist. And then all the statisticians became data scientists, and they got a raise. But data science requires more than just statistics acumen. What skills do you see as critical for the next generation of data science? >> Yeah, it's a great question, because I think the first generation of data scientists became data scientists because they could do some Python quickly and be flexible. And I think that the skills of the next generation of data scientists will definitely be different. It will be, first of all, being able to speak the language of the business, meaning how you translate data insights, predictive modeling, all of this, into actionable insights and business impact. And it will be about how you collaborate with the rest of the business. It's not just how fast you can build something, how fast you can do a notebook in Python, or do predictive models of some sort.
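A small aside on Florian's point: the quick part he describes, fitting a predictive model in a Python notebook, can look as simple as the sketch below. The data here is synthetic and purely illustrative; the harder, next-generation skill is everything around the model.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 5))                  # stand-in for customer features
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # stand-in for churned / not churned

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_train, y_train)
print("holdout accuracy:", model.score(X_test, y_test))

# The model is the quick part. Deciding which customers to act on, what an
# intervention costs, and what retained revenue is worth -- translating the
# score into business impact -- is the skill Florian is pointing at.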
It's about how you actually build this bridge with the business. And obviously those things are important, but we also must be cognizant of the fact that technology will evolve in the future. There will be new tools, new technologies, and they will still need to keep this level of flexibility to understand quickly what are the next tools they need to use, new languages, or whatever, to get there. >> As you look back on 2020, what are you thinking? What are you telling people as we head into next year? >> Yeah, I think it's very interesting, right? This crisis has told us that the world really can change from one day to the next. And this has had dramatic effects, in both directions. For example, some companies all of a sudden saw their revenue line dropping, and they had to do less with data. And for some other companies it was the reverse, right? All of a sudden they were online, like Instacart, for example, and their business completely changed from one day to the other. So this agility of adjusting the resources that you have to the task, and to needs that can change, using a solution like Snowflake really helps with that. We saw both in our customers. Some customers, from one day to the next, were growing big time, because they benefited from COVID, and their business benefited. But others had to drop. And what is nice with cloud is it allows you to adjust compute resources to your business needs, and really adjust it in hours. The other aspect is understanding what is happening, right? You need to analyze. We saw all our customers basically wanted to understand, what is going to be the impact on my business? How can I adapt? How can I adjust? And for that, they needed to analyze data. And of course, a lot of data which is not necessarily data about their business, but also data from the outside. For example, COVID data: what is happening in which states, what is the geographic impact of COVID over time. And access to this data is critical. So this is the premise of the data cloud, right? Having one single place where you can put all the data of the world. So our customers obviously then started to consume the COVID data from our data marketplace. And we already had a thousand customers looking at this data, analyzing this data, to make good decisions. So this agility, this adapting from one hour to the next, is really critical. And that goes with data, with cloud, with adjusting resources, and that doesn't exist on premise. So indeed I think the lesson learned is we are living in a world which is changing all the time, and we have to understand it. We have to adjust, and that's why cloud in some ways is great. >> Excellent, thank you. In theCUBE we like to talk about disruption, of course, who doesn't? And also, I mean, you look at AI, and the impact that it's beginning to have, kind of pre-COVID. You look at some of the industries that were getting disrupted, everyone talks about digital transformation. And you had on the one end of the spectrum industries like publishing, which are highly disrupted, or taxis. And you can say, okay, well that's bits versus atoms, the old Negroponte thing. But then on the flip side, you say, look at financial services, which hadn't been dramatically disrupted, certainly healthcare, which is ripe for disruption, defense. So there are a number of industries that really hadn't leaned into digital transformation, if it ain't broke, don't fix it, not on my watch. There was this complacency. And then of course COVID broke everything.
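The kind of analysis Benoit describes, blending a company's own numbers with outside data such as public COVID case counts, can be sketched in a few lines of pandas. The file names and columns below are hypothetical; the point is simply the join between internal and external data.

import pandas as pd

# Internal data: date, region, revenue. External data: date, region, new_cases.
sales = pd.read_csv("daily_sales_by_region.csv", parse_dates=["date"])
covid = pd.read_csv("covid_cases_by_region.csv", parse_dates=["date"])

combined = sales.merge(covid, on=["date", "region"], how="left")

# How does revenue move with local case counts, region by region?
impact = combined.groupby("region")[["revenue", "new_cases"]].apply(
    lambda g: g["revenue"].corr(g["new_cases"])
)
print(impact.sort_values())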
So Florian, I wonder if you could comment, what industry or industries do you think are going to be most impacted by data science, and what I call machine intelligence, or AI, in the coming years and decade? >> Honestly, I think it's all of them, or at least most of them, because for some industries the impact is very visible, because we are talking about brand new products, drones, flying cars, or whatever, that are very visible to us. But for others, we are talking about profound changes in the way you operate as an organization. Even if the financial industry itself doesn't seem to be so impacted when you look at it from the consumer side or the outside, inside it's probably impacted just because of the way you use data (mumbles) and the flexibility you need. There is a kind of cost gain you can get by leveraging the latest technologies; it's just in the numbers. And so it will actually come from within the industry as well. And overall, I think that 2020 is a year where, from the perspective of AI and analytics, we understood this idea of maturity and resilience. Maturity meaning that when you get to a crisis, you actually need data and AI more than before; you need to actually call the people from data into the room to take better decisions, and look for one and a backlog. And I think that's a very important learning from 2020, that will tell things about 2021. And the resilience, it's like, data analytics today is a function transforming every industry, and is so important that it's something that needs to work. So the infrastructure needs to work, the infrastructure needs to be super resilient, so probably not on prem, or not fully on prem, at some point. And the kind of resilience where you need to be able to plan for literally anything, like no hypothesis in terms of BLOs can be taken for granted. And that's something that is new, and which is just signaling that we are just getting to a next step for data analytics. >> I wonder, Benoit, if you have anything to add to that. I mean, I often wonder, when are machines going to be able to make better diagnoses than doctors, some people say already. Will the financial services, traditional banks, lose control of payment systems? What's going to happen to big retail stores? I mean, maybe bring us home with some of your final thoughts. >> Yeah, I would say I don't see that as a negative, right? The human being will always be involved very closely, but the machine and the data can really help see correlations in the data that would be impossible for a human being alone to discover. So I think it's going to be a complement, not a replacement. And everything that has made us faster doesn't mean that we have less work to do. It means that we can do more. And we have so much to do that I would not be worried about the effect of being more efficient and better at our work. And indeed, I fundamentally think that data, the processing of images, and doing AI on these images, discovering patterns, and potentially flagging disease way earlier than was possible, is going to have a huge impact in health care. And as Florian was saying, every industry is going to be impacted by that technology. So, yeah, I'm very optimistic. >> Great, guys, I wish we had more time. I've got to leave it there, but thanks so much for coming on theCUBE. It was really a pleasure having you.

Published Date : Nov 9 2020


SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Dave Vellante | PERSON | 0.99+
Benoit | PERSON | 0.99+
Florian Douetteau | PERSON | 0.99+
Florian | PERSON | 0.99+
Benoit Dageville | PERSON | 0.99+
Dataiku | ORGANIZATION | 0.99+
2020 | DATE | 0.99+
Hillary Mason | PERSON | 0.99+
Hal Varian | PERSON | 0.99+
10 years | QUANTITY | 0.99+
Python | TITLE | 0.99+
Snowflake | ORGANIZATION | 0.99+
Germany | LOCATION | 0.99+
one hour | QUANTITY | 0.99+
both | QUANTITY | 0.99+
next year | DATE | 0.99+
Bitly | ORGANIZATION | 0.99+
one day | QUANTITY | 0.98+
2021 | DATE | 0.98+
A decade ago | DATE | 0.98+
one place | QUANTITY | 0.97+
Snowflake Data Cloud Summit | EVENT | 0.97+
Snowflake | TITLE | 0.96+
each workload | QUANTITY | 0.96+
today | DATE | 0.96+
first generation | QUANTITY | 0.96+
Benoir | PERSON | 0.95+
snowflake | EVENT | 0.94+
first era | QUANTITY | 0.92+
COVID | OTHER | 0.92+
single architecture | QUANTITY | 0.91+
thousand customers | QUANTITY | 0.9+
first data scientist | QUANTITY | 0.9+
one | QUANTITY | 0.88+
one single place | QUANTITY | 0.87+
few years ago | DATE | 0.86+
Negroponte | PERSON | 0.85+
Florain | ORGANIZATION | 0.82+
two world | QUANTITY | 0.81+
first | QUANTITY | 0.8+
Instacart | ORGANIZATION | 0.75+
next 10 years | DATE | 0.7+
hours | QUANTITY | 0.67+
Snowflake | EVENT | 0.59+
a minute | QUANTITY | 0.58+
theCUBE | ORGANIZATION | 0.55+
Adam | PERSON | 0.49+

Wrap | Adobe Imagine 2019


 

>> Live, from Las Vegas, it's theCUBE, covering Magento Imagine 2019, brought to you Adobe. >> Welcome back to theCUBE, Lisa Martin with Jeff Frick. We have been covering Imagine 2019 in Vegas, all day today, talking all things eCommerce, innovation, technology, the customer experience. Jeff, one of the biggest themes, I think, that we've heard today, from all of our guests, is how strong this community is, how naturally it was developed in the last ten years, and how influential it is to delivering exceptional customer experience technology. >> In fact, Jason said without the community, there would be no Magento. So it's, it's ingrained in the culture. It's ingrained in the DNA. I think, you know, doing some of the research, you know, there was people talking about the dark days of Magento, as it went into eBay, and apparently whatever that plan was, that didn't work. And then out of eBay into private equity. Out of private equity into, now, Adobe. And it sounds like the community's kind of been following along, and maybe they were holding their breath a little bit, a year ago, but it sounds like they kind of got through that, that kind of concern knothole, if you will, and kind of popped out the other side, and realized there's a whole lot of opportunity that comes to Magento, via being part of Adobe now that they didn't have before. So I think, it sounds like they're good with it, and they're ready to go, and nothing but opportunity ahead. >> Yeah, you know, I think with any acquisition, and, you know, we cover so many technology shows, and we've been part of acquisitions before at different companies. They're challenging. There's always, I think, natural trepidation. I think it's just a natural response that anybody, probably, from an executive to an individual contributor level, is going to have. But one of the things that came up so resolutely, was how organic the Magento community has been developed over time. That, like you said, as Jason was saying, without it, there is no Magento. Not only are they influential. It's very much a symbiotic relationship, that pleasantly, surprisingly, sounds like it's been integrated very nicely, into Adobe. And to your point, they now are seeing, wow, there's a tremendous amount of technology and resources that we didn't have the opportunity to leverage before. Talking about the experience, the digital experience business of Adobe's, which is growing. Grew 20% year over year, 2017 to 2018. On a very strong trajectory this year. A lot of opportunity to enable merchants of any size to have this really 360 degree of the customer experience, and manage it with analytics, and advertising, and marketing, and add the commerce piece, so that they can take that marketing interaction and actually convert it to revenue. >> Right, right. I mean, look at Adobe. I mean, they brought in Magento, which we know, late last year. They also brought in Marketo at almost about the same time, $4.7 billion. So they're making huge moves. And I think it's a pretty unique situation, where, again, they come from the creative, and now, with the data, and a sophisticated platform, and you talk about the AB testing, again. It used to be just AB, now it's AB times literally millions and millions of customized experiences delivered to the client. And then now, again, I think really an interesting point of view is where then you bring the commerce to the point of engagement rather than trying to use the engagement as a way to drive people to commerce. 
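A quick aside on the A/B testing at scale mentioned above: one common way platforms keep each visitor in a consistent experience variant is deterministic hashing of a visitor ID, so no per-user lookup table is needed. The sketch below is a generic illustration, not Adobe's or Magento's actual mechanism, and the variant names are made up.

import hashlib

VARIANTS = ["control", "new_checkout", "one_page_checkout"]   # made-up variants

def assign_variant(visitor_id: str, experiment: str = "checkout_test") -> str:
    # Hash the experiment name and visitor id together so each experiment
    # gets its own independent, but stable, split of the audience.
    digest = hashlib.sha256(f"{experiment}:{visitor_id}".encode()).hexdigest()
    return VARIANTS[int(digest, 16) % len(VARIANTS)]

print(assign_variant("visitor-12345"))   # the same visitor always lands in the same bucket
print(assign_variant("visitor-67890"))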
I mean, they seem really well positioned. I think they're going to really enjoy people like Accenture, and some of the other big system integrators that now are going to be, you know, behind this platform. So it seems to be a fit, a marriage made in heaven. It almost makes you wonder why Adobe was so late to have an eCommerce platform, which is the thing that kind of surprises me, I think, the most. >> Yeah, well, it also gives them the opportunity to compete with Shopify and with Salesforce Commerce, and kind of harness this brand power. But you talked about something that we've talked about all day, and that's bringing the transaction and the commerce experience to me as a consumer wherever I am, whether it's in-app shopping through Instagram. Rather than, you know, just delivering me a personalized experience, it's leveraging the power of these technologies to understand the right things about me as a consumer, to deliver me an experience that is frictionless. It's going to allow me to have a seamless experience. We talked about that with progressive web apps, and how that's going to enable next generation shopping for merchants of all sizes. Don't just engage me on my mobile, if that's where I want to be. If you don't have the opportunity to convert me seamlessly to actually transact, there's a huge addressable market, or gap, in converting that to revenue, which Jason Woolsey also talked about. Kind of thinking about next steps for Adobe, and what they're going to be able to do to help those merchants capture that shoppable moment in real time, leveraging the power of emerging technologies like AI, and turn it into dollars for the merchant. >> Right, lot of great things. I thought it was interesting having TJ Gamble on, and talking about coopetition. Right? Coopetition is such a fundamental part of Silicon Valley and the world in which we live. And he said, you know, if you're making fat margin, as Jeff Bezos loves to say, your margin is my opportunity. You're going to compete with Amazon, but in the meantime, you've got to cooperate with them. So to enable integration into the Amazon platform with your Magento store, the integration into Google Shopping, integration into Instagram purchases, in-app purchases, I mean, these really open up the opportunities for these smaller retailers, mid-sized retailers, to compete in a really complicated and super hyper-competitive world. But now they can, again, focus on their brand, which we hear over and over and over, focus on their experience, focus on their community, and leverage some of this special breed technology under the covers, across platform, across different modes of buying. Because the other thing we hear over and over and over is you've got to give people choice. You can't say no. So if they want to buy it through Amazon, let 'em buy it through Amazon. If they want to buy it through Instagram, let 'em buy it through Instagram. If they want to come to your eCommerce site, let 'em come to your eCommerce site. But, you know, it's opening up all those channels for the merchant to be able to execute their transactions regardless of how the customer got to them, or how, more importantly, they got to the customer.
>> And, you know, the SMB front is really key, that you brought up. Because in the last year, since the acquisition was announced, about a year ago, and completed, I think, in September of 2018, there was not just concern from the community, that we talked about at the beginning of this segment, but also from the small and medium business. Like, well, Adobe has a really big presence in enterprise. Is that going to be cannibalized with this acquisition of Magento, who had such a strong presence with those smaller merchants? And you mentioned some of these things with Amazon and Google that we heard yesterday and today, I think really assuaging some of those concerns that the smaller businesses had, but also allowing these smaller merchants to sort of level the playing field, and have access to the power of a branded Amazon storefront that allows a smaller business to get some differentiation, whereas before they didn't have that. So I think we heard a lot about that today, and how, I think, those smaller brands are probably breathing a sigh of relief that this acquisition is really going to enable them, with a lot more tools, but not, you know, cannibalizing what they have been doing with Magento for so long. >> Right, right. And some other fun discussions. I really enjoyed the time with Tina, talking about influencer marketing. It's amazing how that continues to evolve at a really fast pace. Right? A derivation of professional endorsement, which is something we've known ever since Joe Namath put on stockings many moons ago. But to see it go from big influencers, to micro-influencers, you know. How do you sponsor people, give them money, engage as a brand, and still maintain that they legitimately like your product, use your product? I think it's a really fascinating space, and again, to be able to purchase within that Instagram application, I think, is really interesting. And then a lot of conversations about the post transaction engagement. You know, send them not one email confirmation that your items are coming, but send them two. And really think about the lifetime value of the customer, and engaging the customer via content, and, oh, by the way, there'll be some transactions in commerce as well. I think it's really forward-looking, and I really enjoyed that conversation as well.
It follows me on my computer. I go out on my bike. It follows me. It stays in the same state. And so, for the commerce and the community to be able to follow you around is a really interesting idea. And again, it was Hillary Mason, actually, who first said that, you know, AI and recommendations done well are magic, and done poorly are creepy. I think it's always going to be this interesting fine line. Again, I think the whole concept of, you know, using old data and how fast you update it, that's kind of the example. I've been looking at tents. I bought a tent. I don't want to see ads for tents anymore. Right? It's time to see an ad for a sleeping bag, or a camp stove. And these are really happening in real-time. You know, we've heard about Omnichannel. We've heard about the 360 view of the customer, ad nauseam. You've been in this business for a long time. But it sounds like it's finally coming together, and it's finally where we have the data, we have the access to the data, the speed of the analytics, and just the raw horsepower in modeling, that we can now start to apply this real-time ML to data in-flight, to be able to serve up the not creepy but correct recommendations, at the right time to the right person. It's getting closer and closer to reality. >> It is getting closer, and as you were talking about that, one of the things that popped into my head, going from the creepy to the magic: is it really leveraging this data and using the power of machine learning and AI that's the great facilitator? Or is the foundation at the bottom order management, or inventory management? If you don't have the inventory, it's great to have all these capabilities to transact in real time, but if you can't fulfill it, you're going to sink. >> Yeah. >> So Magento, with, you know, some of their core technology, is enabling this. Really enabling not just the 360 degree customer view, but being able to fulfill it. Those are table stakes, and game changers. >> Right. >> For merchants of any size. >> Right, and I think they do have to engage. I mean, they have to be brands. Right? Because a commodity item I can go get anywhere. There's got to be a reason to come. Lot of conversations, not so much here, but at the Adobe summit, in terms of the content piece, and having an ongoing dialog and an ongoing content relationship with your client. Now you can slice and dice and serve that up lots of different ways based on who they are and the context. But if you don't have that, you can't just compete on price. You just can't compete on inventory, 'cause Amazon is going to win. Right? You can't stock everything; my favorite example is these little shirt pins in here. How do you stock those? You can't. They don't cost any money, and you don't sell that many. Amazon can. So, find your niche, you know. Engage your customers. Engage your community, and there'll be some transactions that come along with this. And I think it's really reinforced that, I think, it's probably really timely for Magento to be part of Adobe, because eCommerce, just purely by itself, is going to be tougher and tougher to do unless you've got this deeper relationship with your customers, beyond simply transacting something. >> Exactly. So I enjoyed hosting, as I always do with you, Jeff. Learned a lot today, and excited to hear about what's next for this event, now that Adobe is leveraging the power of Magento.
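A minimal sketch of Jeff's tent example: once a category has been purchased, stop recommending it and move on to complementary categories. The catalog relationships below are made up for illustration; this is a generic pattern, not any particular platform's recommendation engine.

# Made-up catalog relationships for illustration only.
COMPLEMENTS = {
    "tents": ["sleeping_bags", "camp_stoves", "headlamps"],
    "sleeping_bags": ["sleeping_pads", "pillows"],
}

def next_recommendations(purchased_categories, max_items=3):
    suggestions = []
    for category in purchased_categories:
        for item in COMPLEMENTS.get(category, []):
            # Never re-recommend something the shopper already owns.
            if item not in purchased_categories and item not in suggestions:
                suggestions.append(item)
    return suggestions[:max_items]

# The shopper just bought a tent: stop showing tents, show what goes with one.
print(next_recommendations({"tents"}))   # ['sleeping_bags', 'camp_stoves', 'headlamps']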
>> Well, we heard the announcements, Gary's going to make the announcement tomorrow. So hang out for the keynote tomorrow to find out more about Imagine 2020. We'll be there. >> 2020, yes. >> 2020, because we'll know everything in 2020. >> We will know. That's right. I can't wait. >> 2020 hindsight. >> I'm waiting for that. Well, Jeff, as I said, always a pleasure hosting with you. >> You too, Lisa. >> I brought the sea urchin necklace out. >> I like it. I like it. >> This is just for Jeff. It's making its appearance on theCUBE. We want to thank you for watching, for Jeff Frick, I'm Lisa Martin, and you've been watching theCUBE live from Imagine 19 at The Wynn Las Vegas. Thanks for watching. (upbeat music)

Published Date : May 15 2019


SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Jason | PERSON | 0.99+
Tina | PERSON | 0.99+
Jeff Frick | PERSON | 0.99+
Lisa Martin | PERSON | 0.99+
Jeff | PERSON | 0.99+
Amazon | ORGANIZATION | 0.99+
Adobe | ORGANIZATION | 0.99+
Gary | PERSON | 0.99+
Jason Woolsey | PERSON | 0.99+
Jeff Bezos | PERSON | 0.99+
September of 2018 | DATE | 0.99+
Google | ORGANIZATION | 0.99+
Joe Namath | PERSON | 0.99+
$4.7 billion | QUANTITY | 0.99+
2018 | DATE | 0.99+
2020 | DATE | 0.99+
Vegas | LOCATION | 0.99+
Hillary Mason | PERSON | 0.99+
2017 | DATE | 0.99+
two | QUANTITY | 0.99+
Omnichannel | ORGANIZATION | 0.99+
Las Vegas | LOCATION | 0.99+
today | DATE | 0.99+
360 degree | QUANTITY | 0.99+
Lisa | PERSON | 0.99+
Magento | ORGANIZATION | 0.99+
tomorrow | DATE | 0.99+
20% | QUANTITY | 0.99+
Shopify | ORGANIZATION | 0.99+
Silicon Valley | LOCATION | 0.99+
last year | DATE | 0.99+
yesterday | DATE | 0.99+
Accenture | ORGANIZATION | 0.99+
Google Shopping | TITLE | 0.98+
millions | QUANTITY | 0.98+
eBay | ORGANIZATION | 0.98+
a year ago | DATE | 0.98+
Magento | TITLE | 0.97+
late last year | DATE | 0.97+
first | QUANTITY | 0.97+
one | QUANTITY | 0.96+
TJ Gamble | PERSON | 0.96+
Imagine 2019 | TITLE | 0.95+
Instagram | ORGANIZATION | 0.95+
Imagine 19 | ORGANIZATION | 0.94+
this year | DATE | 0.94+
Imagine | TITLE | 0.93+
about 360 view | QUANTITY | 0.91+
one email | QUANTITY | 0.9+
Magento Imagine 2019 | TITLE | 0.89+
a thousand followers | QUANTITY | 0.89+
Salesforce | ORGANIZATION | 0.88+
Magento store | TITLE | 0.86+
about a year ago | DATE | 0.85+

Panel Discussion | IBM Fast Track Your Data 2017


 

>> Narrator: Live, from Munich, Germany, it's the CUBE. Covering IBM, Fast Track Your Data. Brought to you by IBM. >> Welcome to Munich everybody. This is a special presentation of the CUBE, Fast Track Your Data, brought to you by IBM. My name is Dave Vellante. And I'm here with my cohost, Jim Kobielus. Jim, good to see you. Really good to see you in Munich. >> Jim: I'm glad I made it. >> Thanks for being here. So last year Jim and I hosted a panel at New York City on the CUBE. And it was quite an experience. We had, I think it was nine or 10 data scientists and we felt like that was a lot of people to organize and talk about data science. Well today, we're going to do a repeat of that. With a little bit of twist on topics. And we've got five data scientists. We're here live, in Munich. And we're going to kick off the Fast Track Your Data event with this data science panel. So I'm going to now introduce some of the panelists, or all of the panelists. Then we'll get into the discussions. I'm going to start with Lillian Pierson. Lillian thanks very much for being on the panel. You are in data science. You focus on training executives, students, and you're really a coach but with a lot of data science expertise based in Thailand, so welcome. >> Thank you, thank you so much for having me. >> Dave: You're very welcome. And so, I want to start with sort of when you focus on training people, data science, where do you start? >> Well it depends on the course that I'm teaching. But I try and start at the beginning so for my Big Data course, I actually start back at the fundamental concepts and definitions they would even need to understand in order to understand the basics of what Big Data is, data engineering. So, terms like data governance. Going into the vocabulary that makes up the very introduction of the course, so that later on the students can really grasp the concepts I present to them. You know I'm teaching a deep learning course as well, so in that case I start at a lot more advanced concepts. So it just really depends on the level of the course. >> Great, and we're going to come back to this topic of women in tech. But you know, we looked at some CUBE data the other day. About 17% of the technology industry comprises women. And so we're a little bit over that on our data science panel, we're about 20% today. So we'll come back to that topic. But I don't know if there's anything you would add? >> I'm really passionate about women in tech and women who code, in particular. And I'm connected with a lot of female programmers through Instagram. And we're supporting each other. So I'd love to take any questions you have on what we're doing in that space. At least as far as what's happening across the Instagram platform. >> Great, we'll circle back to that. All right, let me introduce Chris Penn. Chris, Boston based, all right, SMI. Chris is a marketing expert. Really trying to help people understand how to get, turn data into value from a marketing perspective. It's a very important topic. Not only because we get people to buy stuff but also understanding some of the risks associated with things like GDPR, which is coming up. So Chris, tell us a little bit about your background and your practice. >> So I actually started in IT and worked at a start up. And that's where I made the transition to marketing. Because marketing has much better parties. But what's really interesting about the way data science is infiltrating marketing is the technology came in first. 
You know, everything went digital. And now we're at a point where there's so much data. And most marketers, they kind of got into marketing as sort of the arts and crafts field. And are realizing now, they need a very strong, mathematical, statistical background. So one of the things, Adam, the reason why we're here and IBM is helping out tremendously is, making a lot of the data more accessible to people who do not have a data science background and probably never will. >> Great, okay thank you. I'm going to introduce Ronald Van Loon. Ronald, your practice is really all about helping people extract value out of data, driving competitive advantage, business advantage, or organizational excellence. Tell us a little bit about yourself, your background, and your practice. >> Basically, I've three different backgrounds. On one hand, I'm a director at a data consultancy firm called Adversitement. Where we help companies to become data driven. Mainly large companies. I'm an advisory board member at Simply Learn, which is an e-learning platform, especially also for big data analytics. And on the other hand I'm a blogger and I host a series of webinars. >> Okay, great, now Dez, Dez Blanchfield, I met you on Twitter, you know, probably a couple of years ago. We first really started to collaborate last year. We've spend a fair amount of time together. You are a data scientist, but you're also a jack of all trades. You've got a technology background. You sit on a number of boards. You work very active with public policy. So tell us a little bit more about what you're doing these days, a little bit more about your background. >> Sure, I think my primary challenge these days is communication. Trying to join the dots between my technical background and deeply technical pedigree, to just plain English, every day language, and business speak. So bridging that technical world with what's happening in the boardroom. Toe to toe with the geeks to plain English to execs in boards. And just hand hold them and steward them through the journey of the challenges they're facing. Whether it's the enormous rapid of change and the pace of change, that's just almost exhaustive and causing them to sprint. But not just sprint in one race but in multiple lanes at the same time. As well as some of the really big things that are coming up, that we've seen like GDPR. So it's that communication challenge and just hand holding people through that journey and that mix of technical and commercial experience. >> Great, thank you, and finally Joe Caserta. Founder and president of Caserta Concepts. Joe you're a practitioner. You're in the front lines, helping organizations, similar to Ronald. Extracting value from data. Translate that into competitive advantage. Tell us a little bit about what you're doing these days in Caserta Concepts. >> Thanks Dave, thanks for having me. Yeah, so Caserta's been around. I've been doing this for 30 years now. And natural progressions have been just getting more from application development, to data warehousing, to big data analytics, to data science. Very, very organically, that's just because it's where businesses need the help the most, over the years. And right now, the big focus is governance. At least in my world. Trying to govern when you have a bunch of disparate data coming from a bunch of systems that you have no control over, right? Like social media, and third party data systems. Bringing it in and how to you organize it? How do you ingest it? How do you govern it? 
How do you keep it safe? And also help to define ownership of the data within an organization within an enterprise? That's also a very hot topic. Which ties back into GDPR. >> Great, okay, so we're going to be unpacking a lot of topics associated with the expertise that these individuals have. I'm going to bring in Jim Kobielus, to the conversation. Jim, the newest Wikibon analyst. And newest member of the SiliconANGLE Media Team. Jim, get us started off. >> Yeah, so we're at an event, at an IBM event where machine learning and data science are at the heart of it. There are really three core themes here. Machine learning and data science, on the one hand. Unified governance on the other. And hybrid data management. I want to circle back or focus on machine learning. Machine learning is the coin of the realm, right now in all things data. Machine learning is the heart of AI. Machine learning, everybody is going, hiring, data scientists to do machine learning. I want to get a sense from our panel, who are experts in this area, what are the chief innovations and trends right now on machine learning. Not deep learning, the core of machine learning. What's super hot? What's in terms of new techniques, new technologies, new ways of organizing teams to build and to train machine learning models? I'd like to open it up. Let's just start with Lillian. What are your thoughts about trends in machine learning? What's really hot? >> It's funny that you excluded deep learning from the response for this, because I think the hottest space in machine learning is deep learning. And deep learning is machine learning. I see a lot of collaborative platforms coming out, where people, data scientists are able to work together with other sorts of data professionals to reduce redundancies in workflows. And create more efficient data science systems. >> Is there much uptake of these crowd sourcing environments for training machine learning wells. Like CrowdFlower, or Amazon Mechanical Turk, or Mighty AI? Is that a huge trend in terms of the workflow of data science or machine learning, a lot of that? >> I don't see that crowdsourcing is like, okay maybe I've been out of the crowdsourcing space for a while. But I was working with Standby Task Force back in 2013. And we were doing a lot of crowdsourcing. And I haven't seen the industry has been increasing, but I could be wrong. I mean, because there's no, if you're building automation models, most of the, a lot of the work that's being crowdsourced could actually be automated if someone took the time to just build the scripts and build the models. And so I don't imagine that, that's going to be a trend that's increasing. >> Well, automation machine learning pipeline is fairly hot, in terms of I'm seeing more and more research. Google's doing a fair amount of automated machine learning. The panel, what do you think about automation, in terms of the core modeling tasks involved in machine learning. Is that coming along? Are data scientists in danger of automating themselves out of a job? >> I don't think there's a risk of data scientist's being put out of a job. Let's just put that on the thing. I do think we need to get a bit clearer about this meme of the mythical unicorn. But to your call point about machine learning, I think what you'll see, we saw the cloud become baked into products, just as a given. I think machine learning is already crossed this threshold. We just haven't necessarily noticed or caught up. 
And if we look at, we're at an IBM event, so let's just do a call out for them. The data science experience platform, for example. Machine learning's built into a whole range of things around algorithm and data classification. And there's an assisted, guided model for how you get to certain steps, where you don't actually have to understand how machine learning works. You don't have to understand how the algorithms work. It shows you the different options you've got and you can choose them. So you might choose regression. And it'll give you different options on how to do that. So I think we've already crossed this threshold of baking in machine learning and baking in the data science tools. And we've seen that with Cloud and other technologies where, you know, the Office 365 is not, you can't get a non Cloud Office 365 account, right? I think that's already happened in machine learning. What we're seeing though, is organizations even as large as the Googles still in catch up mode, in my view, on some of the shift that's taken place. So we've seen them write little games and apps where people do doodles and then it runs through the ML library and says, "Well that's a cow, or a unicorn, or a duck." And you get awards, and gold coins, and whatnot. But you know, as far as 12 years ago I was working on a project, where we had full size airplanes acting as drones. And we mapped with two and 3-D imagery. With 2-D high res imagery and LiDAR for 3-D point Clouds. We were finding poles and wires for utility companies, using ML before it even became a trend. And baking it right into the tools. And used to store on our web page and clicked and pointed on. >> To counter Lillian's point, it's not crowdsourcing but crowd sharing that's really powering a lot of the rapid leaps forward. If you look at, you know, DSX from IBM. Or you look at Node-RED, huge number of free workflows that someone has probably already done the thing that you are trying to do. Go out and find in the libraries, through Jupyter and R Notebooks, there's an ability-- >> Chris can you define before you go-- >> Chris: Sure. >> This is great, crowdsourcing versus crowd sharing. What's the distinction? >> Well, so crowdsourcing, kind of, where in the context of the question you ask is like I'm looking for stuff that other people, getting people to do stuff that, for me. It's like asking people to mine classifieds. Whereas crowd sharing, someone has done the thing already, it already exists. You're not purpose built, saying, "Jim, help me build this thing." It's like, "Oh Jim, you already "built this thing, cool. "So can I fork it and make my own from it?" >> Okay, I see what you mean, keep going. >> And then, again, going back to earlier. In terms of the advancements. Really deep learning, it probably is a good idea to just sort of define these things. Machine learning is how machines do things without being explicitly programmed to do them. Deep learning's like if you can imagine a stack of pancakes, right? Each pancake is a type of machine learning algorithm. And your data is the syrup. You pour the data on it. It goes from layer, to layer, to layer, to layer, and what you end up with at the end is breakfast. That's the easiest analogy for what deep learning is. Now imagine a stack of pancakes, 500 or 1,000 high, that's where deep learning's going now. >> Sure, multi layered machine learning models, essentially, that have the ability to do higher levels of abstraction. Like image analysis, Lillian? 
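To put Chris's pancake analogy into code: each layer in the sketch below is one "pancake" the data pours through, and a deeper network is just a taller stack. This is a toy illustration using Keras with arbitrary layer sizes, not a model from the discussion.

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),  # pancake 1
    tf.keras.layers.Dense(64, activation="relu"),                     # pancake 2
    tf.keras.layers.Dense(64, activation="relu"),                     # pancake 3
    tf.keras.layers.Dense(1, activation="sigmoid"),                   # breakfast
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()   # prints the stack of layers the data will pour through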
>> I had a comment to add about automation and data science. Because there are a lot of tools that are able to, or applications that are able to use data science algorithms and output results. But the reason that data scientists aren't in risk of losing their jobs, is because just because you can get the result, you also have to be able to interpret it. Which means you have to understand it. And that involves deep math and statistical understanding. Plus domain expertise. So, okay, great, you took out the coding element but that doesn't mean you can codify a person's ability to understand and apply that insight. >> Dave: Joe, you have something to add? >> I could just add that I see the trend. Really, the reason we're talking about it today is machine learning is not necessarily, it's not new, like Dez was saying. But what's different is the accessibility of it now. It's just so easily accessible. All of the tools that are coming out, for data, have machine learning built into it. So the machine learning algorithms, which used to be a black art, you know, years ago, now is just very easily accessible. That you can get, it's part of everyone's toolbox. And the other reason that we're talking about it more, is that data science is starting to become a core curriculum in higher education. Which is something that's new, right? That didn't exist 10 years ago? But over the past five years, I'd say, you know, it's becoming more and more easily accessible for education. So now, people understand it. And now we have it accessible in our tool sets. So now we can apply it. And I think that's, those two things coming together is really making it becoming part of the standard of doing analytics. And I guess the last part is, once we can train the machines to start doing the analytics, right? And get smarter as it ingests more data. And then we can actually take that and embed it in our applications. That's the part that you still need data scientists to create that. But once we can have standalone appliances that are intelligent, that's when we're going to start seeing, really, machine learning and artificial intelligence really start to take off even more. >> Dave: So I'd like to switch gears a little bit and bring Ronald on. >> Okay, yes. >> Here you go, there. >> Ronald, the bromide in this sort of big data world we live in is, the data is the new oil. You got to be a data driven company and many other cliches. But when you talk to organizations and you start to peel the onion. You find that most companies really don't have a good way to connect data with business impact and business value. What are you seeing with your clients and just generally in the community, with how companies are doing that? How should they do that? I mean, is that something that is a viable approach? You don't see accountants, for example, quantifying the value of data on a balance sheet. There's no standards for doing that. And so it's sort of this fuzzy concept. How are and how should organizations take advantage of data and turn it into value. >> So, I think in general, if you look how companies look at data. They have departments and within the departments they have tools specific for this department. And what you see is that there's no central, let's say, data collection. There's no central management of governance. There's no central management of quality. There's no central management of security. Each department is manages their data on their own. So if you didn't ask, on one hand, "Okay, how should they do it?" 
It's basically go back to the drawing table and say, "Okay, how should we do it?" We should collect centrally, the data. And we should take care for central governance. We should take care for central data quality. We should take care for centrally managing this data. And look from a company perspective and not from a department perspective what the value of data is. So, look at the perspective from your whole company. And this means that it has to be brought on one end to, whether it's from C level, where most of them still fail to understand what it really means. And what the impact can be for that company. >> It's a hard problem. Because data by its' very nature is now so decentralized. But Chris you have a-- >> The thing I want to add to that is, think about in terms of valuing data. Look at what it would cost you for data breach. Like what is the expensive of having your data compromised. If you don't have governance. If you don't have policy in place. Look at the major breaches of the last couple years. And how many billions of dollars those companies lost in market value, and trust, and all that stuff. That's one way you can value data very easily. "What will it cost us if we mess this up?" >> So a lot of CEOs will hear that and say, "Okay, I get it. "I have to spend to protect myself, "but I'd like to make a little money off of this data thing. "How do I do that?" >> Well, I like to think of it, you know, I think data's definitely an asset within an organization. And is becoming more and more of an asset as the years go by. But data is still a raw material. And that's the way I think about it. In order to actually get the value, just like if you're creating any product, you start with raw materials and then you refine it. And then it becomes a product. For data, data is a raw material. You need to refine it. And then the insight is the product. And that's really where the value is. And the insight is absolutely, you can monetize your insight. >> So data is, abundant insights are scarce. >> Well, you know, actually you could say that intermediate between insights and the data are the models themselves. The statistical, predictive, machine learning models. That are a crystallization of insights that have been gained by people called data scientists. What are your thoughts on that? Are statistical, predictive, machine learning models something, an asset, that companies, organizations, should manage governance of on a centralized basis or not? >> Well the models are essentially the refinery system, right? So as you're refining your data, you need to have process around how you exactly do that. Just like refining anything else. It needs to be controlled and it needs to be governed. And I think that data is no different from that. And I think that it's very undisciplined right now, in the market or in the industry. And I think maturing that discipline around data science, I think is something that's going to be a very high focus in this year and next. >> You were mentioning, "How do you make money from data?" Because there's all this risk associated with security breaches. But at the risk of sounding simplistic, you can generate revenue from system optimization, or from developing products and services. Using data to develop products and services that better meet the demands and requirements of your markets. So that you can sell more. So either you are using data to earn more money. Or you're using data to optimize your system so you have less cost. 
And that's a simple answer for how you're going to be making money from the data. But yes, there is always the counter to that, which is the security risks. >> Well, and my question really relates to, you know, when you think of talking to C level executives, they kind of think about running the business, growing the business, and transforming the business. And a lot of times they can't fund these transformations. And so I would agree, there's many, many opportunities to monetize data, cut costs, increase revenue. But organizations seem to struggle to either make a business case. And actually implement that transformation. >> Dave, I'd love to have a crack at that. I think this conversation epitomizes the type of things that are happening in board rooms and C suites already. So we've really quickly dived into the detail of data. And the detail of machine learning. And the detail of data science, without actually stopping and taking a breath and saying, "Well, we've "got lots of it, but what have we got? "Where is it? "What's the value of it? "Is there any value in it at all?" And, "How much time and money should we invest in it?" For example, we talk of being about a resource. I look at data as a utility. When I turn the tap on to get a drink of water, it's there as a utility. I counted it being there but I don't always sample the quality of the water and I probably should. It could have Giardia in it, right? But what's interesting is I trust the water at home, in Sydney. Because we have a fairly good experience with good quality water. If I were to go to some other nation. I probably wouldn't trust that water. And I think, when you think about it, what's happening in organizations. It's almost the same as what we're seeing here today. We're having a lot of fun, diving into the detail. But what we've forgotten to do is ask the question, "Well why is data even important? "What's the reasoning to the business? "Why are we in business? "What are we doing as an organization? "And where does data fit into that?" As opposed to becoming so fixated on data because it's a media hyped topic. I think once you can wind that back a bit and say, "Well, we have lot's of data, "but is it good data? "Is it quality data? "Where's it coming from? "Is it ours? "Are we allowed to have it? "What treatment are we allowed to give that data?" As you said, "Are we controlling it? "And where are we controlling it? "Who owns it?" There's so many questions to be asked. But the first question I like to ask people in plain English is, "Well is there any value "in data in the first place? "What decisions are you making that data can help drive? "What things are in your organizations, "KPIs and milestones you're trying to meet "that data might be a support?" So then instead of becoming fixated with data as a thing in itself, it becomes part of your DNA. Does that make sense? >> Think about what money means. The Economists' Rhyme, "Money is a measure for, "a systems for, a medium, a measure, and exchange." So it's a medium of exchange. A measure of value, a way to exchange something. And a way to store value. Data, good clean data, well governed, fits all four of those. So if you're trying to figure out, "How do we make money out of stuff." Figure out how money works. And then figure out how you map data to it. >> So if we approach and we start with a company, we always start with business case, which is quite clear. 
And defined use case, basically, start with a team on one hand, marketing people, sales people, operational people, and also the whole data science team. So start with this case. It's like defining, basically, a movie. If you want to create the movie, you know where you're going. You know what you want to achieve to create the customer experience. And this is basically the same with a business case. Where you define, "This is the case. "And this is how we're going to derive value, "start with it and deliver value within a month." And after the month, you check, "Okay, where are we and how can we move forward? "And what's the value that we've brought?" >> Now I as well, start with the business case. I've done thousands of business cases in my life, with organizations. And unless that organization was kind of a data broker, the business case rarely has a discrete component around data. Is that changing, in your experience? >> Yes, so we guide companies to be data driven. So initially, indeed, they don't like to use the data. They don't like to use the analysis. So that's why, and how, we help. And is it changing? Yes, they understand that they need to change. But changing people is not always easy. So, you see, it's hard if you're not involved and you're not guiding it, they fall back into doing the daily tasks. So it's changing, but it's a hard change. >> Well and that's where this common parlance comes in. And Lillian, you, sort of, this is what you do for a living, is helping people understand these things, as you've been sort of evangelizing that common parlance. But do you have anything to add? >> I wanted to add that for organizational implementations, another key component to success is to start small. Start in one small line of business. And then when you've mastered that area and made it successful, then try and deploy it in more areas of the business. And as far as initializing a big data implementation, that's generally how to do it successfully. >> There's the whole issue of putting a value on data as a discrete asset. Then there's the issue, how do you put a value on a data lake? Because a data lake is essentially an asset you build on spec. It's an exploratory archive, essentially, of all kinds of data that might yield some insights, but you have to have a team of data scientists doing exploration and modeling. But it's all on spec. How do you put a value on a data lake? And at what point does the data lake itself become a burden? Because you got to store that data and manage it. At what point do you drain that lake? At what point, do the costs of maintaining that lake outweigh the opportunity costs of not holding onto it? >> So each Hadoop node is approximately $20,000 per year in cost for storage. So I think that there needs to be a test and a diagnostic, before even inputting, ingesting the data and storing it. "Is this actually going to be useful? "What value do we plan to create from this?" Because really, you can't store all the data. And it's a lot cheaper to store data in Hadoop than it was in traditional systems, but it's definitely not free. So people need to be applying this test before even ingesting the data. Why do we need this? What business value? >> I think the question we need to also ask around this is, "Why are we building data lakes "in the first place? "So what's the function it's going to perform for you?" There's been a huge drive to this idea. "We need a data lake. "We need to put it all somewhere." But invariably they become data swamps. 
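A minimal sketch of the "test before you ingest" idea above, in Python. The roughly $20,000-per-node-per-year figure is the one quoted on the panel; the usable capacity per node, the function names, and the example numbers are illustrative assumptions, not anything the panelists specified.

```python
# Rough "test before you ingest" sketch (illustrative only).
# The ~$20,000 per node per year figure is the one quoted on the panel;
# the usable capacity per node and the example numbers are assumptions.
import math

NODE_COST_PER_YEAR_USD = 20_000   # figure quoted on the panel
USABLE_TB_PER_NODE = 40           # assumed usable capacity after replication

def annual_storage_cost(dataset_tb: float) -> float:
    """Estimate the yearly cost of keeping a dataset in the cluster."""
    nodes_needed = max(1, math.ceil(dataset_tb / USABLE_TB_PER_NODE))
    return nodes_needed * NODE_COST_PER_YEAR_USD

def worth_ingesting(dataset_tb: float, expected_annual_value_usd: float) -> bool:
    """The value test, reduced to one line: does expected value beat storage cost?"""
    return expected_annual_value_usd > annual_storage_cost(dataset_tb)

print(annual_storage_cost(120))        # 120 TB -> 3 nodes -> 60000
print(worth_ingesting(120, 45_000))    # False: the value doesn't cover the storage
```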
And we only half jokingly say that because I've seen 90 day projects turn from a great idea, to a really bad nightmare. And as Lillian said, it is cheaper in some ways to put it into a HDFS platform, in a technical sense. But when we look at all the fully burdened components, it's actually more expensive to find Hadoop specialists and Spark specialists to maintain that cluster. And invariably I'm finding that big data, quote unquote, is not actually so much lots of data, it's complex data. And as Lillian said, "You don't always "need to store it all." So I think if we go back to the question of, "What's the function of a data lake in the first place? "Why are we building one?" And then start to build some fully burdened cost components around that. We'll quickly find that we don't actually need a data lake, per se. We just need an interim data store. So we might take last years' data and tokenize it, and analyze it, and do some analytics on it, and just keep the meta data. So I think there is this rush, for a whole range of reasons, particularly vendor driven. To build data lakes because we think they're a necessity, when in reality they may just be an interim requirement and we don't need to keep them for a long term. >> I'm going to attempt to, the last few questions, put them all together. And I think, they all belong together because one of the reasons why there's such hesitation about progress within the data world is because there's just so much accumulated tech debt already. Where there's a new idea. We go out and we build it. And six months, three years, it really depends on how big the idea is, millions of dollars is spent. And then by the time things are built the idea is pretty much obsolete, no one really cares anymore. And I think what's exciting now is that the speed to value is just so much faster than it's ever been before. And I think that, you know, what makes that possible is this concept of, I don't think of a data lake as a thing. I think of a data lake as an ecosystem. And that ecosystem has evolved so much more, probably in the last three years than it has in the past 30 years. And it's exciting times, because now once we have this ecosystem in place, if we have a new idea, we can actually do it in minutes not years. And that's really the exciting part. And I think, you know, data lake versus a data swamp, comes back to just traditional data architecture. And if you architect your data lake right, you're going to have something that's substantial, that's you're going to be able to harness and grow. If you don't do it right. If you just throw data. If you buy Hadoop cluster or a Cloud platform and just throw your data out there and say, "We have a lake now." yeah, you're going to create a mess. And I think taking the time to really understand, you know, the new paradigm of data architecture and modern data engineering, and actually doing it in a very disciplined way. If you think about it, what we're doing is we're building laboratories. And if you have a shabby, poorly built laboratory, the best scientist in the world isn't going to be able to prove his theories. So if you have a well built laboratory and a clean room, then, you know a scientist can get what he needs done very, very, very efficiently. And that's the goal, I think, of data management today. >> I'd like to just quickly add that I totally agree with the challenge between on premise and Cloud mode. And I think one of the strong themes of today is going to be the hybrid data management challenge. 
And I think organizations, some organizations, have rushed to adopt Cloud. And thinking it's a really good place to dump the data and someone else has to manage the problem. And then they've ended up with a very expensive death by 1,000 cuts in some senses. And then others have been very reluctant, as a result of not getting access to rapidly moving and disruptive technology. So I think there's a really big challenge to get a basic conversation going around what's the value of using Cloud technology, as in adopting it, versus what are the risks? And when's the right time to move? For example, should we Cloud Burst for workloads? Do we move whole data sets in there? You know, moving half a petabyte of data into a Cloud platform and back is a non-trivial exercise. But moving a terabyte isn't actually that big a deal anymore. So, you know, should we keep stuff behind the firewalls? I'd be interested in seeing this week where 80% of the data, supposedly, is. And just push out for Cloud tools, machine learning, data science tools, whatever they might be, cognitive analytics, et cetera. And keep the bulk of the data on premise. Or should we just move whole spools into the Cloud? There is no one size fits all. There's no silver bullet. Every organization has its own quirks and own nuances they need to think through and make a decision themselves. >> Very often, Dez, organizations have zonal architectures so you'll have a data lake that consists of a NoSQL platform that might be used for say, mobile applications. A Hadoop platform that might be used for unstructured data refinement, so forth. A streaming platform, so forth and so on. And then you'll have machine learning models that are built and optimized for those different platforms. So, you know, think of it in terms of then, your data lake, is a set of zones that-- >> It gets even more complex just playing on that theme, when you think about what Cisco started, called Fog Computing. I don't really like that term. But edge analytics, or computing at the edge. We've seen with the internet coming along where we couldn't deliver everything with a central data center. So we started creating this concept of content delivery networks, right? I think the same thing, I know the same thing has happened in data analysis and data processing. Where we've been pulling social media out of the Cloud, per se, and bringing it back to a central source. And doing analytics on it. But when you think of something like, say for example, when the Dreamliner 787 from Boeing came out, this airplane created 1/2 a terabyte of data per flight. Now let's just do some quick, back of the envelope math. There's 87,400 flights a day, just in the domestic airspace in the USA alone, per day. Now 87,400 times 1/2 a terabyte, that's 43.5 petabytes a day. You physically can't copy that from quote unquote in the Cloud, if you'll pardon the pun, back to the data center. So now we've got the challenge, a lot of our Enterprise data's behind a firewall, supposedly 80% of it. But what's out at the edge of the network? Where's the value in that data? So there are zonal challenges. Now what do I do with my Enterprise versus the open data, the mobile data, the machine data? >> Yeah, we've seen some recent data from IDC that says, "About 43% of the data "is going to stay at the edge." We think that, that's way understated, just given the examples. We think it's closer to 90% that's going to stay at the edge. >> Just on the airplane topic, right? So Airbus wasn't going to be outdone. 
Boeing put 4,000 sensors or something in their 787 Dreamliner six years ago. Airbus just announced an 83, 81,000 with 10,000 sensors in it. Do the same math. Now the FAA in the US said that all aircraft and all carriers have to be, by early next year, I think it's like March or April next year, have to be at the same level of BIOS. Or the same capability of data collection and so forth. It's kind of like a mini GDPR for airlines. So with the 83, 81,000 with 10,000 sensors, that becomes two point five terabytes per flight. If you do the math, it's 220 petabytes of data just in one day's traffic, domestically in the US. Now, it's just so mind boggling that we're going to have to completely turn our thinking on its' head, on what do we do behind the firewall? What do we do in the Cloud versus what we might have to do in the airplane? I mean, think about edge analytics in the airplane processing data, as you said, Jim, streaming analytics in flight. >> Yeah that's a big topic within Wikibon, so, within the team. Me and David Floyer, and my other colleagues. They're talking about the whole notion of edge architecture. Not only will most of the data be persisted at the edge, most of the deep learning models like TensorFlow will be executed at the edge. To some degree, the training of those models will happen in the Cloud. But much of that will be pushed in a federated fashion to the edge, or at least I'm predicting. We're already seeing some industry moves in that direction, in terms of architectures. Google has a federated training, project or initiative. >> Chris: Look at TensorFlow Lite. >> Which is really fascinating for it's geared to IOT, I'm sorry, go ahead. >> Look at TensorFlow Lite. I mean in the announcement of having every Android device having ML capabilities, is Google's essential acknowledgment, "We can't do it all." So we need to essentially, sort of like a setting at home. Everyone's smartphone top TV box just to help with the processing. >> Now we're talking about this, this sort of leads to this IOT discussion but I want to underscore the operating model. As you were saying, "You can't just "lift and shift to the Cloud." You're not going to, CEOs aren't going to get the billion dollar hit by just doing that. So you got to change the operating model. And that leads to, this discussion of IOT. And an entirely new operating model. >> Well, there are companies that are like Sisense who have worked with Intel. And they've taken this concept. They've taken the business logic and not just putting it in the chip, but actually putting it in memory, in the chip. So as data's going through the chip it's not just actually being processed but it's actually being baked in memory. So level one, two, and three cache. Now this is a game changer. Because as Chris was saying, even if we were to get the data back to a central location, the compute load, I saw a real interesting thing from I think it was Google the other day, one of the guys was doing a talk. And he spoke about what it meant to add cognitive and voice processing into just the Android platform. And they used some number, like that had, double the amount of compute they had, just to add voice for free, to the Android platform. Now even for Google, that's a nontrivial exercise. So as Chris was saying, I think we have to again, flip it on its' head and say, "How much can we put "at the edge of the network?" Because think about these phones. I mean, even your fridge and microwave, right? 
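A quick back-of-envelope check, in Python, of the flight-data arithmetic discussed above. The flight count and per-flight volumes are the panelists' figures taken at face value; the exact totals come out slightly different from the rounded numbers quoted in conversation.

```python
# Back-of-envelope math for the aircraft data volumes discussed above.
# The figures (87,400 domestic US flights per day, 0.5 TB and 2.5 TB per
# flight) are the panelists' numbers, taken at face value.

FLIGHTS_PER_DAY = 87_400

def petabytes_per_day(tb_per_flight: float) -> float:
    terabytes_per_day = FLIGHTS_PER_DAY * tb_per_flight
    return terabytes_per_day / 1_000   # decimal units, 1 PB = 1,000 TB

print(petabytes_per_day(0.5))   # 43.7 -- close to the ~43.5 PB/day quoted
print(petabytes_per_day(2.5))   # 218.5 -- i.e. the roughly 220 PB/day quoted
```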
We put a man on the moon with something that these days, we make for $89 at home, on the Raspberry Pie computer, right? And even that was 1,000 times more powerful. When we start looking at what's going into the chips, we've seen people build new, not even GPUs, but deep learning and stream analytics capable chips. Like Google, for example. That's going to make its' way into consumer products. So that, now the compute capacity in phones, is going to, I think transmogrify in some ways because there is some magic in there. To the point where, as Chris was saying, "We're going to have the smarts in our phone." And a lot of that workload is going to move closer to us. And only the metadata that we need to move is going to go centrally. >> Well here's the thing. The edge isn't the technology. The edge is actually the people. When you look at, for example, the MIT language Scratch. This is kids programming language. It's drag and drop. You know, kids can assemble really fun animations and make little movies. We're training them to build for IOT. Because if you look at a system like Node-RED, it's an IBM interface that is drag and drop. Your workflow is for IOT. And you can push that to a device. Scratch has a converter for doing those. So the edge is what those thousands and millions of kids who are learning how to code, learning how to think architecturally and algorithmically. What they're going to create that is beyond what any of us can possibly imagine. >> I'd like to add one other thing, as well. I think there's a topic we've got to start tabling. And that is what I refer to as the gravity of data. So when you think about how planets are formed, right? Particles of dust accrete. They form into planets. Planets develop gravity. And the reason we're not flying into space right now is that there's gravitational force. Even though it's one of the weakest forces, it keeps us on our feet. Oftentimes in organizations, I ask them to start thinking about, "Where is the center "of your universe with regard to the gravity of data." Because if you can follow the center of your universe and the gravity of your data, you can often, as Chris is saying, find where the business logic needs to be. And it could be that you got to think about a storage problem. You can think about a compute problem. You can think about a streaming analytics problem. But if you can find where the center of your universe and the center of your gravity for your data is, often you can get a really good insight into where you can start focusing on where the workloads are going to be where the smarts are going to be. Whether it's small, medium, or large. >> So this brings up the topic of data governance. One of the themes here at Fast Track Your Data is GDPR. What it means. It's one of the reasons, I think IBM selected Europe, generally, Munich specifically. So let's talk about GDPR. We had a really interesting discussion last night. So let's kind of recreate some of that. I'd like somebody in the panel to start with, what is GDPR? And why does it matter, Ronald? >> Yeah, maybe I can start. Maybe a little bit more in general unified governance. So if i talk to companies and I need to explain to them what's governance, I basically compare it with a crime scene. So in a crime scene if something happens, they start with securing all the evidence. So they start sealing the environment. And take care that all the evidence is collected. And on the other hand, you see that they need to protect this evidence. 
There are all kinds of policies. There are all kinds of procedures. There are all kinds of rules, that need to be followed. To take care that the whole evidence is secured well. And once you start, basically, investigating. So you have the crime scene investigators. You have the research lab. You have all different kind of people. They need to have consent before they can use all this evidence. And the whole reason why they're doing this is in order to collect the villain, the crook. To catch him and on the other hand, once he's there, to convict him. And we do this to have trust in the materials. Or trust in basically, the analytics. And on the other hand to, the public have trust in everything what's happened with the data. So if you look to a company, where data is basically the evidence, this is the value of your data. It's similar to like the evidence within a crime scene. But most companies don't treat it like this. So if we then look to GDPR, GDPR basically shifts the power and the ownership of the data from the company to the person that created it. Which is often, let's say the consumer. And there's a lot of paradox in this. Because all the companies say, "We need to have this customer data. "Because we need to improve the customer experience." So if you make it concrete and let's say it's 1st of June, so GDPR is active. And it's first of June 2018. And I go to iTunes, so I use iTunes. Let's go to iTunes said, "Okay, Apple please "give me access to my data." I want to see which kind of personal information you have stored for me. On the other end, I want to have the right to rectify all this data. I want to be able to change it and give them a different level of how they can use my data. So I ask this to iTunes. And then I say to them, okay, "I basically don't like you anymore. "I want to go to Spotify. "So please transfer all my personal data to Spotify." So that's possible once it's June 18. Then I go back to iTunes and say, "Okay, I don't like it anymore. "Please reduce my consent. "I withdraw my consent. "And I want you to remove all my "personal data for everything that you use." And I go to Spotify and I give them, let's say, consent for using my data. So this is a shift where you can, as a person be the owner of the data. And this has a lot of consequences, of course, for organizations, how to manage this. So it's quite simple for the consumer. They get the power, it's maturing the whole law system. But it's a big consequence of course for organizations. >> This is going to be a nightmare for marketers. But fill in some of the gaps there. >> Let's go back, so GDPR, the General Data Protection Regulation, was passed by the EU in 2016, in May of 2016. It is, as Ronald was saying, it's four basic things. The right to privacy. The right to be forgotten. Privacy built into systems by default. And the right to data transfer. >> Joe: It takes effect next year. >> It is already in effect. GDPR took effect in May of 2016. The enforcement penalties take place the 25th of May 2018. Now here's where, there's two things on the penalty side that are important for everyone to know. Now number one, GDPR is extra territorial. Which means that an EU citizen, anywhere on the planet has GDPR, goes with them. So say you're a pizza shop in Nebraska. And an EU citizen walks in, orders a pizza. Gives her the credit card and stuff like that. If you for some reason, store that data, GDPR now applies to you, Mr. Pizza shop, whether or not you do business in the EU. 
Because an EU citizen's data is with you. Two, the penalties are much stiffer then they ever have been. In the old days companies could simply write off penalties as saying, "That's the cost of doing business." With GDPR the penalties are up to 4% of your annual revenue or 20 million Euros, whichever is greater. And there may be criminal sanctions, charges, against key company executives. So there's a lot of questions about how this is going to be implemented. But one of the first impacts you'll see from a marketing perspective is all the advertising we do, targeting people by their age, by their personally identifiable information, by their demographics. Between now and May 25th 2018, a good chunk of that may have to go away because there's no way for you to say, "Well this person's an EU citizen, this person's not." People give false information all the time online. So how do you differentiate it? Every company, regardless of whether they're in the EU or not will have to adapt to it, or deal with the penalties. >> So Lillian, as a consumer this is designed to protect you. But you had a very negative perception of this regulation. >> I've looked over the GDPR and to me it actually looks like a socialist agenda. It looks like (panel laughs) no, it looks like a full assault on free enterprise and capitalism. And on its' face from a legal perspective, its' completely and wholly unenforceable. Because they're assigning jurisdictional rights to the citizen. But what are they going to do? They're going to go to Nebraska and they're going to call in the guy from the pizza shop? And call him into what court? The EU court? It's unenforceable from a legal perspective. And if you write a law that's unenforceable, you know, it's got to be enforceable in every element. It can't be just, "Oh, we're only "going to enforce it for Facebook and for Google. "But it's not enforceable for," it needs to be written so that it's a complete and actionable law. And it's not written in that way. And from a technological perspective it's not implementable. I think you said something like 652 EU regulators or political people voted for this and 10 voted against it. But what do they know about actually implementing it? Is it possible? There's all sorts of regulations out there that aren't possible to implement. I come from an environmental engineering background. And it's absolutely ridiculous because these agencies will pass laws that actually, it's not possible to implement those in practice. The cost would be too great. And it's not even needed. So I don't know, I just saw this and I thought, "You know, if the EU wants to," what they're essentially trying to do is regulate what the rest of the world does on the internet. And if they want to build their own internet like China has and police it the way that they want to. But Ronald here, made an analogy between data, and free enterprise, and a crime scene. Now to me, that's absolutely ridiculous. What does data and someone signing up for an email list have to do with a crime scene? And if EU wants to make it that way they can police their own internet. But they can't go across the world. They can't go to Singapore and tell Singapore, or go to the pizza shop in Nebraska and tell them how to run their business. >> You know, EU overreach in the post Brexit era, of what you're saying has a lot of validity. How far can the tentacles of the EU reach into other sovereign nations. >> What court are they going to call them into? >> Yeah. >> I'd like to weigh in on this. 
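A minimal sketch, in Python, of the fine ceiling Chris describes: the greater of 4% of annual revenue or 20 million euros. The revenue figures below are hypothetical, and this is an illustration of the arithmetic only, not legal guidance.

```python
# Fine ceiling as described above: the greater of 4% of annual
# (worldwide) revenue or 20 million euros. Illustrative arithmetic only,
# not legal advice; the revenue inputs below are made up.

def max_gdpr_fine_eur(annual_revenue_eur: float) -> float:
    return max(0.04 * annual_revenue_eur, 20_000_000)

print(max_gdpr_fine_eur(300_000_000))    # 20000000.0 -> the 20M EUR floor applies
print(max_gdpr_fine_eur(2_000_000_000))  # 80000000.0 -> 4% of revenue applies
```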
There are lots of unknowns, right? So I'd like us to focus on the things we do know. We've already dealt with similar situations before. In Australia, we introduced a goods and sales tax. Completely foreign concept. Everything you bought had 10% on it. No one knew how to deal with this. It was a completely new practice in accounting. There's a whole bunch of new software that had to be written. MYRB had to have new capability, but we coped. No one actually went to jail yet. It's decades later, for not complying with GST. So what it was, was a framework on how to shift from non sales tax related revenue collection. To sales tax related revenue collection. I agree that there are some egregious things built into this. I don't disagree with that at all. But I think if I put my slightly broader view of the world hat on, we have well and truly gone past the point in my mind, where data was respected, data was treated in a sensible way. I mean I get emails from companies I've never done business with. And when I follow it up, it's because I did business with a credit card company, that gave it to a service provider, that thought that I was going to, when I bought a holiday to come to Europe, that I might want travel insurance. Now some might say there's value in that. And other's say there's not, there's the debate. But let's just focus on what we're talking about. We're talking about a framework for governance of the treatment of data. If we remove all the emotive component, what we are talking about is a series of guidelines, backed by laws, that say, "We would like you to do this," in an ideal world. But I don't think anyone's going to go to jail, on day one. They may go to jail on day 180. If they continue to do nothing about it. So they're asking you to sort of sit up and pay attention. Do something about it. There's a whole bunch of relief around how you approach it. The big thing for me, is there's no get out of jail card, right? There is no get out of jail card for not complying. But there's plenty of support. I mean, we're going to have ambulance chasers everywhere. We're going to have class actions. We're going to have individual suits. The greatest thing to do right now is get into GDPR law. Because you seem to think data scientists are unicorn? >> What kind of life is that if there's ambulance chasers everywhere? You want to live like that? >> Well I think we've seen ad blocking. I use ad blocking as an example, right? A lot of organizations with advertising broke the internet by just throwing too much content on pages, to the point where they're just unusable. And so we had this response with ad blocking. I think in many ways, GDPR is a regional response to a situation where I don't think it's the exact right answer. But it's the next evolutional step. We'll see things evolve over time. >> It's funny you mentioned it because in the United States one of the things that has happened, is that with the change in political administrations, the regulations on what companies can do with your data have actually been laxened, to the point where, for example, your internet service provider can resell your browsing history, with or without your consent. Or your consent's probably buried in there, on page 47. And so, GDPR is kind of a response to saying, "You know what? "You guys over there across the Atlantic "are kind of doing some fairly "irresponsible things with what you allow companies to do." 
Now, to Lillian's point, no one's probably going to go after the pizza shop in Nebraska because they don't do business in the EU. They don't have an EU presence. And it's unlikely that an EU regulator's going to get on a plane from Brussels and fly to Topeka and say, or Omaha, sorry, "Come on Joe, let's get the pizza shop in order here." But for companies, particularly Cloud companies, that have offices and operations within the EU, they have to sit up and pay attention. So if you have any kind of EU operations, or any kind of fiscal presence in the EU, you need to get on board. >> But to Lillian's point it becomes a boondoggle for lawyers in the EU who want to go after deep pocketed companies like Facebook and Google. >> What's the value in that? It seems like regulators are just trying to create work for themselves. >> What about the things that say advertisers can do, not so much with the data that they have? With the data that they don't have. In other words, they have people called data scientists who build models that can do inferences on sparse data. And do amazing things in terms of personalization. What do you do about all those gray areas? Where you got machine learning models and so forth? >> But it applies-- >> It applies to personally identifiable information. But if you have a talented enough data scientist, you don't need the PII or even the inferred characteristics. If a certain type of behavior happens on your website, for example. And this path of 17 pages almost always leads to a conversion, it doesn't matter who you are or where you're coming from. If you're a good enough data scientist, you can build a model that will track that. >> Like you know, target, infer some young woman was pregnant. And they inferred correctly even though that was never divulged. I mean, there's all those gray areas that, how can you stop that slippery slope? >> Well I'm going to weigh in really quickly. A really interesting experiment for people to do. When people get very emotional about it I say to them, "Go to Google.com, "view source, put it in seven point Courier "font in Word and count how many pages it is." I guess you can't guess how many pages? It's 52 pages of seven point Courier font, HTML to render one logo, and a search field, and a click button. Now why do we need 52 pages of HTML source code and Java script just to take a search query. Think about what's being done in that. It's effectively a mini operating system, to figure out who you are, and what you're doing, and where you been. Now is that a good or bad thing? I don't know, I'm not going to make a judgment call. But what I'm saying is we need to stop and take a deep breath and say, "Does anybody need a 52 page, "home page to take a search query?" Because that's just the tip of the iceberg. >> To that point, I like the results that Google gives me. That's why I use Google and not Bing. Because I get better search results. So, yeah, I don't mind if you mine my personal data and give me, our Facebook ads, those are the only ads, I saw in your article that GDPR is going to take out targeted advertising. The only ads in the entire world, that I like are Facebook ads. Because I actually see products I'm interested in. And I'm happy to learn about that. I think, "Oh I want to research that. "I want to see this new line of products "and what are their competitors?" And I like the targeted advertising. I like the targeted search results because it's giving me more of the information that I'm actually interested in. 
>> And that's exactly what it's about. You can still decide, yourself, if you want to have this targeted advertising. If not, then you don't give consent. If you like it, you give consent. So if a company gives you value, you give consent back. So it's not that it's restricting everything. It's giving consent. And I think it's similar to what happened and the same type of response, what happened, we had the Mad Cow Disease here in Europe, where you had the whole food chain that needed to be tracked. And everybody said, "No, it's not required." But now it's implemented. Everybody in Europe does it. So it's the same, what probably going to happen over here as well. >> So what does GDPR mean for data scientists? >> I think GDPR is, I think it is needed. I think one of the things that may be slowing data science down is fear. People are afraid to share their data. Because they don't know what's going to be done with it. If there are some guidelines around it that should be enforced and I think, you know, I think it's been said but as long as a company could prove that it's doing due diligence to protect your data, I think no one is going to go to jail. I think when there's, you know, we reference a crime scene, if there's a heinous crime being committed, all right, then it's going to become obvious. And then you do go directly to jail. But I think having guidelines and even laws around privacy and protection of data is not necessarily a bad thing. You can do a lot of data, really meaningful data science, without understanding that it's Joe Caserta. All of the demographics about me. All of the characteristics about me as a human being, I think are still on the table. All that they're saying is that you can't go after Joe, himself, directly. And I think that's okay. You know, there's still a lot of things. We could still cure diseases without knowing that I'm Joe Caserta, right? As long as you know everything else about me. And I think that's really at the core, that's what we're trying to do. We're trying to protect the individual and the individual's data about themselves. But I think as far as how it affects data science, you know, a lot of our clients, they're afraid to implement things because they don't exactly understand what the guideline is. And they don't want to go to jail. So they wind up doing nothing. So now that we have something in writing that, at least, it's something that we can work towards, I think is a good thing. >> In many ways, organizations are suffering from the deer in the headlight problem. They don't understand it. And so they just end up frozen in the headlights. But I just want to go back one step if I could. We could get really excited about what it is and is not. But for me, the most critical thing there is to remember though, data breaches are happening. There are over 1,400 data breaches, on average, per day. And most of them are not trivial. And when we saw 1/2 a billion from Yahoo. And then one point one billion and then one point five billion. I mean, think about what that actually means. There were 47,500 Mongodbs breached in an 18 hour window, after an automated upgrade. And they were airlines, they were banks, they were police stations. They were hospitals. So when I think about frameworks like GDPR, I'm less worried about whether I'm going to see ads and be sold stuff. I'm more worried about, and I'll give you one example. My 12 year old son has an account at a platform called Edmodo. 
Now I'm not going to pick on that brand for any reason but it's a current issue. Something like, I think it was like 19 million children in the world had their username, password, email address, home address, and all this social interaction on this Facebook for kids platform called Edmodo, breached in one night. Now I got my hands on a copy. And everything about my son is there. Now I have a major issue with that. Because I can't do anything to undo that, nothing. The fact that I was able to get a copy, within hours on a dark website, for free. The fact that his first name, last name, email, mobile phone number, all these personal messages from friends. Nobody has the right to allow that to breach on my son. Or your children, or our children. For me, GDPR, is a framework for us to try and behave better about really big issues. Whether it's a socialist issue. Whether someone's got an issue with advertising. I'm actually not interested in that at all. What I'm interested in is companies need to behave much better about the treatment of data when it's the type of data that's being breached. And I get really emotional when it's my son, or someone else's child. Because I don't care if my bank account gets hacked. Because they hedge that. They underwrite and insure themselves and the money arrives back to my bank. But when it's my wife who donated blood and a blood donor website got breached and her details got lost. Even things like sexual preferences. That they ask questions on, is out there. My 12 year old son is out there. Nobody has the right to allow that to happen. For me, GDPR is the framework for us to focus on that. >> Dave: Lillian, is there a comment you have? >> Yeah, I think that, I think that security concerns are 100% and definitely a serious issue. Security needs to be addressed. And I think a lot of the stuff that's happening is due to, I think we need better security personnel. I think we need better people working in the security area where they're actually looking and securing. Because I don't think you can regulate I was just, I wanted to take the microphone back when you were talking about taking someone to jail. Okay, I have a background in law. And if you look at this, you guys are calling it a framework. But it's not a framework. What they're trying to do is take 4% of your business revenues per infraction. They want to say, "If a person signs up "on your email list and you didn't "like, necessarily give whatever "disclaimer that the EU said you need to give. "Per infraction, we're going to take "4% of your business revenue." That's a law, that they're trying to put into place. And you guys are talking about taking people to jail. What jail are you? EU is not a country. What jurisdiction do they have? Like, you're going to take pizza man Joe and put him in the EU jail? Is there an EU jail? Are you going to take them to a UN jail? I mean, it's just on its' face it doesn't hold up to legal tests. I don't understand how they could enforce this. >> I'd like to just answer the question on-- >> Security is a serious issue. I would be extremely upset if I were you. >> I personally know, people who work for companies who've had data breaches. And I respect them all. They're really smart people. They've got 25 plus years in security. And they are shocked that they've allowed a breach to take place. What they've invariably all agreed on is that a whole range of drivers have caused them to get to a bad practice. So then, for example, the donate blood website. 
The young person who was a sys admin with all the right skills and all the right experience just made a basic mistake. They took a DB dump of a MySQL database before they upgraded their WordPress website for the business. And they happened to leave it in a folder that was indexable by Google. And so somebody wrote a regular expression to search in Google to find SQL backups. Now this person, I personally respect them. I think they're an amazing practitioner. They just made a mistake. So what does that bring us back to? It brings us back to the point that we need a safety net or a framework or whatever you want to call it. Where organizations have checks and balances no matter what they do. Whether it's an upgrade, a backup, a modification, you know. And they all think they do, but invariably we've seen from the hundreds of thousands of breaches, they don't. Now on the point of law, we could debate that all day. I mean the EU does have a remit. If I was caught speeding in Germany, as an Australian, I would be thrown into a German jail. If I got caught as an organization in France, breaching GDPR, I would be held accountable to the law in that region, by the organization pursuing me. So I think it's a bit of a misnomer saying I can't go to an EU jail. I don't disagree with you, totally, but I think it's regional. If I get a speeding fine and break the law by driving fast in the EU, it's in the country, in the region, that I'm caught. And I think GDPR's going to be enforced in that same way. >> All right folks, unfortunately the 60 minutes flew right by. And it does when you have great guests like yourselves. So thank you very much for joining this panel today. And we have an action packed day here. So we're going to cut over. The CUBE is going to have its interview format starting in about 1/2 hour. And then we cut over to the main tent. Who's on the main tent? Dez, you're doing a main stage presentation today. Data Science is a Team Sport. Hillary Mason has a breakout session. We also have a breakout session on GDPR and what it means for you. Are you ready for GDPR? Check out ibmgo.com. It's all free content, it's all open. You do have to sign in to see the Hillary Mason and the GDPR sessions. And we'll be back in about 1/2 hour with the CUBE. We'll be running replays all day on SiliconAngle.tv and also ibmgo.com. So thanks for watching everybody. Keep it right there, we'll be back in about 1/2 hour with the CUBE interviews. We're live from Munich, Germany, at Fast Track Your Data. This is Dave Vellante with Jim Kobielus, we'll see you shortly. (electronic music)

Published Date : Jun 24 2017


Next-Generation Analytics Social Influencer Roundtable - #BigDataNYC 2016 #theCUBE


 

>> Narrator: Live from New York, it's the Cube, covering big data New York City 2016. Brought to you by headline sponsors, CISCO, IBM, NVIDIA, and our ecosystem sponsors, now here's your host, Dave Valante. >> Welcome back to New York City, everybody, this is the Cube, the worldwide leader in live tech coverage, and this is a cube first, we've got a nine person, actually eight person panel of experts, data scientists, all alike. I'm here with my co-host, James Cubelis, who has helped organize this panel of experts. James, welcome. >> Thank you very much, Dave, it's great to be here, and we have some really excellent brain power up there, so I'm going to let them talk. >> Okay, well thank you again-- >> And I'll interject my thoughts now and then, but I want to hear them. >> Okay, great, we know you well, Jim, we know you'll do that, so thank you for that, and appreciate you organizing this. Okay, so what I'm going to do to our panelists is ask you to introduce yourself. I'll introduce you, but tell us a little bit about yourself, and talk a little bit about what data science means to you. A number of you started in the field a long time ago, perhaps data warehouse experts before the term data science was coined. Some of you started probably after Hal Varian said it was the sexiest job in the world. (laughs) So think about how data science has changed and or what it means to you. We're going to start with Greg Piateski, who's from Boston. A Ph.D., KDnuggets, Greg, tell us about yourself and what data science means to you. >> Okay, well thank you Dave and thank you Jim for the invitation. Data science in a sense is the second oldest profession. I think people have this built-in need to find patterns and whatever we find we want to organize the data, but we do it well on a small scale, but we don't do it well on a large scale, so really, data science takes our need and helps us organize what we find, the patterns that we find that are really valid and useful and not just random, I think this is a big challenge of data science. I've actually started in this field before the term Data Science existed. I started as a researcher and organized the first few workshops on data mining and knowledge discovery, and the term data mining became less fashionable, became predictive analytics, now it's data science and it will be something else in a few years. >> Okay, thank you, Eves Mulkearns, Eves, I of course know you from Twitter. A lot of people know you as well. Tell us about your experiences and what data scientist means to you. >> Well, data science to me is if you take the two words, the data and the science, the science it holds a lot of expertise and skills there, it's statistics, it's mathematics, it's understanding the business and putting that together with the digitization of what we have. It's not only the structured data or the unstructured data what you store in the database try to get out and try to understand what is in there, but even video what is coming on and then trying to find, like George already said, the patterns in there and bringing value to the business but looking from a technical perspective, but still linking that to the business insights and you can do that on a technical level, but then you don't know yet what you need to find, or what you're looking for. >> Okay great, thank you. Craig Brown, Cube alum. How many people have been on the Cube actually before? >> I have. >> Okay, good. I always like to ask that question. 
So Craig, tell us a little bit about your background and, you know, data science, how has it changed, what's it all mean to you? >> Sure, so I'm Craig Brown, I've been in IT for almost 28 years, and that was obviously before the term data science, but I've evolved from, I started out as a developer. And evolved through the data ranks, as I called it, working with data structures, working with data systems, data technologies, and now we're working with data pure and simple. Data science to me is an individual or team of individuals that dissect the data, understand the data, help folks look at the data differently than just the information that, you know, we usually use in reports, and get more insights on, how to utilize it and better leverage it as an asset within an organization. >> Great, thank you Craig, okay, Jennifer Shin? Math is obviously part of being a data scientist. You're good at math I understand. Tell us about yourself. >> Yeah, so I'm a senior principle data scientist at the Nielsen Company. I'm also the founder of 8 Path Solutions, which is a data science, analytics, and technology company, and I'm also on the faculty in the Master of Information and Data Science program at UC Berkeley. So math is part of the IT statistics for data science actually this semester, and I think for me, I consider myself a scientist primarily, and data science is a nice day job to have, right? Something where there's industry need for people with my skill set in the sciences, and data gives us a great way of being able to communicate sort of what we know in science in a way that can be used out there in the real world. I think the best benefit for me is that now that I'm a data scientist, people know what my job is, whereas before, maybe five ten years ago, no one understood what I did. Now, people don't necessarily understand what I do now, but at least they understand kind of what I do, so it's still an improvement. >> Excellent. Thank you Jennifer. Joe Caserta, you're somebody who started in the data warehouse business, and saw that snake swallow a basketball and grow into what we now know as big data, so tell us about yourself. >> So I've been doing data for 30 years now, and I wrote the Data Warehouse ETL Toolkit with Ralph Timbal, which is the best selling book in the industry on preparing data for analytics, and with the big paradigm shift that's happened, you know for me the past seven years has been, instead of preparing data for people to analyze data to make decisions, now we're preparing data for machines to make the decisions, and I think that's the big shift from data analysis to data analytics and data science. >> Great, thank you. Miriam, Miriam Fridell, welcome. >> Thank you. I'm Miriam Fridell, I work for Elder Research, we are a data science consultancy, and I came to data science, sort of through a very circuitous route. I started off as a physicist, went to work as a consultant and software engineer, then became a research analyst, and finally came to data science. And I think one of the most interesting things to me about data science is that it's not simply about building an interesting model and doing some interesting mathematics, or maybe wrangling the data, all of which I love to do, but it's really the entire analytics lifecycle, and a value that you can actually extract from data at the end, and that's one of the things that I enjoy most is seeing a client's eyes light up or a wow, I didn't really know we could look at data that way, that's really interesting. 
I can actually do something with that, so I think that, to me, is one of the most interesting things about it. >> Great, thank you. Justin Sadeen, welcome. >> Absolutely, than you, thank you. So my name is Justin Sadeen, I work for Morph EDU, an artificial intelligence company in Atlanta, Georgia, and we develop learning platforms for non-profit and private educational institutions. So I'm a Marine Corp veteran turned data enthusiast, and so what I think about data science is the intersection of information, intelligence, and analysis, and I'm really excited about the transition from big data into smart data, and that's what I see data science as. >> Great, and last but not least, Dez Blanchfield, welcome mate. >> Good day. Yeah, I'm the one with the funny accent. So data science for me is probably the funniest job I've ever to describe to my mom. I've had quite a few different jobs, and she's never understood any of them, and this one she understands the least. I think a fun way to describe what we're trying to do in the world of data science and analytics now is it's the equivalent of high altitude mountain climbing. It's like the extreme sport version of the computer science world, because we have to be this magical unicorn of a human that can understand plain english problems from C-suite down and then translate it into code, either as soles or as teams of developers. And so there's this black art that we're expected to be able to transmogrify from something that we just in plain english say I would like to know X, and we have to go and figure it out, so there's this neat extreme sport view I have of rushing down the side of a mountain on a mountain bike and just dodging rocks and trees and things occasionally, because invariably, we do have things that go wrong, and they don't quite give us the answers we want. But I think we're at an interesting point in time now with the explosion in the types of technology that are at our fingertips, and the scale at which we can do things now, once upon a time we would sit at a terminal and write code and just look at data and watch it in columns, and then we ended up with spreadsheet technologies at our fingertips. Nowadays it's quite normal to instantiate a small high performance distributed cluster of computers, effectively a super computer in a public cloud, and throw some data at it and see what comes back. And we can do that on a credit card. So I think we're at a really interesting tipping point now where this coinage of data science needs to be slightly better defined, so that we can help organizations who have weird and strange questions that they want to ask, tell them solutions to those questions, and deliver on them in, I guess, a commodity deliverable. I want to know xyz and I want to know it in this time frame and I want to spend this much amount of money to do it, and I don't really care how you're going to do it. And there's so many tools we can choose from and there's so many platforms we can choose from, it's this little black art of computing, if you'd like, we're effectively making it up as we go in many ways, so I think it's one of the most exciting challenges that I've had, and I think I'm pretty sure I speak for most of us in that we're lucky that we get paid to do this amazing job. That we get make up on a daily basis in some cases. >> Excellent, well okay. So we'll just get right into it. I'm going to go off script-- >> Do they have unicorns down under? I think they have some strange species right? 
>> Well we put the pointy bit on the back. You guys have in on the front. >> So I was at an IBM event on Friday. It was a chief data officer summit, and I attended what was called the Data Divas' breakfast. It was a women in tech thing, and one of the CDOs, she said that 25% of chief data officers are women, which is much higher than you would normally see in the profile of IT. We happen to have 25% of our panelists are women. Is that common? Miriam and Jennifer, is that common for the data science field? Or is this a higher percentage than you would normally see-- >> James: Or a lower percentage? >> I think certainly for us, we have hired a number of additional women in the last year, and they are phenomenal data scientists. I don't know that I would say, I mean I think it's certainly typical that this is still a male-dominated field, but I think like many male-dominated fields, physics, mathematics, computer science, I think that that is slowly changing and evolving, and I think certainly, that's something that we've noticed in our firm over the years at our consultancy, as we're hiring new people. So I don't know if I would say 25% is the right number, but hopefully we can get it closer to 50. Jennifer, I don't know if you have... >> Yeah, so I know at Nielsen we have actually more than 25% of our team is women, at least the team I work with, so there seems to be a lot of women who are going into the field. Which isn't too surprising, because with a lot of the issues that come up in STEM, one of the reasons why a lot of women drop out is because they want real world jobs and they feel like they want to be in the workforce, and so I think this is a great opportunity with data science being so popular for these women to actually have a job where they can still maintain that engineering and science view background that they learned in school. >> Great, well Hillary Mason, I think, was the first data scientist that I ever interviewed, and I asked her what are the sort of skills required and the first question that we wanted to ask, I just threw other women in tech in there, 'cause we love women in tech, is about this notion of the unicorn data scientist, right? It's been put forth that there's the skill sets required to be a date scientist are so numerous that it's virtually impossible to have a data scientist with all those skills. >> And I love Dez's extreme sports analogy, because that plays into the whole notion of data science, we like to talk about the theme now of data science as a team sport. Must it be an extreme sport is what I'm wondering, you know. The unicorns of the world seem to be... Is that realistic now in this new era? >> I mean when automobiles first came out, they were concerned that there wouldn't be enough chauffeurs to drive all the people around. Is there an analogy with data, to be a data-driven company. Do I need a data scientist, and does that data scientist, you know, need to have these unbelievable mixture of skills? Or are we doomed to always have a skill shortage? Open it up. >> I'd like to have a crack at that, so it's interesting, when automobiles were a thing, when they first bought cars out, and before they, sort of, were modernized by the likes of Ford's Model T, when we got away from the horse and carriage, they actually had human beings walking down the street with a flag warning the public that the horseless carriage was coming, and I think data scientists are very much like that. 
That we're kind of expected to go ahead of the organization and try and take the challenges we're faced with today and see what's going to come around the corner. And so we're like the little flag-bearers, if you'd like, in many ways of this is where we're at today, tell me where I'm going to be tomorrow, and try and predict the day after as well. It is very much becoming a team sport though. But I think the concept of data science being a unicorn has come about because the coinage hasn't been very well defined, you know, if you were to ask 10 people what a data scientist is, you'd get 11 answers, and I think this is a really challenging issue for hiring managers and C-suites when they say, I want data science, I want big data, I want an analyst. They don't actually really know what they're asking for. Generally, if you ask for a database administrator, it's a well-described job spec, and you can just advertise it and some 20 people will turn up and you interview to decide whether you like the look and feel and smell of 'em. When you ask for a data scientist, there's 20 different definitions of what that one data science role could be. So we don't initially know what the job is, we don't know what the deliverable is, and we're still trying to figure that out, so yeah. >> Craig what about you? >> So from my experience, when we talk about data science, we're really talking about a collection of experiences with multiple people. I've yet to find, at least from my experience, a data science effort with a lone wolf. So you're talking about a combination of skills, and so no one individual needs to have all that makes a data scientist a data scientist, but you definitely have to have the right combination of skills amongst a team in order to accomplish the goals of a data science team. So from my experiences and from the clients that I've worked with, we refer to the data science effort as a data science team. And I believe that's very appropriate to the team sport analogy. >> For us, we look at a data scientist as a full stack web developer, a jack of all trades, I mean they need to have a multitude of backgrounds, coming from a programmer, from an analyst. You can't find one subject matter expert, it's very difficult. And if you're able to find a subject matter expert, you know, through the lifecycle of product development, you're going to require that individual to interact with a number of other members from your team who are analysts, and then you just end up, well, training this person to be, again, a jack of all trades, so it comes full circle. >> I own a business that does nothing but data solutions, and we've been in business 15 years, and the transition over time has been going from being a conventional-wisdom-run company with a bunch of experts at the top to becoming more of a data-driven company using data warehousing and BI, but now the trend is absolutely analytics driven. So if you're not becoming an analytics-driven company, you are going to be behind the curve very very soon, and it's interesting that IBM is now coining the phrase of a cognitive business. I think that is absolutely the future. If you're not a cognitive business from a technology perspective, and an analytics-driven perspective, you're going to be left behind, that's for sure. 
So in order to stay competitive, you know, you need to really think about data science, think about how you're using your data, and I also see that what's considered the data expert has evolved over time too, where it used to be just someone really good at writing SQL, or someone really good at writing queries in any language, but now it's becoming more of an interdisciplinary action where you need soft skills and you also need the hard skills, and that's why I think there's more females in the industry now than ever. Because you really need to have a really broad breadth of experiences that really wasn't required in the past. >> Greg Piateski, you have a comment? >> So there are not too many unicorns in nature or as data scientists, so I think organizations that want to hire data scientists have to look for teams, and there are a few unicorns like Hillary Mason or maybe Osama Faiat, but they generally tend to start companies and it's very hard to retain them as data scientists. What I see is another evolution, automation, and you know, steps like IBM Watson, the platform, is eventually a great advance for data scientists in the short term, but probably what's likely to happen in the longer term is kind of more and more of those skills becoming subsumed by a machine learning layer within the software. How long will it take, I don't know, but I have a feeling that the paradise for data scientists may not be very long-lived. >> Greg, I have a follow up question to what I just heard you say. When a data scientist, let's say a unicorn data scientist starts a company, as you've phrased it, and the company's product is built on data science, do they give up being a data scientist in the process? It would seem that they become a data scientist of a higher order if they've built a product based on that knowledge. What are your thoughts on that? >> Well, I know a few people like that, so I think maybe they remain data scientists at heart, but they don't really have the time to do the analysis and they really have to focus more on strategic things. For example, today actually is the birthday of Google, 18 years ago, so Larry Page and Sergey Brin wrote a very influential paper back in the '90s about PageRank. Have they remained data scientists? Perhaps a very very small part, but that's not really what they do, so I think with those unicorn data scientists, you could quickly evolve to where you have to look for teams to really capture those skills. >> Clearly they come to a point in their career where they build a company based on teams of data scientists and data engineers and so forth, which relates to the topic of team data science. What is the right division of roles and responsibilities for team data science? >> Before we go, Jennifer, did you have a comment on that? >> Yeah, so I guess I would say for me, when data science came out and there was, you know, the Venn Diagram that came out about all the skills you were supposed to have? I took a very different approach than all of the people who I knew who were going into data science. Most people started interviewing immediately, they were like this is great, I'm going to get a job. I went and learned how to develop applications, and learned computer science, 'cause I had never taken a computer science course in college, and made sure I trued up that one part where I didn't know these things or have the skills from school, so I went headfirst and just learned it, and then now I have actually a lot of technology patents as a result of that. 
So to answer Jim's question, actually, I started my company about five years ago. And originally started out as a consulting firm slash data science company, then it evolved, and one of the reasons I went back in the industry and now I'm at Nielsen is because you really can't do the same sort of data science work when you're actually doing product development. It's a very very different sort of world. You know, when you're developing a product you're developing a core feature or functionality that you're going to offer clients and customers, so I think definitely you really don't get to have that wide range of sort of looking at 8 million models and testing things out. That flexibility really isn't there as your product starts getting developed. >> Before we go into the team sport, the hard skills that you have, are you all good at math? Are you all computer science types? How about math? Are you all math? >> What were your GPAs? (laughs) >> David: Anybody not math oriented? Anybody not love math? You don't love math? >> I love math, I think it's required. >> David: So math yes, check. >> You dream in equations, right? You dream. >> Computer science? Do I have to have computer science skills? At least the basic knowledge? >> I don't know that you need to have formal classes in any of these things, but I think certainly as Jennifer was saying, if you have no skills in programming whatsoever and you have no interest in learning how to write SQL queries or R or Python, you're probably going to struggle a little bit. >> James: It would be a challenge. >> So I think yes, I have a Ph.D. in physics, I did a lot of math, it's my love language, but I think you don't necessarily need to have formal training in all of these things, but I think you need to have a curiosity and a love of learning, and so if you don't have that, you still want to learn, and however you gain that knowledge, I think, but yeah, if you have no technical interests whatsoever, and don't want to write a line of code, maybe data science is not the field for you. Even if you don't do it everyday. >> And statistics as well? You would put that in that same general category? How about data hacking? You got to love data hacking, is that fair? Eaves, you have a comment? >> Yeah, I think so, while we've been discussing that, for me, the most important part is that you have a logical mind and you have the capability to absorb new things and the curiosity you need to dive into that. While I don't have an education in IT or whatever, I have a background in chemistry, and those things that I learned there, I apply to information technology as well, and apart from that, you say, okay, I'm a tech-savvy guy, I'm interested in the tech part of it, but you need to speak that business language, and if you can do that crossover and understand what other skill sets or parts of the roles are telling you, I think the communication in that aspect is very important. >> I'd like to throw in just something really quickly, and I think there's an interesting thing that happens in IT, particularly around technology. We tend to forget that we've actually solved a lot of these problems in the past. If we look in history, if we look around the Second World War, and Bletchley Park in the UK, where you had a very similar experience as humans that we're having currently around the whole issue of data science, so there was an interesting challenge with the Enigma and the Shark code, right? 
And there was a bunch of men put in a room and told, you're mathematicians and you come from universities, and you can crack codes, but they couldn't. And so what they ended up doing was running these ads, and putting challenges, they actually put, I think it was crossword puzzles in the newspaper, and this deluge of women came out of all kinds of different roles without math degrees, without science degrees, but could solve problems, and they were thrown at the challenge of cracking codes, and invariably, they did the heavy lifting on a daily basis of converting messages from one format to another, so that this very small team at the end could actually get to play with the sexy piece of it. And I think we're going through a similar shift now with what we refer to as data science in the technology and business world. Where the people who are doing the heavy lifting aren't necessarily what we'd think of as the traditional data scientists, and so, there have been some unicorns and we've championed them, and they're great. But I think the shift's going to be to accountants, actuaries, and statisticians who understand the business, and come from an MBA-style background, who can learn the relevant pieces of math and models that we need to apply to get the data science outcome. I think we've already been here, we've solved this problem, we've just got to learn not to try and reinvent the wheel, 'cause the media hypes this whole thing of data science as exciting and new, but we've been here a couple times before, and there's a lot to be learned from that, that's my view. >> I think we had Joe next. >> Yeah, so I was going to say that data science is a funny thing. To use the word science is kind of a misnomer, because there is definitely a level of art to it, and I like to use the analogy, when Michelangelo would look at a block of marble, everyone else looked at the block of marble to see a block of marble. He looks at a block of marble and he sees a finished sculpture, and then he figures out what tools do I need to actually make my vision? And I think data science is a lot like that. We hear a problem, we see the solution, and then we just need the right tools to do it, and I think that's part of consulting and data science in particular. It's not so much what we know out of the gate, but it's how quickly we learn. And I think everyone here, what makes them brilliant, is how quickly they could learn any tool that they need to see their vision get accomplished. >> David: Justin? >> Yeah, I think you make a really great point, for me, I'm a Marine Corps veteran, and the reason I mentioned that is 'cause I work with two veterans who are problem solvers. And I think that's what data scientists really are, in the long run are problem solvers, and you mentioned a great point that, yeah, I think just problem solving is the key. You don't have to be a subject matter expert, just be able to take the tools and intelligently use them. >> Now when you look at the whole notion of team data science, what is the right mix of roles, like role definitions within a high-quality or high-performing data science team? Now IBM, with, of course, our announcement of Project DataWorks and so forth, we're splitting the role division in terms of data scientist versus data engineer versus application developer versus business analyst, is that the right breakdown of roles? 
Or what would the panelists recommend in terms of understanding what kind of roles make sense within, like I said, a high-performing team that's looking to develop applications that depend on data, machine learning, and so forth? Anybody want to? >> I'll tackle that. So the teams that I have created over the years, the data science teams that I brought into customer sites, have a combination of developer capabilities, and some of them are IT developers, but some of them were developers of things other than applications. They designed buildings, they did other things with their technical expertise besides building technology. The other piece besides the developer is the analytics, and analytics can be taught as long as they understand how algorithms work and the code behind the analytics, in other words, how are we analyzing things, and from a data science perspective, we are leveraging technology to do the analyzing through the tool sets, so ultimately as long as they understand how tool sets work, then we can train them on the tools. Having that analytic background is an important piece. >> Craig, is it easier to, I'll go to you in a moment Joe, is it easier to cross train a data scientist to be an app developer, than to cross train an app developer to be a data scientist, or does it not matter? >> Yes. (laughs) And not the other way around. It depends on the-- >> It's easier to cross train a data scientist to be an app developer than-- >> Yes. >> The other way around. Why is that? >> Developing code can be as difficult as the tool set one uses to develop code. Today's tool sets are very user-friendly, whereas with developing code, it is very difficult to teach a person to think along the lines of developing code when they don't have any idea of the aspects of code, of building something. >> I think it was Joe, or you next, or Jennifer, who was it? >> I would say that one of the reasons for that is data scientists will probably know if the answer's right after you process data, whereas a data engineer might be able to manipulate the data but may not know if the answer's correct. So I think that is one of the reasons why having a data scientist learn the application development skills might be an easier time than the other way around. >> I think Miriam had a comment? Sorry. >> I think that what we're advising our clients to do is to not think, before data science and before analytics became so required by companies to stay competitive, it was more of a waterfall, you have a data engineer build a solution, you know, then you throw it over the fence and the business analyst would have at it, where now, it must be agile, and you must have a scrum team where you have the data scientist and the data engineer and the project manager and the product owner and someone from the chief data office all at the table at the same time and all accomplishing the same goal. Because all of these skills are required, collectively, in order to solve this problem, and it can't be done daisy-chained anymore, it has to be a collaboration. And that's why I think Spark is so awesome, because you know, Spark is a single interface that a data engineer can use, a data analyst can use, and a data scientist can use. And now with what we've learned today, having a data catalog on top so that the chief data office can actually manage it, I think is really going to take Spark to the next level. >> James: Miriam? 
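A minimal sketch of the Spark point made just above, that one interface can serve the data engineer, the data analyst, and the data scientist. This is not from the panel; the file name and column names (customers.csv, region, spend, churned) are invented purely for illustration, and churned is assumed to be a 0/1 label.

# One SparkSession, three styles of work against the same data.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("shared-interface-sketch").getOrCreate()

# Data engineer: load the data and register it as a table.
df = spark.read.csv("customers.csv", header=True, inferSchema=True)
df.createOrReplaceTempView("customers")

# Data analyst: plain SQL over the same table.
spark.sql(
    "SELECT region, AVG(spend) AS avg_spend FROM customers GROUP BY region"
).show()

# Data scientist: a simple model over the same DataFrame.
features = VectorAssembler(inputCols=["spend"], outputCol="features").transform(df)
model = LogisticRegression(featuresCol="features", labelCol="churned").fit(features)
print(model.coefficients)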
>> I wanted to comment on your question to Craig about is it harder to teach a data scientist to build an application or vice versa, and one of the things that we have worked on a lot in our data science team is incorporating a lot of best practices from software development, agile, scrum, that sort of thing, and I think particularly with a focus on deploying models that we don't just want to build an interesting data science model, we want to deploy it, and get some value. You need to really incorporate these processes from someone who might know how to build applications and that, I think for some data scientists can be a challenge, because one of the fun things about data science is you get to get into the data, and you get your hands dirty, and you build a model, and you get to try all these cool things, but then when the time comes for you to actually deploy something, you need deployment-grade code in order to make sure it can go into production at your client side and be useful for instance, so I think that there's an interesting challenge on both ends, but one of the things I've definitely noticed with some of our data scientists is it's very hard to get them to think in that mindset, which is why you have a team of people, because everyone has different skills and you can mitigate that. >> Dev-ops for data science? >> Yeah, exactly. We call it insight ops, but yeah, I hear what you're saying. Data science is becoming increasingly an operational function as opposed to strictly exploratory or developmental. Did some one else have a, Dez? >> One of the things I was going to mention, one of the things I like to do when someone gives me a new problem is take all the laptops and phones away. And we just end up in a room with a whiteboard. And developers find that challenging sometimes, so I had this one line where I said to them don't write the first line of code until you actually understand the problem you're trying to solve right? And I think where the data science focus has changed the game for organizations who are trying to get some systematic repeatable process that they can throw data at and just keep getting answers and things, no matter what the industry might be is that developers will come with a particular mindset on how they're going to codify something without necessarily getting the full spectrum and understanding the problem first place. What I'm finding is the people that come at data science tend to have more of a hacker ethic. They want to hack the problem, they want to understand the challenge, and they want to be able to get it down to plain English simple phrases, and then apply some algorithms and then build models, and then codify it, and so most of the time we sit in a room with whiteboard markers just trying to build a model in a graphical sense and make sure it's going to work and that it's going to flow, and once we can do that, we can codify it. I think when you come at it from the other angle from the developer ethic, and you're like I'm just going to codify this from day one, I'm going to write code. I'm going to hack this thing out and it's just going to run and compile. Often, you don't truly understand what he's trying to get to at the end point, and you can just spend days writing code and I think someone made the comment that sometimes you don't actually know whether the output is actually accurate in the first place. So I think there's a lot of value being provided from the data science practice. 
Over understanding the problem in plain english at a team level, so what am I trying to do from the business consulting point of view? What are the requirements? How do I build this model? How do I test the model? How do I run a sample set through it? Train the thing and then make sure what I'm going to codify actually makes sense in the first place, because otherwise, what are you trying to solve in the first place? >> Wasn't that Einstein who said if I had an hour to solve a problem, I'd spend 55 minutes understanding the problem and five minutes on the solution, right? It's exactly what you're talking about. >> Well I think, I will say, getting back to the question, the thing with building these teams, I think a lot of times people don't talk about is that engineers are actually very very important for data science projects and data science problems. For instance, if you were just trying to prototype something or just come up with a model, then data science teams are great, however, if you need to actually put that into production, that code that the data scientist has written may not be optimal, so as we scale out, it may be actually very inefficient. At that point, you kind of want an engineer to step in and actually optimize that code, so I think it depends on what you're building and that kind of dictates what kind of division you want among your teammates, but I do think that a lot of times, the engineering component is really undervalued out there. >> Jennifer, it seems that the data engineering function, data discovery and preparation and so forth is becoming automated to a greater degree, but if I'm listening to you, I don't hear that data engineering as a discipline is becoming extinct in terms of a role that people can be hired into. You're saying that there's a strong ongoing need for data engineers to optimize the entire pipeline to deliver the fruits of data science in production applications, is that correct? So they play that very much operational role as the backbone for... >> So I think a lot of times businesses will go to data scientist to build a better model to build a predictive model, but that model may not be something that you really want to implement out there when there's like a million users coming to your website, 'cause it may not be efficient, it may take a very long time, so I think in that sense, it is important to have good engineers, and your whole product may fail, you may build the best model it may have the best output, but if you can't actually implement it, then really what good is it? >> What about calibrating these models? How do you go about doing that and sort of testing that in the real world? Has that changed overtime? Or is it... >> So one of the things that I think can happen, and we found with one of our clients is when you build a model, you do it with the data that you have, and you try to use a very robust cross-validation process to make sure that it's robust and it's sturdy, but one thing that can sometimes happen is after you put your model into production, there can be external factors that, societal or whatever, things that have nothing to do with the data that you have or the quality of the data or the quality of the model, which can actually erode the model's performance over time. So as an example, we think about cell phone contracts right? 
Those have changed a lot over the years, so maybe five years ago, the type of data plan you had might not be the same that it is today, because a totally different type of plan is offered, so if you're building a model on that to say predict who's going to leave and go to a different cell phone carrier, the validity of your model overtime is going to completely degrade based on nothing that you have, that you put into the model or the data that was available, so I think you need to have this sort of model management and monitoring process to take this factors into account and then know when it's time to do a refresh. >> Cross-validation, even at one point in time, for example, there was an article in the New York Times recently that they gave the same data set to five different data scientists, this is survey data for the presidential election that's upcoming, and five different data scientists came to five different predictions. They were all high quality data scientists, the cross-validation showed a wide variation about who was on top, whether it was Hillary or whether it was Trump so that shows you that even at any point in time, cross-validation is essential to understand how robust the predictions might be. Does somebody else have a comment? Joe? >> I just want to say that this even drives home the fact that having the scrum team for each project and having the engineer and the data scientist, data engineer and data scientist working side by side because it is important that whatever we're building we assume will eventually go into production, and we used to have in the data warehousing world, you'd get the data out of the systems, out of your applications, you do analysis on your data, and the nirvana was maybe that data would go back to the system, but typically it didn't. Nowadays, the applications are dependent on the insight coming from the data science team. With the behavior of the application and the personalization and individual experience for a customer is highly dependent, so it has to be, you said is data science part of the dev-ops team, absolutely now, it has to be. >> Whose job is it to figure out the way in which the data is presented to the business? Where's the sort of presentation, the visualization plan, is that the data scientist role? Does that depend on whether or not you have that gene? Do you need a UI person on your team? Where does that fit? >> Wow, good question. >> Well usually that's the output, I mean, once you get to the point where you're visualizing the data, you've created an algorithm or some sort of code that produces that to be visualized, so at the end of the day that the customers can see what all the fuss is about from a data science perspective. But it's usually post the data science component. >> So do you run into situations where you can see it and it's blatantly obvious, but it doesn't necessarily translate to the business? >> Well there's an interesting challenge with data, and we throw the word data around a lot, and I've got this fun line I like throwing out there. If you torture data long enough, it will talk. So the challenge then is to figure out when to stop torturing it, right? And it's the same with models, and so I think in many other parts of organizations, we'll take something, if someone's doing a financial report on performance of the organization and they're doing it in a spreadsheet, they'll get two or three peers to review it, and validate that they've come up with a working model and the answer actually makes sense. 
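The cross-validation remark above is easy to make concrete: repeated k-fold cross-validation returns a distribution of scores rather than a single number, and the spread of that distribution is what tells you how robust the performance estimate is. A small sketch on synthetic data, using scikit-learn:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedKFold, cross_val_score

# Synthetic data standing in for any real modeling problem.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

cv = RepeatedKFold(n_splits=5, n_repeats=10, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)

# The spread, not just the mean, is the robustness signal.
print(f"accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
print(f"range across folds: {scores.min():.3f} to {scores.max():.3f}")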
And I think we're rushing so quickly at doing analysis on data that comes to us in various formats and high velocity that I think it's very important for us to actually stop and do peer reviews, of the models and the data and the output as well, because otherwise we start making decisions very quickly about things that may or may not be true. It's very easy to get the data to paint any picture you want, and you gave the example of the five different attempts at that thing, and I had this shoot out thing as well where I'll take in a team, I'll get two different people to do exactly the same thing in completely different rooms, and come back and challenge each other, and it's quite amazing to see the looks on their faces when they're like, oh, I didn't see that, and then go back and do it again until, and then just keep iterating until we get to the point where they both get the same outcome, in fact there's a really interesting anecdote about when the UNIX operation system was being written, and a couple of the authors went away and wrote the same program without realizing that each other were doing it, and when they came back, they actually had line for line, the same piece of C code, 'cause they'd actually gotten to a truth. A perfect version of that program, and I think we need to often look at, when we're building models and playing with data, if we can't come at it from different angles, and get the same answer, then maybe the answer isn't quite true yet, so there's a lot of risk in that. And it's the same with presentation, you know, you can paint any picture you want with the dashboard, but who's actually validating when the dashboard's painting the correct picture? >> James: Go ahead, please. >> There is a science actually, behind data visualization, you know if you're doing trending, it's a line graph, if you're doing comparative analysis, it's bar graph, if you're doing percentages, it's a pie chart, like there is a certain science to it, it's not that much of a mystery as the novice thinks there is, but what makes it challenging is that you also, just like any presentation, you have to consider your audience. And your audience, whenever we're delivering a solution, either insight, or just data in a grid, we really have to consider who is the consumer of this data, and actually cater the visual to that person or to that particular audience. And that is part of the art, and that is what makes a great data scientist. >> The consumer may in fact be the source of the data itself, like in a mobile app, so you're tuning their visualization and then their behavior is changing as a result, and then the data on their changed behavior comes back, so it can be a circular process. >> So Jim, at a recent conference, you were tweeting about the citizen data scientist, and you got emasculated by-- >> I spoke there too. >> Okay. >> TWI on that same topic, I got-- >> Kirk Borne I hear came after you. >> Kirk meant-- >> Called foul, flag on the play. >> Kirk meant well. I love Claudia Emahoff too, but yeah, it's a controversial topic. >> So I wonder what our panel thinks of that notion, citizen data scientist. >> Can I respond about citizen data scientists? >> David: Yeah, please. >> I think this term was introduced by Gartner analyst in 2015, and I think it's a very dangerous and misleading term. 
I think definitely we want to democratize the data and have access for more people, not just data scientists, but managers, BI analysts, but when there is already a term for such people, we can call them business analysts, because it implies some training, some understanding of the data. If you use the term citizen data scientist, it implies that without any training you take some data and then you find something there, and I think, as Dez mentioned, we've seen many examples, it's very easy to find completely spurious random correlations in data. So we don't want citizen dentists to treat our teeth or citizen pilots to fly planes, and if data's important, having citizen data scientists is equally dangerous, so I'm hoping that, I think actually Gartner did not use the term citizen data scientist in their 2016 hype cycle, so hopefully they will put this term to rest. >> So Gregory, you apparently are defining citizen to mean incompetent as opposed to simply self-starting. >> Well self-starting is very different, but that's not what I think was the intention. I think what we see in terms of data democratization, there is a big trend toward automation. There are many tools, for example there are many companies like DataRobot, and probably IBM, that have interesting machine learning capabilities toward automation, so I recently started a page on KDnuggets for automated data science solutions, and there are already 20 different firms that provide different levels of automation. So one can deliver in full automation maybe some expertise, but it's very dangerous to have part of it in an automated tool and at some point then ask citizen data scientists to try to take the wheel. >> I want to chime in on that. >> David: Yeah, pile on. >> I totally agree with all of that. I think the comment I just want to quickly put out there is that the space we're in is a very young, and rapidly changing world, and so what we haven't had yet is this time to stop and take a deep breath and actually define ourselves, so if you look at computer science in general, a lot of the traditional roles have sort of had 10 or 20 years of history, and so through the hiring process, and the development of those spaces, we've actually had time to breathe and define what those jobs are, so we know what a systems programmer is, and we know what a database administrator is, but we haven't yet had a chance as a community to stop and breathe and say, well what do we think these roles are, and so to fill that void, the media creates coinages, and I think this is the risk we've got now, that the concept of a data scientist was just a term that was coined to fill a void, because no one quite knew what to call somebody who didn't come from a data science background if they were tinkering around with data science, and I think that's something that we need to sort of sit up and pay attention to, because if we don't own that and drive it ourselves, then somebody else is going to fill the void and they'll create these very frustrating concepts like data scientist, which drives us all crazy. >> James: Miriam's next. >> So I wanted to comment, I agree with both of the previous comments, but in terms of a citizen data scientist, and I think whether or not you're a citizen data scientist or an actual data scientist, whatever that means, I think one of the most important things you can have is a sense of skepticism, right? Because you can get spurious correlations and it's like wow, my predictive model is so excellent, you know? 
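The spurious-correlation warning above is easy to reproduce: scan enough random features and the best-looking one will appear meaningfully correlated with a completely random target, purely by chance. A small sketch, with entirely synthetic noise:

import numpy as np

rng = np.random.default_rng(42)
n_rows, n_features = 100, 2000
X = rng.normal(size=(n_rows, n_features))   # pure-noise "features"
y = rng.normal(size=n_rows)                 # pure-noise "target"

# Correlation of each feature with the target.
corrs = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(n_features)])
best = int(np.argmax(np.abs(corrs)))
print(f"best-looking feature: #{best}, correlation = {corrs[best]:.2f}")
# Typically prints a correlation around 0.3-0.4 even though there is no real signal.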
And being aware of things like leaks from the future, right? This actually isn't predictive at all, it's a result of the thing I'm trying to predict, and so I think one thing I know that we try and do is if something really looks too good, we need to go back in and make sure, did we not look at the data correctly? Is something missing? Did we have a problem with the ETL? And so I think that a healthy sense of skepticism is important to make sure that you're not taking a spurious correlation and trying to derive some significant meaning from it. >> I think there's a Dilbert cartoon that I saw that described that very well. Joe, did you have a comment? >> I think that in order for citizen data scientists to really exist, I think we do need to have more maturity in the tools that they would use. My vision is that the BI tools of today are all going to be replaced with natural language processing and searching, you know, just be able to open up a search bar and say give me sales by region, and to take that one step into the future even further, it should actually say what are my sales going to be next year? And it should trigger a simple linear regression or be able to say which features of the televisions are actually affecting sales and do a clustering algorithm, you know I think hopefully that will be the future, but I don't see anything of that today, and I think in order to have a true citizen data scientist, you would need to have that, and that is pretty sophisticated stuff. >> I think for me, the idea of citizen data scientist I can relate to that, for instance, when I was in graduate school, I started doing some research on FDA data. It was an open source data set about 4.2 million data points. Technically when I graduated, the paper was still not published, and so in some sense, you could think of me as a citizen data scientist, right? I wasn't getting funding, I wasn't doing it for school, but I was still continuing my research, so I'd like to hope that with all the new data sources out there that there might be scientists or people who are maybe kept out of a field people who wanted to be in STEM and for whatever life circumstance couldn't be in it. That they might be encouraged to actually go and look into the data and maybe build better models or validate information that's out there. >> So Justin, I'm sorry you had one comment? >> It seems data science was termed before academia adopted formalized training for data science. But yeah, you can make, like Dez said, you can make data work for whatever problem you're trying to solve, whatever answer you see, you want data to work around it, you can make it happen. And I kind of consider that like in project management, like data creep, so you're so hyper focused on a solution you're trying to find the answer that you create an answer that works for that solution, but it may not be the correct answer, and I think the crossover discussion works well for that case. >> So but the term comes up 'cause there's a frustration I guess, right? That data science skills are not plentiful, and it's potentially a bottleneck in an organization. Supposedly 80% of your time is spent on cleaning data, is that right? Is that fair? So there's a problem. How much of that can be automated and when? >> I'll have a shot at that. 
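Staying for a moment with the "leaks from the future" point raised above: a feature that is really a by-product of the outcome makes validation scores look too good to be true, which is exactly the tell the panel describes. A hedged sketch on synthetic data, where the "leaky" feature is derived from the label itself:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 1000
honest_feature = rng.normal(size=n)
y = (honest_feature + rng.normal(scale=2.0, size=n) > 0).astype(int)

# The leak: a field that only gets filled in after the predicted event has happened.
leaky_feature = y + rng.normal(scale=0.05, size=n)

X_leaky = np.column_stack([honest_feature, leaky_feature])
X_clean = honest_feature.reshape(-1, 1)

model = LogisticRegression(max_iter=1000)
print("with leak:   ", cross_val_score(model, X_leaky, y, cv=5).mean())  # close to 1.0
print("without leak:", cross_val_score(model, X_clean, y, cv=5).mean())  # far lower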
So I think there's a shift that's going to come about where we're going to move from centralized data sets to data at the edge of the network, and this is something that's happening very quickly now, where we can't just haul everything back to a central spot. When the internet of things actually wakes up, things like the Boeing Dreamliner 787, that thing's got 6,000 sensors in it, produces half a terabyte of data per flight. There are 87,400 flights per day in domestic airspace in the U.S. That's 43.5 petabytes of raw data, now that's about three years' worth of disk manufacturing in total, right? We're never going to copy that across to one place, we can't process it, so I think the challenge we've got ahead of us is looking at how we're going to move the intelligence and the analytics to the edge of the network and pre-cook the data in different tiers, so have a look at the raw material we get, and boil it down to a slightly smaller data set, bring a metadata version of that back, and eventually get to the point where we've only got the very minimum data set and data points we need to make key decisions. Without that, we're already at the point where we have too much data, and we can't munch it fast enough, and we can't spin off enough tin even if we switch the cloud on, and that's just this never ending deluge of noise, right? And you've got that signal versus noise problem, so we're now seeing a shift where people are looking at how do we move the intelligence back to the edge of the network, which we actually solved some time ago in the security space. You know, spam filtering, if an email hits Google on the west coast of the U.S. and they create a checksum for that spam email, it immediately goes into a database, and nothing gets through on the opposite coast, because they already know it's spam. They recognize that email coming in, that's evil, stop it. So we've already fixed this in security with intrusion detection, we've fixed it in spam, so we now need to take that learning, and bring it into business analytics, if you like, and see where we're finding patterns in behavior, and push that out to the edge of the network, so if I'm seeing a demand over here for tickets on a new sale of a show, I need to be able to see where else I'm going to see that demand and start responding to that before the demand comes about. I think that's a shift that we're going to see quickly, because we'll never keep up with the data munching challenge and the volume's just going to explode. >> David: We just have a couple minutes. >> That does sound like a great topic for a future Cube panel which is data science on the edge of the fog. >> I got a hundred questions around that. So we're wrapping up here. Just got a couple minutes. Final thoughts on this conversation or any other pieces that you want to punctuate. >> I think one thing that's been really interesting for me being on this panel is hearing all of my co-panelists talking about common themes and things that we are also experiencing, which isn't a surprise, but it's interesting to hear about how ubiquitous some of the challenges are, and also at the announcement earlier today, some of the things that they're talking about and thinking about, we're also talking about and thinking about. So I think it's great to hear we're all in different countries and different places, but we're experiencing a lot of the same challenges, and I think that's been really interesting for me to hear about. >> David: Great, anybody else, final thoughts? 
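A minimal sketch of the spam-checksum idea Dez describes above: once one site has flagged a message as bad, only its hash needs to travel, and every other edge node can drop matching messages without re-analyzing the content. The messages and the light normalization step here are invented for illustration; real systems use far more robust fingerprinting.

import hashlib

def fingerprint(message: str) -> str:
    # Normalize lightly so trivial whitespace or case changes don't defeat the match.
    return hashlib.sha256(" ".join(message.split()).lower().encode()).hexdigest()

# Hash shared from the node that first flagged the message.
known_bad = {fingerprint("You have won a prize! Click here now!!")}

incoming = [
    "You have won a   prize! Click here NOW!!",
    "Meeting moved to 3pm, see agenda attached.",
]

for msg in incoming:
    verdict = "drop (known bad)" if fingerprint(msg) in known_bad else "deliver"
    print(f"{verdict}: {msg[:40]}")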
>> To echo Dez's thoughts, it's about, we're never going to catch up with the amount of data that's produced, so it's about transforming big data into smart data. >> I could just say that with the shift from normal data, small data, to big data, the answer is automate, automate, automate, and we've been talking about advanced algorithms and machine learning for the science, for changing the business, but there also needs to be machine learning and advanced algorithms for the backroom where we're actually getting smarter about how we ingest data and how we fix data as it comes in. Because we can actually train the machines to understand data anomalies and what we want to do with them over time. And I think the further upstream we get with data correction, the less work there will be downstream. And I also think that the concept of being able to fix data at the source is gone, that's behind us. Right now the data that we're using to analyze to change the business, typically we have no control over. Like Dez said, it's coming from sensors and machines and the internet of things, and if it's wrong, it's always going to be wrong, so we have to figure out how to do that in our laboratory. >> Eaves, final thoughts? >> I think it's a mind shift, being a data scientist. If you look back at the time, why did you start developing or writing code? Because you like to code, whatever, just for the sake of building a nice algorithm or a piece of software, or whatever, and now I think with the spirit of a data scientist, you're looking at a problem and say this is where I want to go, so you have more the top-down approach than the bottom-up approach. And have the big picture, and that is what you really need as a data scientist, just look across technologies, look across departments, look across everything, and then on top of that, try to apply as many skills as you have available, and that's the kind of unicorn that they're trying to look for, because it's pretty hard to find people with that wide vision on everything that is happening within the company, so you need to be aware of technology, you need to be aware of how a business is run, and how it fits within a cultural environment, you have to work with people, and all those things together, to my belief, make it very difficult to find those good data scientists. >> Jim? Your final thought? >> My final thought is this is an awesome panel, and I'm so glad that you've come to New York, and I'm hoping that you all stay, of course, for the IBM DataFirst launch event that will take place this evening about a block over at Hudson Mercantile, so that's pretty much it. Thank you, I really learned a lot. >> I want to second Jim's thanks, really, great panel. Awesome expertise, really appreciate you taking the time, and thanks to the folks at IBM for putting this together. >> And I'm a big fan of most of you, all of you, on this session here, so it's great just to meet you in person, thank you. >> Okay, and I want to thank Jeff Frick for being a human curtain there with the sun setting here in New York City. Well thanks very much for watching, we are going to be across the street at the IBM announcement, we're going to be on the ground. We open up again tomorrow at 9:30 at Big Data NYC, Big Data Week, Strata plus Hadoop World, thanks for watching everybody, that's a wrap from here. This is the Cube, we're out. (techno music)

Published Date : Sep 28 2016

Caitlin Lepech & Dave Schubmehl - IBM Chief Data Officer Strategy Summit - #IBMCDO - #theCUBE


 

>> live from Boston, Massachusetts. >> It's the Cube >> covering IBM Chief Data Officer Strategy Summit brought to you by IBM. Now, here are your hosts. Day villain Day and >> stew minimum. Welcome back to Boston, everybody. This is the IBM Chief Data Officer Summit. And this is the Cube, the worldwide leader in live tech coverage. Caitlin Lepic is here. She's an executive within the chief data officer office at IBM. And she's joined by Dave Shoot Mel, who's a research director at, uh D. C. And he covers cognitive systems and content analytics. Folks, welcome to the Cube. Good to see you. Thank you. Can't. Then we'll start with you. You were You kicked off the morning and I referenced the Forbes article or CDOs. Miracle workers. That's great. I hadn't read that article. You put up their scanned it very quickly, but you set up the event. It started yesterday afternoon at noon. You're going through, uh, this afternoon? What's it all about? This is evolved. Since, what, 2014 >> it has, um, we started our first CDO summit back in 2014. And at that time, we estimated there were maybe 200 or so CDOs worldwide, give or take and we had 30, 30 people at our first event. and we joked that we had one small corner of the conference room and we were really quite excited to start the event in 30 2014. And we've really grown. So this year we have about 170 folks joining us, 70 of which are CEOs, more acting, the studios in the organization. And so we've really been able to grow the community over the last two years and are really excited to see to see how we can continue to do that moving forward. >> And IBM has always had a big presence at the conference that we've covered the CDO event. So that's nice that you can leverage that community and continue to cultivate it. Didn't want to ask you, so it used that we were talking when we first met this morning. It used to be dated was such a wonky topic, you know, data was data value. People would try to put a value on data, and but it was just a really kind of boring but important topic. Now it's front and center with cognitive with analytics. What are you seeing in the marketplace. >> Yeah, I think. Well, what we're seeing in the market is this emphasis on predictive applications, predictive analytics, cognitive applications, artificial intelligence of deep learning. All of those those types of applications are derived and really run by data. So unless you have really good authoritative data to actually make these models work, you know, the systems aren't going to be effective. So we're seeing an emerging marketplace in both people looking at how they can leverage their first party data, which, you know, IBM is really talking about what you know, Bob Picciotto talked about this morning. But also, we're seeing thie emergency of a second party and third party data market to help build these models out even further so that I think that's what we're really seeing is the combination of the third party data along with the first party data really being the instrument for building these kind of predictive models, you know, they're going to take us hopefully, you know, far into the future. >> Okay, so, Caitlin square the circle for us. So the CDO roll generally is not perceived. Is it technology role? Correct. Yet as Davis to saying, we're talking about machine learning cognitive. Aye, aye. These air like heavy technical topics. So how does the miracle worker deal with all this stuff generally? And how does IBM deal with it inside the CDO office? Specifically? 
>> Sure. So it is. It's a very good point, you know, Traditionally, Seo's really have a business background, and we find that the most successful CDO sit in the business organization. So they report somewhere in a line of business. Um, and there are certainly some that have a technical background, but far more come from business background and sit in the business. I can't tell you how we are setting up our studio office at IBM. Um, so are new. And our first global chief date officer joined in December of last year. Interpol Bhandari, um and I started working for him shortly thereafter, and the way he's setting up his office is really three pillars. So first and foremost, we focused on the data engineering data sign. So getting that team in place next, it's information, governance and policy. How are we going to govern access, manage, work with data, both data that we own within our organization as well as the long list of of external data sources that that we bring in and then third is the business integration filler. So the idea is CDOs are going to be most successful when they deliver those data Science data engineering. Um, they manage and govern the data, but they pull it through the business, so ensuring that were really, you know, grounded in business unit and doing this. And so those there are three primary pillars at this point. So prior >> to formalizing the CDO role at I b m e mean remnants of these roles existed. There was a date, equality, you know, function. There was certainly governance in policy, and somebody was responsible to integrate between, you know, from the i t. To the applications, tow the business. Were those part of I t where they sort of, you know, by committee and and how did you bring all those pieces together? That couldn't have been trivial, >> and I would say it's filling. It's still going filling ongoing process. But absolutely, I would say they typically resided within particular business units, um, and so certainly have mature functions within the unit. But when we're looking for enterprise wide answers to questions about certain customers, certain business opportunities. That's where I think the role the studio really comes in and what we're What we're doing now is we are partnering very closely with business units. One example is IBM analytic. Seen it. So we're here with Bob Luciano and other business units to ensure that, as they provide us, you know, their data were able to create the single trusted source of data across the organization across the enterprise. And so I agree with you, I think, ah, lot of those capabilities and functions quite mature, they, you know, existed within units. And now it's about pulling that up to the enterprise level and then our next step. The next vision is starting to make that cognitive and starting to add some of those capabilities in particular data science, engineering, the deep learning on starting to move toward cognitive. >> Dave, I think Caitlin brought up something really interesting. We've been digging into the last couple of years is you know, there's that governance peace, but a lot of CEOs are put into that role with a mandate for innovation on. That's something that you know a lot of times it has been accused of not being all that innovative. Is that what you're seeing? You know what? Because some of the kind of is it project based or, you know, best initiatives that air driving forward with CEOs. 
I think what we're seeing is that enterprises they're beginning to recognize that it's not just enough to be a manufacturer. It's not just enough to be a retail organization. You need to be the one of the best one of the top two or the top three. And the only way to get to that top two or top three is to have that innovation that you're talking about and that innovation relies on having accurate data for decision making. It also relies on having accurate data for operations. So we're seeing a lot of organizations that are really, you know, looking at how data and predictive models and innovation all become part of the operational fabric of a company. Uh, you know, and if you think about the companies that are there, you know, just beating it together. You know Amazon, for example. I mean, Amazon is a completely data driven company. When you get your recommendations for, you know what to buy, or that's all coming from the data when they set up these logistics centers where they're, you know, shipping the latest supplies. They're doing that because they know where their customers are. You know, they have all this data, so they're they're integrating data into their day to day decision making. And I think that's what we're seeing, You know, throughout industry is this this idea of integrating decision data into the decision making process and elevating it? And I think that's why the CDO rule has become so much more important over the last 2 to 3 years. >> We heard this morning at 88% percent of data is dark data. Papa Geno talked about that. So thinking about the CEOs scope roll agenda, you've got data sources. You've gotto identify those. You gotta deal with data quality and then Dave, with some of the things you've been talking about, you've got predictive models that out of the box they may not be the best predictive models in the world. You've got iterated them. So how does an organization, because not every organizations like Amazon with virtually unlimited resource is capital? How does an organization balance What are you seeing in terms of getting new data sources? Refining those data source is putting my emphasis on the data vs refining and calibrating the predictive models. How organizations balancing that Maybe we start with how IBM is doing. It's what you're seeing in the field. >> So So I would say, from what we're doing from a setting up the chief data office role, we've taken a step back to say, What's the company's monitor monetization strategy? Not how your mind monetizing data. How are how are you? What's your strategy? Moving forward, Um, for Mance station. And so with IBM we've talked about it is moved to enabling cognition throughout the enterprise. And so we've really talked about taking all of your standard business processes, whether they be procurement HR finance and infusing those with cognitive and figuring out how to make those smarter. We talking examples with contracts, for example. Every organization has a lot of contracts, and right now it's, you know, quite a manual process to go through and try and discern the sorts of information you need to make better decisions and optimize the contract process. And so the idea is, you start with that strategy for us. IBM, it's cognitive. And that then dictates what sort of data sources you need. Because that's the problem you're trying to solve in the opportunity you're chasing down. 
And so then we talk about, okay, we've got some of that data currently residing internally today, typically in silos, typically in business units, in some different databases. And then our longer-term vision is that we want to build the intelligence that pulls in that internal data and then really does pull in the external data that we've all talked about: the social data, the sentiment analysis, the weather, all of that sort of external data to help us. Ultimately, our value proposition, our mission, is data-driven enablement of cognition, so it helps us achieve our strategy there. >> Dave, anything to add to that? >> Yeah, I mean, you could take a number of examples. There's a small insurance company in Florida, for example, and what they've done is they've organized their emergency processing to be able to deal with tweets and to be able to deal with SMS messages and things like that. They're using sentiment analysis and text analytics to identify where problems are occurring when a hurricane happens. So what they're doing is organizing that kind of data, and they're a relatively small insurance company, and a lot of this is being done in the cloud, but they're basically getting that kind of sentiment analysis, being able to interpret it and add it to their decision-making process about where should I land a person, where should I send an insurance adjuster or agent, based on the tweets that are coming in rather than just the phone calls coming into the organization. So that's a simple example. And you were talking about how not everybody has the resources of an Amazon, but certainly small insurance companies, small manufacturers, small retail organizations can get started by analyzing what people are saying about you. What are people saying about me on Twitter? What are people saying about me on Facebook? How can I use that to improve my customer service? We're seeing a whole range of solutions coming out, and IBM actually has a broad range of solutions for things like that, but they're not the only ones out there. There are a lot of folks doing that kind of thing in terms of dark data analysis and providing it as part of the solution to help people make better decisions.
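Another editorial aside: a minimal sketch, under assumed data and thresholds, of the kind of pipeline Dave describes, scoring inbound tweets or SMS messages for sentiment and ranking areas for dispatch. It uses NLTK's off-the-shelf VADER model; the messages and ZIP codes are hypothetical, and this is not the insurer's or IBM's actual system.

```python
# Illustrative sketch only: flag urgent, negative messages by area so a dispatcher
# could prioritize where to send adjusters. Assumes nltk is installed.
from collections import defaultdict

import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)  # one-time lexicon download

# Hypothetical incoming tweets/SMS, tagged with a rough location (ZIP code).
messages = [
    {"zip": "33101", "text": "Roof is gone, water everywhere, no one has come to help"},
    {"zip": "33101", "text": "Power is back on, things are okay here"},
    {"zip": "33050", "text": "Street is flooded, cars underwater, please send someone"},
]

analyzer = SentimentIntensityAnalyzer()
urgent_by_zip = defaultdict(list)

for msg in messages:
    # 'compound' runs from -1 (very negative) to +1 (very positive);
    # strongly negative messages are treated as higher urgency.
    score = analyzer.polarity_scores(msg["text"])["compound"]
    if score < -0.3:
        urgent_by_zip[msg["zip"]].append((score, msg["text"]))

# Rank areas by the number of urgent messages, a crude proxy for where to go first.
for zip_code, items in sorted(urgent_by_zip.items(), key=lambda kv: len(kv[1]), reverse=True):
    print(zip_code, len(items), "urgent messages")
```

In practice the scores would feed a dispatch or mapping system alongside the phone-call data mentioned in the conversation, rather than a simple printout.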
>> So the answer to the question is both: you're doing both new sources of data and trying to improve the analytics and the models, but it's a balancing act. And to come back to the ROI question, it sounds like IBM's strategy is to supercharge your existing businesses by infusing them with new data and new insights. Is that correct? >> I would say that is correct. >> Okay. Whereas in many cases the ROI of analytics projects to date has been a reduction on investment, you know, I'm going to move stuff from my traditional EDW to Hadoop because it's cheaper, it feels like, Dave, we're entering a new wave now. Maybe you could talk about that a little bit. >> Yeah, I mean, I think that's the traditional way of measuring ROI, and I think what people are trying to do now is look at, as you mentioned, disruption, for example. I think disruption is a huge opportunity. How can I increase my sales? How can I increase my revenue? How can I find new customers through these mechanisms? I think that's what we're starting to see in organizations, and we're starting to see startups that are dedicated to providing this level of disruption and helping address new markets by using these kinds of technologies in new and interesting ways. Everybody uses the Airbnb example, everybody uses the Uber example: these are people who don't own cars, they don't own any hotel rooms, but they provide analytics to disrupt the hotel industry and disrupt the taxi industry. And it's not just limited to those two industries; it's virtually everything. I think that's what we're starting to see, this idea of virtual disruption based on the dark data that people can actually begin to analyze. >> Within IBM, the chief data officer reports to whom? >> The way we've set it up in our organization is that our CDO reports to our senior vice president of transformation and operations, who then reports to our CEO. Our recommendation as we talk with clients, I mean, we see this as a CEO-level reporting relationship, and oftentimes we advocate for that when we're talking with customers and clients. It fits nicely in our organization within transformation and operations, because this line is really responsible for transforming IBM. They're charged with a number of initiatives throughout the organization: to have better skills alignment with some of the new opportunities, to really improve processes, to bring new folks on board. So it made sense to fit within an organization whose mandate is really the transformation of the company. >> And the CDO is a peer of the CIO, is that right? >> Yes, that's right. That's right. And then in our organization the role is split, in that we have a chief data officer as well as a chief analytics officer, but we often see one person serving both of those roles as well. So that kind of depends on the organizational structure of the company. >> So you've got run the business and grow the business, which I guess is the P and L manager's role, and transform the business, which is where the CDO comes in, right? >> Right, right. Exactly. >> Caitlin, I'll give you the last word. Sort of put a bumper sticker on this event. Where do you want to see it go in the future? >> Yes, so, for the last word: we tried a couple of new things this year. We had our deep-dive breakout sessions yesterday, and the feedback I've been hearing from folks is that they value the opportunity to talk about the topics they really care about, be it governance or innovation, being able to talk about how you get started in the first 90 days and what you do first. We have sort of five steps that we talk through around getting your data strategy and your plan together and how you execute against that, and I have to tell you, those topics continue to be of interest to our participants every year, so we're going to continue to have those. And I just love to see the community grow. I saw the first chief data officer university announced earlier this year, and I did notice a lot of PR and media around the role of the CDO, miracle workers, as you mentioned, doing a lot of great work. So we're really supportive, we're big supporters of the role. We'll continue to host in-person events,
do virtual events, and continue to support CDOs to be successful. And our big plug is World of Watson, our big IBM analytics event in October, the last week of October, in Vegas. So we certainly invite folks to join us there. >> And Ginni will be there, right? >> We still try to get Ginni on. So, Ginni, if you're watching, we're talking to you, come on theCUBE. >> So we'll do a second interview. >> And we'll see, we'll get to it. And I saw Hilary Mason is going to be there, so fantastic to see her as well. Excellent. Congratulations on being ahead of the curve with the chief data officer theme, and I really appreciate you both coming on theCUBE. Dave, thank you. >> Thank you. >> All right, keep right there, everybody. Stu and I will be back with our next guest. We're live from the Chief Data Officer Summit, IBM's event in Boston. We'll be right back. My name is Dave Vellante, and I'm a longtime industry analyst.

Published Date: Sep 23, 2016
