Krishna Cheriath, Bristol Myers Squibb | MITCDOIQ 2020

>> From the Cube Studios in Palo Alto and Boston, connecting with thought leaders all around the world, this is a Cube Conversation. >> Hi everyone, this is Dave Vellante and welcome back to the Cube's coverage of the MIT CDOIQ. God, we've been covering this show since probably 2013, really trying to understand the intersection of data and organizations and data quality and how that's evolved over time. And with me to discuss these issues is Krishna Cheriath, who's the Vice President and Chief Data Officer, Bristol-Myers Squibb. Krishna, great to see you, thanks so much for coming on. >> Thank you so much Dave for the invite, I'm looking forward to it. >> Yeah first of all, how are things in your part of the world? You're in New Jersey, I'm also on the East coast, how are you guys making out? >> Yeah, I think these are unprecedented times all around the globe, and whether it is from a company perspective or a personal standpoint, how do you manage your life and how do you manage your work in these unprecedented COVID-19 times has been a very interesting challenge. And to me, what is most amazing is that I've seen humanity rise up, and so too our company has sort of snapped to, managing our work so that the important medicines that have to be delivered to our patients are delivered on time. So really proud about how we have done as a company and of course, personally, it has been an interesting journey with my kids from college, remote learning, wife working from home. So I'm very lucky and blessed to be safe and healthy at this time. So hopefully the people listening to this conversation are finding that they are able to manage through their lives as well. >> Obviously Bristol-Myers Squibb, very, very strong business. You guys just recently announced your quarter. There's a biologics facility near me in Devens, Massachusetts, I drive by it all the time, it's a beautiful facility actually.
But extremely broad portfolio, obviously some COVID impact, but you're managing through that very, very well, if I understand it correctly, you're taking a collaborative approach to a COVID vaccine, you're now bringing people physically back to work, you've been very planful about that. My question is from your standpoint, what role did you play in that whole COVID response and what role did data play? >> Yeah, I think it's two parts. As you rightly pointed out, at Bristol-Myers Squibb we have been an active partner in the overall scientific ecosystem, supporting many different targets from many different companies. I think across biopharmaceuticals, there's been a healthy convergence of scientific innovation to see how we can solve this together. And Bristol-Myers Squibb has been an active participant, as our CEO, as well as our Chief Medical Officer and Head of Research, have articulated publicly. Within the company itself, from a data and technology standpoint, data and digital is core to the company's response to COVID-19: how do we ensure that our work continues when the entire global workforce pivots to a kind of remote setting? So that really calls on the digital infrastructure to rise to the challenge, to enable a complete global workforce. And by workforce, I mean not just employees of the company but all of the third-party partners and others that we work with; the whole ecosystem needs to work. And I think our digital infrastructure has proven to be extremely resilient in that. From a data perspective, I think it is twofold. One is how does the core book of business of data continue to drive forward to make sure that our company's key priorities are being advanced. Secondarily, we've been partnering with our research and development organization as well as our medical organization to look at what kind of real-world data insights can really help in answering the many questions around COVID-19.
So I think it is twofold. In summary: one is, how do we ensure that the data and digital infrastructure of the company continues to operate in a way that allows us to progress the company's mission even during a time when, globally, we have switched to a remote workforce, except for some essential staff from a lab and manufacturing standpoint. And secondarily, how do we look at the real-world evidence as well as the scientific data to be a good partner with other companies in progressing the societal innovations needed for this. >> I think it's a really prudent approach because let's face it, sometimes a one-shot approach to a vaccine can be like playing roulette. So you guys are managing your risk and, as I say, you're financially a very, very successful company with a sound approach. I want to ask you about your organization. We've interviewed many, many Chief Data Officers over the years, and there seems to be some fuzziness as to the organizational structure. It's very clear with you, you report in to the CIO, you came out of a technical background, you have a technical degree but you also of course have a business degree. So you're dangerous from that standpoint. You got both sides which is critical, I would think in your role, but let's start with the organizational reporting structure. How did that come about and what are the benefits of reporting into the CIO? >> I think the genesis for that was at Bristol-Myers Squibb, and when I say Bristol-Myers Squibb, the new Bristol-Myers Squibb is a combination of Heritage Bristol-Myers Squibb and Heritage Celgene after the Celgene acquisition last November. So at Heritage Bristol-Myers Squibb, we came to a conclusion that in order for BMS to be able to fully capitalize on our scientific innovation potential as well as to drive data-driven decisions across the company, having a robust data agenda is key. Now the question is, how do you progress that?
Historically, we had taken a very decentralized approach with many different data constituencies. We didn't have a formal role of a Chief Data Officer up until 2018 or so. So it came from that realization that we need to have an effective data agenda to drive forward the necessary data-driven innovations from an analytics standpoint and, equally importantly, from optimizing our execution. We came to the conclusion that we need an enterprise-level data organization, we need to have a first among equals if you will, mandated by the CEO and his leadership team, to be the orchestrator of a data agenda for the company, because a data agenda cannot be done individually by a singular CDO. It has to be done in partnership with many stakeholders, business, technology, analytics, et cetera. So from that came this notion that we need an enterprise-wide data organization. So we started there. So for a while, I would joke around that I had all of the accountabilities of the CDO without the lofty title. So this journey started around 2016, when we created an enterprise-wide data organization. And we made a very conscious choice of separating the data organization from analytics. And the reason we did that is when we look at the whole of Bristol-Myers Squibb, analytics for example, is core and part of our scientific discovery process, research, our clinical development, all of them have deep data science and analytics embedded in them. But we also have other analytics, whether it is part of our sales and marketing, or part of our finance and our enabling functions, the catch-all across global procurement, et cetera. So the world of analytics is very broad. BMS made a separation between the world of analytics and the world of data. Analytics at BMS is in two modes. There is a central analytics organization called Business Insights and Analytics that drives most of the enterprise-level analytics.
But then we have embedded analytics in our business areas, which is research and development, manufacturing and supply chain, et cetera, to drive what needs to be closer to the business area. And the reason for separating that out and having a separate data organization is that none of these analytic aspirations or the business aspirations from data will be met if, in the world of data, you don't have the right level of data available, the velocity of data is not appropriate for the use cases, the quality of data is not great, or the control of the data is lacking, so that we are using the data for the right intent and the compliance and regulatory expectations around the data are met. So that's why we separated out that data world from the analytics world, which is a little bit of a unique construct for us compared to what we see generally in the world of CDOs. And from that standpoint, then the decision was taken to make that report to the global CIO. At Bristol-Myers Squibb, we have a very strong CIO organization and IT organization. When I say strong, it is from this standpoint: it is centralized, we have centralized the budget as well as the execution across the enterprise. And the CDO reporting to the CIO with that data-specific agenda has a lot of value in being able to connect the world of data with the world of technology. So at BMS, the Chief Data Officer organization is a combination of traditional CDO-type accountabilities like data risk management, data governance, data stewardship, but also all of the related technologies around master data management, data lake, data and analytic engineering and a nascent AI data and technology lab.
So that construct allows us to be a true enterprise horizontal, supporting analytics, whether it is done in a central analytics organization or embedded analytics teams in the business area, but also equally importantly, focus on the world of data from an operational execution standpoint: how do we optimize data to drive operational effectiveness? So that's the construct that we have where the CDO reports to the CIO, with the data organization separated from analytics to really focus on the availability but also the quality and control of data. And the last nuance is that at BMS, the Chief Data Officer organization is also accountable to be the Data Protection Office. So we orchestrate and facilitate all privacy-related actions across the company, because that allows us to make sure that all personal data that is collected, managed and consumed meets all of the various privacy standards across the world, as well as our own commitments as a company from a compliance principles standpoint. >> So that makes a lot of sense to me and thank you for that description. You're not getting in the way of R&D and the scientists, they know data science, they don't need really your help. I mean, they need to innovate at their own pace, but the balance of the business really does need your innovation, and that's really where it seems like you're focused. You mentioned master data management, data lakes, data engineering, et cetera. So your responsibility is for that enterprise data lifecycle to support the business side of things, and I wonder if you could talk a little bit about that and how that's evolved.
I mean a lot has changed from the old days of data warehouse and cumbersome ETL and you mentioned, as you say data lakes, many of those have been challenging, expensive, slow, but now we're entering this era of cloud, real-time, a lot of machine intelligence, and I wonder if you could talk about the changes there and how you're looking at and thinking about the data lifecycle and accelerating the time to insights. >> Yeah, I think the way we think about it, we as an organization in our strategy and tactics, think of this as a data supply chain. The supply chain of data to drive business value, whether it is through insights and analytics or through operational execution. When you think about it from that standpoint, then we need to get many elements of that into an effective stage. This could be the technologies that are part of that data supply chain, you referenced some of them: the master data management platforms, data lake platforms, the analytics and reporting capabilities and business intelligence capabilities that plug into a data backbone, which is, I would say, the technology swim lane that we need to get right. Along with that, what we also need to get right for that effective data supply chain is the data layer. That is, how do you make sure that there is the right data navigation capability, how do you make sure that we have the right ontology mapping and the understanding around the data. How do we have data navigation? It is something that we have invested very heavily in. So imagine a new employee joining BMS, any organization our size has a pretty wide technology ecosystem and data ecosystem. How do you navigate that, how do we find the data? Data discovery has been a key focus for us. So for an effective data supply chain, we knew that, and we have instituted our roadmap to make sure that we have a robust technology orchestration of it, but equally important is an effective data operations orchestration.
Both need to go hand in hand for us to be able to make sure that that supply chain is effective from a business use case and analytic use standpoint. So that has led us on a journey from a cloud perspective, since you referred to that in your question: we have invested very heavily to move from a very disparate set of data ecosystems to a more converged cloud-based data backbone. That has been a big focus at BMS since 2016, whether it is from a research and development standpoint or from commercialization, which is our word for sales and marketing, or manufacturing and supply chain and HR, et cetera. How do we create a converged data backbone that allows us to use that data as a resource to drive many different consumption patterns? Because when you imagine an enterprise of our size, we have many different consumers of the data. So those consumers have different consumption needs. You have a deep data science population who just need access to the data, and they have data science platforms but they are advanced programmers as well, to the other end of the spectrum where executives need pre-packaged KPIs. So the effective orchestration of the data ecosystem at BMS through a data supply chain and the data backbone does a couple of things for us. One, it drives productivity of our data consumers, the scientific researchers, analytic community or other operational staff. And second, in a world where we need to make sure that the data consumption upholds ethical standards as well as privacy and other regulatory expectations, we are able to build into our systems and processes the necessary controls to make sure that the consumption and the use of data meets our highest trust standards. >> That makes a lot of sense. I mean, converging your data like that, people always talk about stove pipes. I know it's kind of a bromide but it's true, and it allows you to sort of inject consistent policies. What about automation?
How has that affected your data pipeline recently and on your journey with things like data classification and the like? >> I think in pursuing a broad data automation journey, one of the things that we did was to operate at two different speed points. Historically, data organizations have been bundled with long-running data infrastructure programs. By the time you complete them, the business context has moved on and the organization's leaders are also exhausted from having to wait for these massive programs to reach their full potential. So what we did very intentionally in our data automation journey is to organize ourselves in two speed dimensions. First, a concept called Rapid Data Lab. The idea is that, recognizing the reality that the data is not well automated and orchestrated today, we need a SWAT team of data engineers and data SMEs to partner with consumers of data to make sure that we can make effective data supply chain decisions here and now, and enable the business to answer the questions of today. Simultaneously, on a longer time horizon, we need to do the necessary work of moving the data automation to a better footprint. So enterprise data lake investments, where we built services based on AWS, which we had chosen as the cloud backbone for data. So how do we use the AWS services? How do we wrap around it with the necessary capabilities so that we have a consistent reference and technical architecture to drive the many different function journeys? So we organized ourselves into two speed dimensions: the Rapid Data Lab teams focus on partnering with the consumers of data to help them with data automation needs here and now, and then a second team focuses on the convergence of data into a better cloud-based data backbone. So that allowed us to, one, make an impact here and now and deliver value from data to the business here and now.
Secondly, we also learned a lot from actually partnering with consumers of data on what needs to get adjusted over a period of time in our automation journey. >> It makes sense, I mean again, that whole notion of converged data, putting data at the core of your business, you brought up AWS, I wonder if I could ask you a question. You don't have to comment on specific vendors, but there's a conversation we have in our community. You have AWS, a huge platform, tons of partners, a lot of innovation going on and you see innovation in areas like the cloud data warehouse or data science tooling, et cetera, all components of that data pipeline. As well, you have AWS with its own tooling around there. So a question we often have in the community is will technologists and technology buyers go for kind of best of breed and cobble together different services or would they prefer to have sort of the convenience of a bundled service from an AWS or a Microsoft or Google, or maybe they even go best of breed across all clouds. Can you comment on that, what's your thinking? >> I think, especially for organizations of our size and breadth, having a converged, convenient, all-of-the-above from a single provider does not seem practical and feasible, for a couple of reasons. One, the heterogeneity of the data, the heterogeneity of consumption of the data, and we are yet to find a single stack provider who can meet all of the different needs. So I am more in the best of breed camp with a few caveats, a hybrid best of breed, if you will. It is important to have a converged data backbone for the enterprise. And so whether you invest in a singular cloud or private cloud or a combination, you need to have a clear, intentional strategy around where you are going to host the data and how the data is going to be organized. But you could have a lot more flexibility in the consumption of data. So once you have the data converged, in our case, we converged on an AWS-based backbone.
We allow many different consumptions of the data, because I think the analytic and insights layer, data science community within R&D is different from a data science community in the supply chain context, we have business intelligence needs, we have catered needs and then there are other data needs that need to be funneled into software-as-a-service platforms like the Salesforces of the world, to be able to drive operational execution as well. So when you look at it from that context, having a hybrid model of best of breed, where you have a lot more convergence from a data backbone standpoint, but then allow for best of breed from an analytic and consumption of data standpoint, is more where my heart and my brain is. >> I know a lot of companies would be excited to hear that answer, but I love it because it fosters competition and innovation. I wish I could talk to you forever, but you made me think of another question which is around self-serve. On your journey, are you at the point where you can deliver self-serve to the lines of business? Is that something that you're trying to get to? >> Yeah, I think it is. Self-serve is an absolutely important point because I think the traditional boundary between what you consider classical IT versus classical business is gray. I think there is an important gray area in the middle where you have deep citizen data scientists in the business community who really need to be able to have access to the data and who have advanced data science and programming skills. So self-serve is important but in that, companies need to be very intentional and very conscious of making sure that you're allowing that self-serve in a safe containment zone. Because at the end of the day, whether it is a cyber risk or data risk or technology risk, it's all real.
So we need to have a balanced approach between promoting, whether you call it data democratization or whether you call it self-serve, and making sure that you're doing it right from a risk mitigation strategy standpoint. So that's how our focus then is to say, how do we promote self-serve for the communities that need self-serve, where they have deeper levels of access? How do we set up the right safe zones for those, with the appropriate mitigation from a cyber risk or data risk or technology risk standpoint? >> The security piece, again, you keep bringing up topics that I could talk to you forever on, but I heard on TV the other night, I heard somebody talking about how COVID has affected, because of remote access, affected security. And it's like hey, give everybody access. That was sort of the initial knee-jerk response, but the example they gave was, well, if the parents go out of town and the kid has a party, you may have some people show up that you don't want to show up. And so, same issue with remote working, work from home. Clearly you guys have had to pivot to support that, but where does the security organization fit? Does that report separate alongside the CIO? Does it report into the CIO? Are they sort of peers of yours, how does that all work? >> Yeah, I think at Bristol-Myers Squibb, we have a Chief Information Security Officer who is a peer of mine, who also reports to the global CIO. The CDO and the CISO are effective partners and are two sides of the coin in trying to advance a total risk mitigation strategy, whether it is from a cyber risk standpoint, which is the focus of the Chief Information Security Officer, or the general data consumption risk, which is the focus of the Chief Data Officer in the capacities that I have. And together, those are two sides of a coin that the CIO needs to be accountable for.
So I think that's how we have orchestrated it, because I think it is important in these worlds where you want to be able to drive data-driven innovation but you want to be able to do that in a way that doesn't open the company to unwanted risk exposures as well. And that is always a delicate balancing act, because if you index too much on risk and then high levels of security and control, then you could lose productivity. But if you index too much on productivity, collaboration and open access to data, it opens up the company to risks. So it is a delicate balance between the two. >> Increasingly, we're seeing that reporting structure evolve and coalesce, I think it makes a lot of sense. I felt like at some point you had too many seats at the executive leadership table, too many kind of competing agendas. And now with your structure, the CIO is obviously a very important position. I'm sure the CIO has a seat at the leadership table, but also has the responsibility for managing that sort of data as an asset versus a liability which, in my view, has always been sort of the role of the Head of Information. I want to ask you, I want to hit the Escape key a little bit and ask you about data as a resource. You hear a lot of people talk about data is the new oil. We often say data is more valuable than oil because you can use it, it doesn't follow the laws of scarcity. You could use data in an infinite number of places. You can only put oil in your car or your house. How do you think about data as a resource today and going forward? >> Yeah, I think the data as the new oil paradigm, in my opinion, was an unhealthy one, and it prompts different types of conversations around that. I think for certain companies, data is indeed an asset. If you're a company that is focused on information products and data products and that is core to your business, then of course there's monetization of data and then data as an asset, just like any other assets on the company's balance sheet.
But for many enterprises to further their mission, I think considering data as a resource is a better focus. So as a vital resource for the company, you need to make sure that there is an appropriate caring and feeding for it, there is an appropriate management of the resource and an appropriate evolution of the resource. So that's how I would like to consider it, it is a personal, n-of-one perspective, that data as a resource that can power the mission of the company, the new products and services, I think that's a good, healthy way to look at it. At the center of it though, with a lot of strategies, whether people talk about a digital strategy or a data strategy, what is important is for a company to have a true north star around what is the core mission of the company and what is the core strategy of the company. For Bristol-Myers Squibb, we are about transforming patients' lives through science. And we think about digital and data as key value levers and drivers of that strategy. So digital for the sake of digital or data strategy for the sake of data strategy is meaningless in my opinion. We are focused on how we make sure that data and digital is an accelerant and a value lever for the company's mission and company strategy. So that's why thinking about data as a resource, as a key resource for our scientific researchers or a key resource for our manufacturing team or a key resource for our sales and marketing, allows us to think about the actions and the strategies and tactics we need to deploy to make that effective. >> Yeah, that makes a lot of sense, you're constantly using that North star as your guideline and how data contributes to that mission. Krishna Cheriath, thanks so much for coming on the Cube and supporting the MIT Chief Data Officer community, it was a real pleasure having you. >> Thank you so much, Dave, hopefully you and the audience are safe and healthy during these times.
>> Thank you for that and thank you for watching everybody. This is Vellante for the Cube's coverage of the MIT CDOIQ Conference 2020 gone virtual. Keep it right there, we'll be right back right after this short break. (lively upbeat music)

Published Date : Sep 3 2020


Peter Burris Big Data Research Presentation

(upbeat music) >> Announcer: Live from San Jose, it's theCUBE presenting Big Data Silicon Valley brought to you by SiliconANGLE Media and its ecosystem partner. >> What am I going to spend the next 15, 20 minutes or so talking about? I'm going to answer three things. Our research has gone deep into where are we now in the big data community. I'm sorry, where is the big data community going, number one. Number two is how are we going to get there and number three, what do the numbers say about where we are? So those are the three things. Now, since we want to get out of here, I'm going to fly through some of these slides but again there's a lot of opportunity for additional conversation because we're all about having conversations with the community. So let's start here. The first thing to know, when we think about where this is all going, is that it's inextricably bound up with digital transformation. Well, what is digital transformation? We've done a lot of research on this. This is Peter Drucker who famously said many years ago, that the purpose of a business is to create and keep a customer. That's what a business is. Now what's the difference between a business and a digital business? What's the difference between Sears Roebuck and Amazon? It's data. A digital business uses data as an asset to create and keep customers. It infuses data and operations differently to create more automation. It infuses data and engagement differently to catalyze superior customer experiences. It reformats and restructures its concept of value proposition and product to move from a product to a services orientation. The role of data is the centerpiece of digital business transformation and in many respects that is where we're going, is an understanding and appreciation of that. Now, we think there's going to be a number of strategic capabilities that will have to be built out to make that possible.
First off, we have to start thinking about what it means to put data to work. The whole notion of an asset is that an asset is something that can be applied to a productive activity. Data can be applied to a productive activity. Now, there's a lot of very interesting implications that we won't get into now, but essentially if we're going to treat data as an asset and think about how we could put more data to work, we're going to focus on three core strategic capabilities about how to make that possible. One, we need to build a capability for collecting and capturing data. That's a lot of what IoT is about. It's a lot of what mobile computing is about. There's going to be a lot of implications around how to ethically and properly do some of those things but a lot of that investment is about finding better and superior ways to capture data. Two, once we are able to capture that data, we have to turn it into value. That in many respects is the essence of big data. How we turn data into data assets, in the form of models, in the form of insights, in the form of any number of other approaches to thinking about how we're going to appropriate value out of data. But it's not just enough to create value out of it and have it sit there as potential value. We have to turn it into kinetic value, to actually do the work with it and that is the last piece. We have to build new capabilities for how we're going to apply data to perform work better, to enact based on data. Now, we've got a concept we're researching now that we call systems of agency, which is the idea that there's going to be a lot of new approaches, new systems with a lot of intelligence and a lot of data that act on behalf of the brand. I'm not going to spend a lot of time going into this but remember that word because I will come back to it. Systems of agency is about how you're going to apply data to perform work with automation, augmentation, and actuation on behalf of your brand.
Now, all this is going to happen against the backdrop of cloud optimization. I'll explain what we mean by that right now. Very importantly, increasingly how you create value out of data, how you create future options on the value of your data, is going to drive your technology choices. For the first 10 years of the cloud, the presumption was that all data was going to go to the cloud. We think that a better way of thinking about it is how is the cloud experience going to come to the data. We've done a lot of research on the cost of data movement, not only in terms of the actual out-of-pocket costs but also the potential uncertainty, the transaction costs, etc., associated with data movement. And that's going to be one of the fundamental pieces or elements of how we think about the future of big data and how digital business works, is what we think about data movement. I'll come to that in a bit. But our proposition is increasingly, we're going to see architectural approaches that focus on how we're going to move the cloud experience to the data. We've got this notion of true private cloud, which is effectively the idea of the cloud experience on or near premise. That doesn't diminish the role that the cloud's going to play in the industry or say that Amazon and AWS and Microsoft Azure and all the other options are not important. They're crucially important, but it means we have to start thinking architecturally about how we're going to create value out of data and recognize that means we have to start envisioning how our organization and infrastructure is going to be set up so that we can use data where it needs to be or where it's most valuable, and often that's close to the action. So if we think then about that very quickly because it's a backdrop for everything, increasingly we're going to start talking about the idea of where's the workload going to go? 
Where's the workload going to be against this kind of backdrop of diverse infrastructure? We believe, and our research pretty strongly shows, that a lot of workloads are going to go to true private cloud but a lot of big data is moving into the cloud. This is a prediction we made a few years ago and it's clearly happening and it's underway and we'll get into what some of the implications are. So again, when we say that a lot of the big data elements, a lot of the process of creating value out of data, is going to move into the cloud, that doesn't mean that all the systems of agency that build or rely on that data, the inference engines, etc., are also in a public cloud. A lot of them are going to be distributed out to the edge, out to where the action needs to be, because of latency and other types of issues. This is a fundamental proposition and I know I'm going fast but hopefully I'm being clear. All right, so let's now get to the second part. This is kind of where the industry's going. Data is an asset. Invest in strategic business capabilities to create those data assets and appreciate the value of those assets, and utilize the cloud intelligently to generate and ensure increasing returns. So the next question is, well, how will we get there? Now, not too far from here, Neil Raden, for example, was on the show floor yesterday. Neil made the observation that, as he wandered around, he only heard the term big data two or three times. The concept of big data is not dead. Whether the term is or is not is somebody else's decision. Our perspective, very simply, is that the notion is bifurcating. And it's bifurcating because we see different strategic imperatives happening at two different levels. On the one hand, we see infrastructure convergence. The idea that increasingly we have to think about how we're going to bring and federate data together, both from a systems and a data management standpoint. 
And on the other hand, we're going to see infrastructure or application specialization. That's going to have an enormous implication over the next few years, if only because there just aren't enough people in the world that understand how to create value out of data. And there's going to be a lot of effort made over the next few years to find new ways to go from that one expertise group to billions of people, billions of devices, and those are the two dominant considerations in the industry right now. How can we converge data physically, logically, and on the other hand, how can we liberate more of the smarts associated with this very, very powerful approach so that more people get access to the capacities and the capabilities and the assets that are being generated by that process. Now, we've done at Wikibon, probably I don't know, 18, 20, 23 predictions overall on the changes being wrought by digital business. Here I'm going to focus on four of them that are central to our big data research. We have many more but I'm just going to focus on four. The first one, when we think about infrastructure convergence, we worry about hardware. Here's a prediction about what we think is going to happen with hardware, and our observation is we believe pretty strongly that future systems are going to be built on the concept of how do you increase the value of data assets. The technologies are all in place. Simpler parts that bind more successfully, specifically in how storage and network are going to play together. Why? Because increasingly that's the fundamental constraint. How do I make data available to other machines, actors, sources of change, sources of process within the business. Now, we envision, or we are watching before our very eyes, new technologies that allow us to take these simple piece parts and weave them together in very powerful fabrics or grids, what we call UniGrid. 
So that there is almost no latency between data that exists within one of these, call it a molecule, and anywhere else in that grid or lattice. Now again, these are not systems that are going to be here in five years. All the piece parts are here today and there are companies that are actually delivering them. So if you take a look at what Micron has done with Mellanox and other players, that's an example of one of these true private cloud oriented machines in place. The bottom line though is that there is a lot of room left in hardware. A lot of room. This is what cloud suppliers are building and are going to build, but increasingly as we think about true private cloud, enterprises are going to look at this as well. So future systems for improving data assets. The capacity of this type of a system with low latency amongst any source of data means that we can now think about data not as a set of sources and sinks, each having some control over its own data, that have to be woven together by middleware and applications, but literally as networks of data. As we start to think about distributing data and distributing control and authority associated with that data more broadly across systems, we now have to think about what does it mean to create networks of data? Because that, in many respects, is how these assets are going to be forged. I haven't even mentioned the role that security is going to play in all of this, by the way, but fundamentally that's how it's likely to play out. We'll have a lot of different sources, but from a business standpoint, we're going to think about how those sources come together into a persistent network that can be acted upon by the business. One of the primary drivers of this is what's going on at the edge. Marc Andreessen famously said that software is eating the world. Well, our observation is, great, but if software's eating the world, it's eating it at the edge. That's where it's happening. 
Secondly, there's this notion of agency zones. I said I was going to bring that word up again. How systems act on behalf of a brand or act on behalf of an institution or business is very, very crucial because the time necessary to do the analysis, perform the intelligence, and then take action is a real constraint on how we do things. And our expectation is that we're going to see what we call an agency zone or a hub zone or cloud zone defined by latency and how we architect data to get the data that's necessary to perform that piece of work into the zone where it's required. Now, the implication of this is that none of this is going to happen if we don't use AI and related technologies to increasingly automate how we handle infrastructure. And technologies like blockchain have the potential to provide an interesting way of imagining how these networks of data actually get structured. It's not going to solve everything. There's some people that think the blockchain is kind of everything that's necessary, but it will be a way of describing a network of data. So we see those technologies on the ascent. But what does it mean for DBMS? In the old world, the old way of thinking, the database manager was the control point for data. In the new world these networks of data are going to exist beyond a single DBMS and in fact, over time, that concept of federated data actually has a potential to become real. When we have these networks of data, we're going to need people to act upon them and that's essentially a lot of what the data scientist is going to be doing. Identifying the outcome, identifying the data that's required, and weaving that data through the construction and management, manipulation of pipelines, to ensure that the data as an asset can persist for the purposes of solving a near-term problem or over whatever duration is required to solve a longer term problem. 
Data scientists remain very important but we're going to see, as a consequence of improvements in tooling capable of doing these things, an increasing recognition that there's a difference between a data scientist and a data scientist. There's going to be a lot of folks that participate in the process of manipulating, maintaining, managing these networks of data to create these business outcomes but we're going to see specialization in those ranks as the tooling is more targeted to specific types of activities. So the data scientist is going to become or will remain an important job, going to lose a little bit of its luster because it's going to become clear what it means. So some data scientists will probably become more, let's call them data network administrators or networks of data administrators. And very importantly as I said earlier, there's just not enough of these people on the planet and so increasingly when we think about again, digital business and the idea of creating data assets. A central challenge is going to be how to create the data or how to turn all the data that can be captured into assets that can be applied to a lot of different uses. There's going to be two fundamental changes to the way we are currently conceiving of the big data world on the horizon. One is well, it's pretty clear that Hadoop can only go so far. Hadoop is a great tool for certain types of activities and certain numbers of individuals. So Hadoop solves problems for an important but relatively limited subset of the world. Some of the new data science platforms that we just talked about, that I just talked about, they're going to help with a degree of specialization that hasn't been available before in the data world, will certainly also help but it also will only take it so far. 
The real way that we see the work that we're doing, the work that the big data community is performing, turned into sources of value that extend into virtually every single corner of humankind is going to be through these cloud services that are being built and increasingly through packaged applications. A lot of computer science still exists between what I just said and when this actually happens. But in many respects, that's the challenge of the vendor ecosystem. How to reconstruct the idea of packaged software, which has historically been built around operations and transaction processing, with a known data model and a known process and some technology challenges. How do we reapply that to a world where we now are thinking about, well, we don't know exactly what the process is because the data tells us at the moment that the action's going to be taking place. It's a very different way of thinking about application development. A very different way of thinking about what's important in IT and a very different way of thinking about how business is going to be constructed and how strategy's going to be established. Packaged applications are going to be crucially important. So in the last few minutes here, what are the numbers? So this is kind of the basis for our analysis. Digital business, the role of data as an asset, having an enormous impact on how we think about hardware, how we think about database management or data management, how we think about the people involved in this, and ultimately how we think about how we're going to deliver all this value out to the world. And the numbers are starting to reflect that. So why don't you think about four numbers as I go through the two or three slides. A hundred and three billion, 68%, 11%, and 2017. So of all the numbers that you will see, those are four of the most important numbers. So let's start by looking at the total market place. 
This is the growth of the hardware, software, and services pieces of the big data universe. Now we have a fair amount of additional research that breaks all these down into tighter segments, especially on the software side. But the key number here is we're talking about big numbers. 103 billion over the course of the next 10 years, and let's be clear that 103 billion dollars actually has a dramatic amplification on the rest of the computing industry because a lot of the pricing models associated with, especially the software, are tied back to open source, which has its own issues. And very importantly, the fact that the services business is going to go through an enormous amount of change over the next five years as service companies better understand how to deliver some of these big data rich applications. The second point to note here is that it was in 2017 that the software market surpassed the hardware market in big data. Again, for the first number of years we focused on buying the hardware and the system software associated with that, and the software became something that we hoped to discover. So I was having a conversation here in theCUBE with the CEO of Transwarp, which is a very interesting Chinese big data company, and I asked what's the difference between how you do things in China and how we do things in the US? He said, well, in the US you guys focus on proof of concept. You spend an enormous amount of time asking, does the hardware work? Does the database software work? Does the data management software work? In China we focus on the outcome. That's what we focus on. Here you have to placate the IT organization to make sure that everybody in IT is comfortable with what's about to happen. In China, we're focused on the business people. This is the first year that software is bigger than hardware and it's only going to get bigger and bigger over time. It doesn't mean again, that hardware is dead or hardware is not important. 
It's going to remain very important but it does mean that the centerpiece, the locus of the industry, is moving. Now, when we think about what the market shares look like, it's a very fragmented market. 68% of the market is still other. This is a highly immature market that's going to go through a number of changes over the next few years. Partly catalyzed by that notion of infrastructure convergence. So in four years our expectation is that that 68% is going to start going down pretty fast as we see greater consolidation in how some of these numbers come together. Now IBM is the biggest one on the basis of the fact that they operate in all these different segments. They operate in the hardware, software, and services segments, but especially because they're very strong within the services business. The last one I want to point your attention to is this one. I mentioned earlier on that our expectation is that the market increasingly is going to move to a packaged application orientation or packaged services orientation as a way of delivering expertise about big data to customers. Splunk is the leading software player right now. Why? Because that's the perspective that they've taken. Now, it's perhaps for a limited subset of individuals or markets or sectors, but it takes a packaged application, weaves these technologies together, and applies them to an outcome. And we think this presages more of that kind of activity over the course of the next few years. Oracle, kind of a different approach, and we'll see how that plays out over the course of the next five years as well. Okay, so that's where the numbers are. Again, a lot more numbers, a lot of people you can talk to. Let me give you some action items. First one, if data was a core asset, how would IT, how would your business be different? Stop and think about that. 
If it wasn't your buildings that were the asset, it wasn't the machines that were the asset, it wasn't your people by themselves who were the asset, but data was the asset. How would you reinstitutionalize work? That's what every business is starting to ask, even if they don't ask it in the same way. And our advice is, then do it because that's the future of business. Not that data is the only asset but data is a recognized central asset and that's going to have enormous impacts on a lot of things. The second point I want to leave you with: tens of billions of users, and I'm including people and devices, are dependent on thousands of data scientists. That's an impedance mismatch that cannot be sustained. Packaged apps and these cloud services are going to be the way to bridge that gap. I'd love to tell you that it's all going to be about tools, that we're going to have hundreds of thousands or millions or tens of millions or hundreds of millions of data scientists suddenly emerge out of the woodwork. It's not going to happen. The third thing is we think that big businesses, enterprises, have to master what we call the big inflection. The big tech inflection. The first 50 years were about known process and unknown technology. How do I take an accounting package and do I put it on a mainframe or a minicomputer or client/server, or do I do it on the web? Unknown technology. Well increasingly today, all of us have a pretty good idea what the base technology is going to be. Does anybody doubt it's going to be the cloud? We got a pretty good idea what the base technology is going to be. What we don't know is what are the new problems that we can attack, that we can address with data rich approaches to thinking about how we turn those systems into actors on behalf of our business and customers. So I'm a couple minutes over, I apologize. I want to make sure everybody can get over to the keynotes if you want to. Feel free to stay, theCUBE's going to be live at 9:30. 
If I got that right. So it's actually pretty exciting if anybody wants to see how it works, feel free to stay. Georgia's here, Neil's here, I'm here. I mentioned Greg Terrio, Dave Vellante, John Greco, I think I saw Sam Kahane back in the corner. Any questions, come and ask us, we'll be more than happy. Thank you very much for, oh, David Vellante. >> David: I have a question. >> Yes. >> David: Do you have time? >> Yep. >> David: So you talk about data as a core asset, and if you look at the top five companies by market cap in the US, Google, Amazon, Facebook, etc., they're data companies, they got data at the core, which is kind of what your first bullet here describes. How do you see traditional companies closing that gap, where humans, buildings, etc. are at the core? As we enter this machine intelligence era, what's your advice to the traditional companies on how they close that gap? >> All right. So the question was, the most valuable companies in the world are companies that are well down the path of treating data as an asset. How does everybody else get going? Our observation is you go back to what's the value proposition? What actions are most important? What data is necessary to perform those actions? Can changing the way the data is orchestrated and organized and put together inform or change the cost of performing that work by changing the transaction costs? Can you introduce a new service along the same lines, and then architect your infrastructure and your business to make sure that the data is near the action in time for the action to be absolute genius to your customer. So it's a relatively simple thought process. That's how Amazon thought, Apple increasingly thinks like that, where they design the experience and they think what data is necessary to deliver that experience. That's a simple approach but it works. Yes, sir. 
>> Audience Member: With the slide that you had a few slides ago, the market share, the big spenders, and you mentioned that, you asked the question do any of us doubt that cloud is the future? I'm with Snowflake, I don't see many of those large vendors in the cloud and I was wondering if you could speak to what are you seeing in terms of emerging vendors in that space. >> What a great question. So the question was, when you look at the companies that are catalyzing a lot of the change, you don't see a lot of the big companies being in the leadership. And someone from Snowflake just said, well who's going to lead it? That's a big question that has a lot of implications, but at this point in time it's very clear that the big companies are suffering a bit from the old, trying to remember what the... RCA syndrome. I think Clay Christensen talked about this. You know, the innovator's dilemma. So RCA actually is one of the first creators. They created the transistor and they held a lot of original patents on it. They put that incredible new technology, back in the forties and fifties, under the control of the people who ran the vacuum tube business. When was the last time anybody bought RCA stock? The same problem is existing today. Now, how is that going to play out? Are we going to see a lot of, as we've always seen, a lot of new vendors emerge out of this industry, grow into big vendors with IPO related exits to try to scale their business? Or are we going to see a whole bunch of gobbling up? That's what I'm not clear on, but it's pretty clear at this point in time that a lot of the technology, a lot of the science, is being done in smaller places. The moderating feature of that is the services side. Because there's limited groupings of expertise, and the companies that today are able to attract that expertise, the Googles, the Facebooks, the AWSs, the Amazons, etc., are doing so in support of a particular service. 
IBM and others are trying to attract that talent so they can apply it to customer problems. We'll see over the next few years whether the IBMs and the Accentures and the big service providers are able to attract the kind of talent necessary to diffuse that knowledge into the industry faster. So it's the rate at which the idea of internet scale computing, the idea of big data being applied to business problems, can diffuse into the marketplace through services. If it can diffuse faster that will have an accelerating impact for smaller vendors, as it has in the past. But it may also again, have a moderating impact because a lot of that expertise that comes out of IBM, IBM is going to find ways to drive into the product faster than it ever has before. So it's a complicated answer but that's our thinking at this point in time. >> Dave: Can I add to that? >> Yeah. (audience member speaking faintly) >> I think that's true now but I think the real question, not to argue with Dave but this is part of what we do. The real question is how is that knowledge going to diffuse into the enterprise broadly? Because Airbnb, I doubt is going to get into the business of providing services. (audience member speaking faintly) So I think that the whole concept of community, partnership, ecosystem is going to remain very important as it always has, and we'll see how fast those service companies that are dedicated to diffusing knowledge into customer problems actually do so. Our expectation is that as the tooling gets better, we will see more people be able to present themselves truly as capable of doing this and that will accelerate the process. But the next few years are going to be really turbulent and we'll see which way it actually ends up going. (audience member speaking faintly) >> Audience Member: So I'm with IBM. 
So I can tell you 100% for sure that we are, I hired literally 50 data scientists in the last three months to go out and do exactly what you're saying. Sit down with clients and help them figure out how to do data science in the enterprise. And so we are in fact scaling it, we're getting people that have done this at Google, Facebook. Not a whole lot of those 'cause we want to do it with people that have actually done it in legacy Fortune 500 companies, right? Because there's a little bit of difference there. >> So. >> Audience Member: So we are doing exactly what you said and Microsoft is doing the same thing, Amazon is actually doing the same thing too, Domino Data Lab. >> They don't like talking about it too much but they're doing it. >> Audience Member: But all the big players from the data science platform game are doing this at a different scale. >> Exactly. >> Audience Member: IBM is doing it on a much bigger scale than anyone else. >> And that will have an impact on ultimately how the market gets structured and who the winners end up being. >> Audience Member: To add to that, a lot of people thought that, you mentioned the Red Hat of big data, a lot of people thought Cloudera was going to be the Red Hat of big data and if you look at what's happened to their business. (background noise drowns out other sounds) They're getting surrounded by the cloud. We look at like how can we get closer to companies like AWS? That was like a wild card that wasn't expected. >> Yeah but look, at the end of the day Red Hat isn't even the Red Hat of open source. So the bottom line is the thing to focus on is how is this knowledge going to diffuse. That's the thing to focus on. And there's a lot of different ways, some of it's going to diffuse through tools. If it diffuses through tools, it increases the likelihood that we'll have more people capable of doing this and IBM and others can hire more. That Citibank can hire more. 
That's an important participant, that's an important play. So you have something to say about that but it also says we're going to see more of the packaged applications emerge because that facilitates the diffusion. We haven't figured out, nobody knows exactly, the shape it's going to take. But that's the centerpiece of our big data research: how is that diffusion process going to happen, accelerate, and what's the resulting structure going to look like? And ultimately how are enterprises going to create value with whatever results. Yes, sir. (audience member asks question faintly) So the recap question is you see more people coming in and promising the moon but being incapable of delivering, partly because the technology is uncertain and for other reasons. So here's our approach. Or here's our observation. We actually did a fair amount of research on this. When you take a look at what we call an approach to doing big data that's optimized for the costs of procurement, i.e., let's get the simplest combination of infrastructure, the simplest combination of open-source software, the simplest contracting, to create that proof of concept, you can stand things up very quickly if you have enough expertise, but the process of turning that into an actual production system extends dramatically. And that's one of the reasons why the Clouderas did not take over the universe. There are other reasons. As George Gilbert's research has pointed out, Cloudera is spending 53, 55% of their money right now just integrating all the stuff that they bought into the distribution five years ago. Which is a real great recipe for creating customer value. The bottom line though is that if we focus on the time to value in production, we end up taking a different path. 
We don't focus as much on whether the hardware is going to work and the network is going to work and the storage can be integrated and how it's going to impact the database and what that's going to mean to our Oracle license pool and all the other things that people tend to think about if they're focused on the technology. And so as a consequence, you get better time to value if you focus on bringing the domain expertise, working with the right partner, working with the appropriate approach, to go from what's the value proposition, what actions are associated with a value proposition, what data is necessary to perform those actions, how can I take transaction costs out of performing those actions, where does the data need to be, what infrastructure do I require? So we have to focus on the time to value, not the time to procure. And that's not what a lot of professional IT oriented people are doing because many of them, I hate to say it, but many of them still acquire new technology with the promise of helping the business but having a stronger focus on what it's going to mean to their careers. All right, I want to be really respectful of everybody's time. The keynotes start in about five minutes, which means you've just got time. If you want to stay, feel free to stay. We'll be here, we'll be happy to talk, but I think that's pretty much going to close our presentation broadcast. Thank you very much for being an attentive audience and I hope you found this useful. (upbeat music)

Published Date : Mar 9 2018

Andreas S Weigend, PhD | Data Privacy Day 2017

>> Hey, welcome back everybody, Jeff Frick here with theCUBE. We're at Data Privacy Day at Twitter's world headquarters in downtown San Francisco and we're really excited to get into it with our next guest, Dr. Andreas Weigend. He is now at the Social Data Lab, used to be at Amazon, recently published author. Welcome. >> Good to be here, morning. >> Absolutely, so give us a little about what is Social Data Lab for people who aren't that familiar with it and what are you doing over at Berkeley? >> Alright, so let's start with what is social data? Social data is data people create and share, whether they know it or not, and what that means is Twitter is explicit, but also geolocation or maybe even just having photos about you. I was in Russia all day during the election day in the United States with Putin, and I have to say that people now share on Facebook what the KGB wouldn't have gotten out of them under torture. >> So did you ever see the Saturday Night Live sketch where they had a congressional hearing and the guy, the CIA guy, says, Facebook is the most successful project that we've ever launched, people tell us where they are, who they're with and what they're going to do, share pictures, location, it's a pretty interesting sketch. >> Only to be topped by Black Mirror, some of these episodes are absolutely amazing. >> People can't even watch it. I have not seen it, I have to, but they're like, that's just too crazy. Too real, too close to home. >> Yeah, so what was the question? >> So let's talk about your new book. >> Oh, that was social data. >> Yeah, social data. >> Yeah, and so I call it actually the social data revolution. Because if you think back 10, 20 years ago, we, and we doesn't mean just you and me, it means a billion people. They think about who they are differently from 20 years ago, think Facebook as you mentioned. How we buy things, we buy things based on social data, we buy things based on what other people say.
Not on what some marketing department says. And even, you know, the way we think about information, I mean, could you do a day without Google? >> No. >> No. >> Could you go an hour without Google? >> An hour, yes, when I sleep. But some people actually, they Google in their sleep. >> Well, and they have their health tracker turned on while they sleep to tell them if they slept well. >> I actually find this super interesting. How dependent I am to know in the morning when I wake up, before I can push a smiley face or the okay face or the frowny face, to first see how did I sleep? And if the cycles were nice up and down, then it must have been a good night. >> So it's interesting because the concept from all of these kind of biometric feedback loops is if you have the data, you can change your behavior based on the data, but on the other hand there is so much data, and do we really change our behavior based on the data? >> I think the question is a different one. The question is, alright, we have all this data, but how can we make sure that this data is used for us, not against us? Within a few hundred meters of here there's a company where employees were asked to wear a Fitbit, or tracking devices more generally. And then one morning one employee came in after, you know, not having had an exactly solid night of sleep, shall we say, and his boss said, I'm sorry, but I just looked at your Fitbit, you know this is an important meeting, we can't have you at that meeting. Sorry about that. >> True story? >> Yeah. >> Now that's interesting. So I think the Fitbit angle is interesting when that is a requirement to have company-issued health insurance and they see you've been sitting on your couch too much. Now how does that then run into the HIPAA regulations? >> You know, they have dog walkers here. I'm not sure where you live in San Francisco. But in the area many people have dogs.
And I know that a couple of my neighbors, when the dog walker comes to take the dog, they also give their phone to the dog walker, so now it looks like they are taking regular walks, and they're waiting for the discount from health insurance. >> Yeah, it's interesting. Works great for the person that does walk or gives their phone to the dog walker. But what about the person that doesn't, what about the person that doesn't stop at stop signs? What happens in a world of business models based on aggregated risk pooling when you can segment the individual? >> That is a very, very, very biased question. It's a question of fairness. So if we know everything about everybody, what would it mean to be fair? As you said, insurance is built on pooling risk, and that means by nature that there are things that we don't know about people. So maybe we should propose a lobotomy, a data lobotomy, so people actually have some part of the data chopped off. So now we can pool again. >> Interesting. >> Of course not. The answer is that we as a society should come up with ways of coming up with objective functions, how do we weigh the person, you know, taking a walk, and then it's easy to agree on the function, then get the data and rank whatever insurance premium, whatever you're talking about here, rank that accordingly. So I really think it's a really important concept, which actually goes back to my time at Amazon, where we came up with fitness functions, as we call it. And it takes a lot of work. I have probably spent 50 hours on that, going through groups and groups and groups figuring out, what do we want the fitness function to be like? You have to have the buy-in of the groups. You know, if they just think that is some random management thing imposed on us, it's not going to happen. But if they understand that's the output they're managing for, then not bad.
>> So I want to follow up on the Amazon piece because we're big fans of Jeff Hamilton and Jeff Bezos, we go to AWS, and it's interesting, excuse me, James Hamilton, when he talks about the resources that AWS can bring to bear around privacy and security and networking and all this massive infrastructure being built in terms of being able to protect privacy once you're in the quote-unquote public cloud, versus people trying to execute that at the individual company level, and you know, RSA is in a couple of weeks, the amount of crazy scary stuff that is coming in for people that want interviews around some of this crazy security stuff. When you look at kind of public cloud versus private cloud and privacy, you know, supported by a big heavy infrastructure like what AWS has, versus a Joe Blow company, you know, trying to implement them themselves, how do you see that challenge? I mean, I don't know how the person can compete with having the resources, again, the aggregated resource pool that James Hamilton has to bring to bear on this problem. >> So I think we really need to distinguish two things, which is security versus privacy. So for security there's no question in my mind that Joe Blow, with his little PC, has not a chance against our Chinese or Russian friends. There is no question for me that Amazon or Google have way better security teams than anybody else can afford. Because it is really their bread and butter. And if there's a breach on that level, then I think it is terrible for them. Just think about the Sony breach on a much smaller scale. That's a very different point from the point of privacy, and from the point about companies deliberately giving the data about you for targeting purposes, for instance, and targeting purposes to other companies. So I think for the cloud, there I trust, I trust Google, I trust Amazon that they are doing hopefully a better job than the Russian hackers. I am more interested in the discussion on the value of data.
Over the privacy discussion. After all, this is the world privacy day, and there the question is, what do people understand as the trade-off they have, what they give in order to get something? People have talked about Google having this impossible, irresistible value proposition, that for all of those little data you get, for instance, I took Google Maps to get here, of course Google needs to know where I am to tell me to turn left at the intersection. And of course Google has to know where I want to be going. And Google knows that a bunch of other people are going there today, and they can probably figure out that something interesting is happening here. >> Right. >> And so those are the interesting questions for me. What do we do with data? What is the value of data? >> But A, I don't really think people understand the amount of data that they're giving over, and B, I really don't think that they understand, I mean, now maybe they're starting to understand the value because of the value of companies like Google and Facebook that have the data. But do you see a shifting in, A, the awareness, and I think it's even worse with younger kids who just have lived on their mobile phones since the day they were conscious, practically, these days. Or will there be a value to >> Or were they even mobile before they were born? Children now come pre-loaded, because the parents take pictures of their children before they are born. >> That's true. And you're right, and the sonogram et cetera. But then how has mobile changed this whole conversation? Because when I was on Facebook on my PC at home, it's a very different set of information than when it's connected to all the sensors in my mobile phone. When Facebook is on my mobile phone, it really changes, where I am, how fast I'm moving, who I'm in proximity to, it completely changed the privacy game. >> Yes, so geolocation, and the ACLU here in the Northern California chapter has a very good quote on that.
"Geolocation is a really extremely powerful variable." Now what was the question? >> How has this whole privacy thing changed now with the proliferation of mobile, and the other thing I would say, when you have kids that grew up with mobile and sharing, and the young ones don't use Facebook anymore, it's Instagram, Snapchat, just kind of the notion of sharing and privacy relative to folks that, you know, wouldn't even give their credit card over the telephone not that long ago, much less type it into a keyboard, do they really know the value, do they really understand the value, do they really get the implications when that's the world in which they've lived? Most of them, you know, they're just starting to enter the workforce and haven't really felt the implications of that. >> So for me the value of data is how much the data impacts a decision. So for the side of the individual, if I have data about the restaurant, and that makes me decide whether to go there or to not go there, that is having an impact on my decision, thus the data is valuable. For a company, a decision whether to show me this offer or that offer, that is how data is valued from the company. So that kind of should be quantified. The value of the picture of my dog when I was a child, that is, you know, so valuable, I'm not talking about this. I'm very sort of rational here in terms of value of data as the impact it has on decisions. >> Do you see companies giving back more of that value to the providers of that data? Instead of, you know, just simple access to useful applications, but obviously the value exceeds the value of the application they're giving you. >> So you use the term giving back, and before you talked about kids giving up data. So I don't think that it is quite the right metaphor. So I know that metaphor comes from the physical world. Sometimes it has been said that data is the new oil, and that indeed is a good metaphor when it comes to the fact that it needs to be refined to have value.
But there are other elements where data is very different from oil, and that is that I don't really give up data when I share, and the company doesn't really give something back to me, but it is a much more interesting exchange, like a refinery that I put things in, and now I get something, not necessarily back, I typically get something which is very different from what I gave, because it has been combined with the data of a billion other people. And that is where the value lies, that my data gets combined with other people's data. In some cases it's impossible to actually take it out, it's like a drop of ink, a drop in the ocean, and it spreads out and you cannot say, oh, I want my ink back. No, it's too late for that. But it's now spread out, and that is a metaphor I think I have for data. So people say, you know, I want to be in control of my data. I often think they don't have a deep enough thought of what they mean by that. I want to change the conversation to people saying, what can I get by giving you the data? How can you help me make better decisions? How can I be empowered by the data which you are grabbing or which you are listening to that I produce? That is a conversation which I want to have here at the Privacy Day. >> And that's happening with, like, Google Maps, obviously you're exchanging the information, you're walking down the street, you're headed here, they're telling you that there's a Starbucks on the corner if you want to pick up a coffee on the way. So that is already kind of happening, right, and that's why obviously Google has been so successful. Because they're giving you enough and you're giving them more and you get in this kind of virtuous cycle in terms of the information flow, but clearly they're getting a lot more value than you are, you know, based on their market capitalization, you know, it's a very valuable thing in the aggregation. So it's almost like a one plus one makes three >> Yes. >> On their side.
>> Yes, but it's a one-trick pony ultimately. All of the money we make is ads. >> Right, right, that's true. But in-- >> It's a good one to point out-- >> But then it begs the question too, when we no longer ask but are just delivered that information. >> Yes, I have a friend, Gam Dias, and he runs a company called First Retail, and he makes the point that there will be no search anymore in a couple of years from now. What are you talking about? I search every day, but, yes. But you know, you will get the things before you even think about it, and with Google Now a few years ago, among other things, I think he is quite right. >> We're starting to see that, right, where the cards come to you with a guess as to-- >> And it's not so complicated. Let's say you go to the symphony, you know, my phone knows that I'm at the symphony, even if I turn it off, it knows where I turned it off. And it knows when the symphony ends because there are like a thousand other people, so why not get Ubers, Lyfts closer there and amaze people by, wow, your car is there already. You know, that is always a joke that we have in Germany. In Germany we have a joke that says, hey, go for vacation in Poland, your car is there already. But maybe I shouldn't tell those jokes. >> Let's talk about your book. So you've got a new book that came out >> Yeah >> Just recently released, it's called Data for the People. What's in it, what should people expect, what motivated you to write the book? >> Well, I'm actually excited, yesterday I got my first free copies, not from the publisher and not from Amazon, because they are going by the embargo, by which it is out next week. But Barnes and Noble-- >> They broke the embargo-- Barnes and Noble. Breaking news. >> But three years of work, and basically it is about trying to get people to embrace the data they create and to be empowered by the data they create.
Lots of stories from companies I've worked with. Lots of stories also from China, I have a house in China, I have spent a month or two months there every year for the last 15 years, and the Chinese ecosystem is quite different from the US ecosystem, and you of course know that the EU regulations are quite different from the US regulations. So, I wrote on what I think is interesting and I'm looking forward to actually rereading it, because they told me I should reread it before I talk about it. >> Because when did you submit it? You probably submitted it-- >> Half a year >> Half a year ago, so yeah. Yeah. So it's available at Barnes and Noble and now Amazon. >> It is available. I mean, if you order it now, you'll get it by Monday. >> Alright, well Dr. Andreas Weigend, thanks for taking a few minutes, we could go forever and ever but I think we've got to let you go back to the rest of the sessions. >> Thank you for having me. >> Alright, pleasure. Jeff Frick, you're watching theCUBE, see you next time.

Published Date : Jan 28 2017

SUMMARY :

Dr. Andreas Weigend, he is now at the Social Data Lab, day in the United States with Putin, So did you ever see the Saturday Night Live sketch Only be taught by Black Mirror, some of these episodes I have to see but they're like that's just too crazy. And even you know, the way we think about information But some people actually they Google in their sleep. Well and they have their health tracker turned on or the frowny face, to first see how did I sleep? an important meeting, we can't have you at that meeting. So I think the fit bit angle is interesting And I know that a couple of my neighbors they give aggregated risk pooling when you can segment the individual? As you said, insurance is built on pooling risk it they just think you know that is some random at the individual company level and you know RSA is the data about you for targeting purposes for instance. What is the value of data? because of the value of companies like Google and it completely changed the privacy game. Yes so geo location and the ACLU here in that you know wouldn't even give their credit card over the So for me the value of data is how much the data Instead of you know just simple access to How can I be empowered by the data which you are Because they're giving you enough and you're giving All of the money we make is rats. But in-- But then it begs the question too when You know, you will get the things before you even you know, my phone knows that I'm at the symphony So you've got a new book that came out what motivated you to write the book? free copies not from the publisher and not from Amazon. They broke the embargo-- and you of course know that the EU regulations are So it's available at Barnes and Noble and now Amazon I mean if you order it now, you'll get it by Monday. I think we've got to let you go back to the rest Jeff Frick, you're watching theCUBE see you next time.

ENTITIES

Entity	Category	Confidence
Amazon	ORGANIZATION	0.99+
Putin	PERSON	0.99+
Google	ORGANIZATION	0.99+
James Hamilton	PERSON	0.99+
Jeff Frick	PERSON	0.99+
Jeff Bezzos	PERSON	0.99+
Facebook	ORGANIZATION	0.99+
Jeff Hamilton	PERSON	0.99+
Poland	LOCATION	0.99+
Barnes and Noble	ORGANIZATION	0.99+
Andreas Weigend	PERSON	0.99+
Germany	LOCATION	0.99+
Andreas Weigin	PERSON	0.99+
Russia	LOCATION	0.99+
50 hours	QUANTITY	0.99+
AWS	ORGANIZATION	0.99+
First Retail	ORGANIZATION	0.99+
Sony	ORGANIZATION	0.99+
China	LOCATION	0.99+
CIA	ORGANIZATION	0.99+
San Francisco	LOCATION	0.99+
Andreas S Weigend	PERSON	0.99+
ACLU	ORGANIZATION	0.99+
EWS	ORGANIZATION	0.99+
An hour	QUANTITY	0.99+
a month	QUANTITY	0.99+
United States	LOCATION	0.99+
next week	DATE	0.99+
Northern California	LOCATION	0.99+
three years	QUANTITY	0.99+
an hour	QUANTITY	0.99+
two months	QUANTITY	0.99+
Starbucks	ORGANIZATION	0.99+
first free copies	QUANTITY	0.99+
Social Data Lab	ORGANIZATION	0.99+
Saturday Night Live	TITLE	0.99+
KGB	ORGANIZATION	0.99+
20 years ago	DATE	0.99+
yesterday	DATE	0.99+
EU	ORGANIZATION	0.98+
three	QUANTITY	0.98+
two things	QUANTITY	0.98+
Black Mirror	TITLE	0.98+
Half a year ago	DATE	0.98+
Berkeley	LOCATION	0.98+
today	DATE	0.97+
US	LOCATION	0.97+
one employee	QUANTITY	0.97+
Monday	DATE	0.97+
Twitter	ORGANIZATION	0.97+
first	QUANTITY	0.97+
Lyfts	ORGANIZATION	0.96+
one morning	QUANTITY	0.96+
Joe Blow	ORGANIZATION	0.95+
Russian	OTHER	0.95+
Data for the People	TITLE	0.95+
one	QUANTITY	0.94+
Google Maps	TITLE	0.93+
a day	QUANTITY	0.93+
Gam Dias	PERSON	0.92+
Ubers	ORGANIZATION	0.91+
Dr.	PERSON	0.91+
Chinese	OTHER	0.9+
one trick	QUANTITY	0.89+
few years ago	DATE	0.88+
Instagram	ORGANIZATION	0.83+

Chris Lynskey, Oracle - On the Ground - #theCUBE

>> Announcer: TheCube presents, On the Ground. (upbeat music) >> Hi everyone, welcome to this special On the Ground, Cube coverage here at Oracle headquarters. I'm John Furrier, the host of theCube, here with Chris Lynskey, who's the Vice President, Product Management for Oracle Big Data. Welcome to On the Ground, good to see you. >> Thanks John, nice to meet you. >> So let's talk about big data, and the concepts going on now for analytics. What is going on in your mind around big data, and some of the ideas that customers are kicking around, because the number one thing we hear is, I got to store the data. Solved, check, database, system of record. But now other databases are popping up. Different types of databases, you got graph databases, you've got unstructured databases. Do I run Oracle for all those? When do I use Oracle? When do I not use Oracle? So the first question is, what are some of the obstacles that are facing the companies? Is it integration? Is it the choice? What's going on? >> There's a lot. There's a lot of interest in the market around big data. But in terms of companies that are actually using that in kind of a productized fashion to build competitive insight, there are fewer than you would think, because of some of these obstacles. So, we look at it in a few different ways, and we try to tackle the obstacles at Oracle in each of these categories. One of the first big questions to solve is what you raised. How do I manage the data? I've got a lot of gravity in my data warehouse and in my databases, but now I've got all this new content coming in. It might be social media. It might be log data. Things that you're not sure of the value of, so it may not make sense to store them in that enterprise data warehouse. That's really where customers are looking at alternate technologies like big data, like Hadoop, to give you both that cost savings, but also to give you specialized access, whether you're doing, like you said, spatial queries, or graph queries.
Oracle can give you the right engine for the right job, but what's also important in that data management layer is doing it in a way that breeds simplicity of ownership. If the cost of ownership is too expensive, no one's going to do that. So we also have an initiative called Big Data SQL, that lets you use that common Oracle database as your front-end, but then query back to Hadoop, query back to a spatial or graph engine. You can leave that data there, where it makes the most sense. >> I mean SQL on Hadoop for instance has proven that SQL is the language of most people querying. So, that's out there, so that's done. But it doesn't mean run relational databases all the time, but that's what people are interfacing into other databases. >> Chris: Yeah. >> Is that a pretext to what's really happening? Is that, interfacing to other data sets is really more important than actually having whole new systems. Because that seems to be ... >> It's a bit of both. The way I look at it is, some companies look at Hadoop as just another data source. I've got some log data, some social data, let me put it in a place that's cost-effective to store. And there, using your database as a front-end makes sense. Other customers look at Hadoop and big data more as a data platform, where they want to use that cluster, that compute environment, to do more than just query things and build a chart. And that's where you see some new technologies coming out. In Oracle we call it our data factory. That's around, how can I use all of that compute power to actually do data integration? Right, how can I keep up with that one-hour ETL window I'm given a night to deal with all these new sources? So we see people adopting Hadoop for ... >> That's a tough window, one hour is a tough window. If you're Wall Street, backing up. >> Yeah, some of it's tough. >> Talk about Data Lab. What is this concept that you have been kicking around called Data Lab? >> Exactly. >> What does that mean?
>> So I think that's the third pillar. We talked about data management, giving you the right engine. We talked about data factory, giving you that integration capability. But, why go through all that effort, if not to start driving innovation? And that's what we think about as the Data Lab. It's a place where you can experiment with advanced analytics. It's a place where you can experiment with data mashup, and new data combinations. And you do it in a cost-effective way, and a way that breeds this notion of agility. You mentioned the word, system of record before. That's a very great description for the warehouse. You're not going to change your revenue definition, or your customer dimension, in the warehouse. That's what everyone uses. But Hadoop, people look at as a system of innovation. It sits alongside the warehouse. You can put a lot of that same data in there. Often you'll put data that never made it into the warehouse. So you get that big data variety, and then you can use that to come up with new ideas. So that's really the essence of the lab, is bringing in more data sets. Trying more combinations of data. And then also seeing if you can move beyond just descriptive and diagnostic analytics, into predictive. >> So let me just get this right. Factory is all the ingestion, Data Lab is like your, I'd say sandbox, my word. So system of record is the most important data. That's a customer name, a key variable for that, that's in the company's business model. So that's where all the hardcore data is. Social media data might be, hey, I'm geo, piece of geo data, and it's at retail store, says I'm going to buy something, or has local presence. Has my name, which is in the system of record. So, that data is in a different database. Has to go over there and get to the system of record. That's hard. That's actually a hard problem. >> Chris: It is. 
>> But that's a realistic thing that people want to take, is these gestural data pieces, small data, that means something to the system of record, or some engagement data, cross-connected to the system of record. Do you guys solve that problem? This is what people want to look for, right? >> We do. What's interesting is, that's an age-old problem. We had that with data warehousing. We have it even more now with all the big data sources. And, I think the opportunity here is to decide who should solve that problem. Is it a scarce ETL developer that you have in IT? They have limited cycles. >> That's true. >> Do I have a data scientist? People actually use data scientists to do this sort of data integration work. It's hard to come up with a new predictive model if the data sets don't match up. And, it's unfortunate, because that's the PhD guy. And, that's menial labor to a large degree. >> Hard to find PhDs, too. >> It is. I like to call them unicorns. You hear about them, you never really see them. And you definitely don't want the scientist doing that menial labor. The joke we say is that the data scientist has been turned into a data janitor, because of all these tasks that get put on their shoulders. So, we think at Oracle that's an opportunity. With this combination of data management, data factory, and Data Lab on top, you can actually push that work out to your business analyst teams. They can collaborate with IT. They can collaborate with your data scientist if you have them, but the spirit of the Lab is not ... >> So making the analysts and the business folks, make them like data scientists. >> Chris: Exactly. >> As functional as data scientists, without having them being ... >> One of the phrases in the industry is citizen data scientist, and I manage a product called Oracle Big Data Discovery, and that is really our goal. Can we build these very intuitive UIs, that make these analysts produce more output like a data scientist would?
>> So what's the architecture to make that happen? Because I think that's right on the money. I think that's a great solution. I think and the example I used is just a small piece of data, but that's a database problem. So by abstracting out to another level with software, you can let people wire their own solutions together. I get that. How do you guys do that from an architecture standpoint? What do you say to customers? How do I do this? What's the playbook? >> It's a good question, because at its core, there's no reason to go about solving this problem, unless it works at the big data scale, right? If you can't analyze petabytes, terabytes of content, you would use a regular BI solution. There's no reason to move over to big data. So, a key aspect of the architecture is scale. But also if you're going to support these analysts, they're not happy if they click on the screen and then they wait five minutes for something to come back. So, interactivity performance is critical too for this user base. Because of that, in products like BDD, and really across a lot of our different initiatives, Apache Spark has become a key piece of our architecture. And that's something you might not expect from Oracle, that we're moving into open source, adopting a lot of those technologies, but we really do see the value of Spark. >> So I asked Neil Mendelson just today the question, where he sees the market going. So I want to ask you a little bit different question, but same question on a different task. What's the next big thing? Because we are on the front end of this really pioneering analytics mindset. >> Chris: Yep. >> Horizontally scalable data sets. Software value propositions, applied to data as currency, if you will. Soon data will be on the balance sheets. Some say, certainly the analysts at Wikibon are saying that, some day it should be an asset class. >> Chris: Data capital is a phrase we use. >> Data capital, love that. 
And so that is a trend, that could be right around the corner. But that's where it's going. What's the next big thing to get us there? >> I think the first hurdle was just making sense of big data. It took organizations a couple years just to get their head around that, and to build that architecture, so it will scale and people will adopt the system. I think the opportunity now is, at least as we see it in our analytic portfolio is, you've got these users on the system. You've got these Hadoop clusters in place. What can you do with that power? And, we think the big opportunity, especially as we create these data scientists, these citizen data scientists, is machine learning. How can we embed, especially the Spark machine learning libraries, into our products more natively? Such that, you don't have to have the PhD at the outset. You can use that compute power, and you can use the Spark open source libraries, to help bootstrap that process. >> So do you guys solve what I call the data swamp problem? Because, let me explain in more color. Most people are dumping everything in what they call a data lake. And, just store all the data, we'll get to it later. Some of it, mostly it's Hadoop, it's a bunch of batch data. Because they don't know what to do with it yet. So it just sits there. And it gets dirty, and it turns into a swamp. That's what the joke is, data swamp. Ironically we're looking at the lake here at the Oracle headquarters. >> Chris: Pristine, pristine. >> Pristine, the water's flying up through thing, it's beautiful. This is a big problem, because data that's idle, that's not being used in this case, not being intelligently acted upon, can turn into a swamp, is only valuable when needed. Meaning, if something's happening in real time, you go to the data lake, and pull out a piece of data, to your earlier reference, and make it in real time, it's important. So you never know the potential energy of that data, and the value. 
It could be perfectly useless one minute, extremely valuable the next. Is your value proposition with the big data appliance of analyst tools to connect to those lakes and bring them back? Is that the whole, you guys save the data lake problem? >> There's two pieces. One is giving you the infrastructure, and for that we have our Big Data Cloud Service, our Big Data Appliance. Because, lots of people think big data is just commodity hardware. As you move into analytics and do more in memory, you're going to want that extra capacity. So that's one piece, making sure you've got the horsepower. But then, you need those tools on top. And that's where our Big Data Discovery product focuses. And to your point, what we've done is actually integrate the things that those analysts need when they're in that discovery moment. First thing they need, like you said, I never knew I needed this data set before. It just came up to me. So we give you almost a shopping experience for data. You can go in, type in keywords. I want to look for social media log data. And we actually search into Hadoop, and index all that content. So, it's just like you were on our website. >> So you're kind of keeping the lake moving and clean, because you're indexing it, so you can service data at any given time. >> That's the first piece. The second piece though is again, in your discovery process, you have to recognize this is the first time people will be working with this data. And that's where a lot of these data scientists shine, because they know all the techniques as to how do I interrogate it? What's important? What's not? And that's what we build into our product now. So the analyst can just look at a very visual screen, and it helps them figure out where to focus. Is it worth me spending time? >> It's like almost this bot craze that's going on. You guys are abstracting away the scientist's knowledge into software, and providing almost an interface. >> That's the hope.
If you can get a data scientist, trust me, keep them. They're very valuable. >> Catch that unicorn. >> Yes. >> No, it's true though. There's not enough PhDs, or data scientists out there. Soon, there's new curriculum out there, but still. The idea is to scale up, and make the normal person, the citizen be the data scientist. >> And also, it's funny, if you look at the advanced analytic tools, and the data science tools out there, they're very dated. A lot of them were built 15, 20 years ago with that data miner statistician. There's now this new breed of data scientists that want more compelling interfaces. They expect more. >> Chris, final question. Top three conversations that you have with customers, where they're most challenged. If you had to look at the patterns, applying all the big data techniques in your brain to the three top problems that customers are trying to solve that you guys help. >> Excellent. So the first one I would say by far, and I wish it wasn't the case, but it's, help me justify building out my big data cluster. That's the first one. Lots of companies want to do more with big data, but they're struggling ... >> Just their ROI, or cost, or both? >> The ROI, the cost, really, why should I make that investment? How do I justify it? And I really do think that cloud is going to change that picture dramatically. When I can shift to looking at the CapEx versus OpEx ... >> So you're saying the cloud lowers the bar, in terms of getting value generated, or is it ... >> It does two things. It lowers the financial entry point, and how much you have to justify up front. And it lowers the IT skillset needed to manage those clusters in the data center. So, two very big problems. >> Great, that's awesome. Second one? >> Now that I've solved that, second one is, okay, well what do I do next? How do I find things? Where should I be looking? And that is where this Data Lab concept is meant to come into play. Some customers will have a perfect use case in mind.
That's how they justified the project. They can go and execute that. But a lot of them, again it's this notion of a data lake. I need to pursue a range of experiments. Where do I start? And tools like Big Data Discovery help a lot there. >> So Data Lab is just play with the data, and get a feel for it. >> Yep. And do it in a way that breeds that experimentation. Not just to visualize the data, but change it. Reshape it. Build new models, build new classifications. The last thing I'd say is okay, did I get my ROI, do I have a cluster? Yes. Did I figure out something that looks interesting? Yes. Now I have an idea. What do I do next? It's how do I connect my insights from big data back to the tools that we use every day. >> So this is where the value of the data capital thing you're talking about. The Lab is essentially formulating the key connections for data pipes to connect in. >> Yep. >> Is that kind of the best way to think about it? >> Roughly, yes. Yeah, you come up with new ideas, new data products ... >> So you've operationalized it by the third step. >> Yes. And then, how do you do that? In some cases it's, oh, I just push the data, I move the data over to my data warehouse. Which may make sense. But Oracle also has, I think I mentioned it before, Big Data SQL as a product. Which will let you keep that data in Hadoop, keep everything else in your data warehouse, and productization is that easy. So you don't have to worry about moving data. It helps a lot. >> Well that highlights one of the things we always hear all the time, which is skills. >> Chris: Yep. >> And people know SQL. >> Chris: They do. Everyone does. >> Everyone does. Chris, thanks so much for spending the time here On the Ground. Really appreciate chatting with you. This is theCube. Exclusive coverage on the ground at Oracle headquarters. I'm John Furrier, thanks for watching.

Published Date : Sep 7 2016


