Krishna Cheriath, Bristol Myers Squibb | MITCDOIQ 2020


 

>> From the Cube Studios in Palo Alto and Boston, connecting with thought leaders all around the world, this is a Cube Conversation. >> Hi everyone, this is Dave Vellante and welcome back to the Cube's coverage of the MIT CDOIQ. God, we've been covering this show since probably 2013, really trying to understand the intersection of data and organizations and data quality and how that's evolved over time. And with me to discuss these issues is Krishna Cheriath, who's the Vice President and Chief Data Officer, Bristol-Myers Squibb. Krishna, great to see you, thanks so much for coming on. >> Thank you so much Dave for the invite, I'm looking forward to it. >> Yeah first of all, how are things in your part of the world? You're in New Jersey, I'm also on the East coast, how are you guys making out? >> Yeah, I think these are unprecedented times all around the globe, and whether it is from a company perspective or a personal standpoint, how do you manage your life, how do you manage your work in these unprecedented COVID-19 times has been a very interesting challenge. And to me, what is most amazing has been, I've seen humanity rise up, and so too our company has sort of snapped to, to be able to manage our work so that the important medicines that have to be delivered to our patients are delivered on time. So really proud about how we have done as a company and of course, personally, it has been an interesting journey with my kids from college, remote learning, wife working from home. So I'm very lucky and blessed to be safe and healthy at this time. So hopefully the people listening to this conversation are finding that they are able to manage through their lives as well. >> Obviously Bristol-Myers Squibb, very, very strong business. You guys just recently announced your quarter. There's a biologics facility near me in Devens, Massachusetts, I drive by it all the time, it's a beautiful facility actually. 
But extremely broad portfolio, obviously some COVID impact, but you're managing through that very, very well. If I understand it correctly, you're taking a collaborative approach to a COVID vaccine, you're now bringing people physically back to work, you've been very planful about that. My question is, from your standpoint, what role did you play in that whole COVID response and what role did data play? >> Yeah, I think it's a two part, as you rightly pointed out. At Bristol-Myers Squibb, we have been an active partner in the overall scientific ecosystem, supporting many different targets from many different companies. I think across biopharmaceuticals, there's been a healthy convergence of scientific innovation to see how we can solve this together. And Bristol-Myers Squibb has been an active participant, as our CEO, as well as our Chief Medical Officer and Head of Research, have articulated publicly. Within the company itself, from a data and technology standpoint, data and digital is core to the response from a company standpoint to COVID-19: how do we ensure that our work continues when the entire global workforce pivots to a kind of remote setting. So that really calls on the digital infrastructure to rise to the challenge, to enable a complete global workforce. And when I say workforce, it is not just employees of the company but all of the third-party partners and others that we work with; the whole ecosystem needs to work. And I think our digital infrastructure has proven to be extremely resilient in that. From a data perspective, I think it is twofold. One is how does the core book of business of data continue to drive forward to make sure that our company's key priorities are being advanced. Secondarily, we've been partnering with the research and development organization as well as the medical organization to look at what kind of real world data insights can really help in answering the many questions around COVID-19. 
So I think it is twofold. In summary, one is, how do we ensure that the data and digital infrastructure of the company continues to operate in a way that allows us to progress the company's mission even during a time when, globally, we have switched to a remote workforce, except for some essential staff from a lab and manufacturing standpoint. And secondarily, how do we look at the real-world evidence as well as the scientific data to be a good partner with other companies to look at progressing the societal innovations needed for this. >> I think it's a really prudent approach because let's face it, sometimes a one-shot approach to a vaccine can be like playing roulette. So you guys are both managing your risk and, as I say, financially a very, very successful company with a sound approach. I want to ask you about your organization. We've interviewed many, many Chief Data Officers over the years, and there seems to be some fuzziness as to the organizational structure. It's very clear with you, you report in to the CIO, you came out of a technical bag, you have a technical degree but you also of course have a business degree. So you're dangerous from that standpoint. You've got both sides, which is critical, I would think, in your role. But let's start with the organizational reporting structure. How did that come about and what are the benefits of reporting into the CIO? >> I think the genesis for that: Bristol-Myers Squibb, and when I say Bristol-Myers Squibb, the new Bristol-Myers Squibb, is a combination of Heritage Bristol-Myers Squibb and Heritage Celgene after the Celgene acquisition last November. So in Heritage Bristol-Myers Squibb, we came to a conclusion that in order for BMS to be able to fully capitalize on our scientific innovation potential as well as to drive data-driven decisions across the company, having a robust data agenda is key. Now the question is, how do you progress that? 
Historically, we had approached this in a very decentralized way across many different data constituencies. We didn't have a formal role of a Chief Data Officer up until 2018 or so. So coming from that realization that we need to have an effective data agenda to drive forward the necessary data-driven innovations from an analytics standpoint and, equally importantly, from optimizing our execution, we came to the conclusion that we need an enterprise-level data organization; we need to have a first among equals, if you will, mandated by the CEO and his leadership team, to be the kind of orchestrator of a data agenda for the company, because a data agenda cannot be done individually by a singular CDO. It has to be done in partnership with many stakeholders: business, technology, analytics, et cetera. So from that came this notion that we need an enterprise-wide data organization. So we started there. So for a while, I would joke around that I had all of the accountabilities of the CDO without the lofty title. So this journey started around 2016, where we created an enterprise-wide data organization. And we made a very conscious choice of separating the data organization from analytics. And the reason we did that is, when we look at the whole of Bristol-Myers Squibb, analytics, for example, is core and part of our scientific discovery process, research, our clinical development; all of them have deep data science and analytics embedded in them. But we also have other analytics, whether it is part of our sales and marketing, whether it is part of our finance and our enabling functions, the catch-all across global procurement, et cetera. So the world of analytics is very broad. BMS did a separation between the world of analytics and the world of data. Analytics at BMS is in two modes. There is a central analytics organization called Business Insights and Analytics that drives most of the enterprise-level analytics. 
But then we have embedded analytics in our business areas, which is research and development, manufacturing and supply chain, et cetera, to drive what needs to be closer to the business. And the reason for separating that out and having a separate data organization is that none of these analytic aspirations or the business aspirations from data will be met if, in the world of data, you don't have the right level of data available, the velocity of data is not appropriate for the use cases, the quality of data is not great, or the control of the data, so that we are using the data for the right intent and meeting the compliance and regulatory expectations around the data, is not met. So that's why we separated out the data world from the analytics world, which is a little bit of a unique construct for us compared to what we see generally in the world of CDOs. And from that standpoint, then the decision was taken to make that report to the global CIO. At Bristol-Myers Squibb, we have a very strong CIO organization and IT organization. When I say strong, it is from this lens: A, it is centralized; we have centralized the budget as well as centralized the execution across the enterprise. And the CDO reporting to the CIO, with that data-specific agenda, has a lot of value in being able to connect the world of data with the world of technology. So at BMS, the Chief Data Officer organization is a combination of traditional CDO-type accountabilities like data risk management, data governance, data stewardship, but also all of the related technologies around master data management, data lake, data and analytic engineering and a nascent AI data and technology lab. 
So that construct allows us to be a true enterprise horizontal, supporting analytics, whether it is done in a central analytics organization or embedded analytics teams in the business areas, but also, equally importantly, focused on the world of data from an operational execution standpoint: how do we optimize data to drive operational effectiveness? So that's the construct that we have, where the CDO reports to the CIO, and the data organization is separated from analytics to really focus on the availability but also the quality and control of data. And the last nuance is that at BMS, the Chief Data Officer organization is also accountable to be the Data Protection Office. So we orchestrate and facilitate all privacy-related actions across the company, because that allows us to make sure that all personal data that is collected, managed and consumed meets all of the various privacy standards across the world, as well as our own commitments as a company from a compliance principles standpoint. >> So that makes a lot of sense to me, and thank you for that description. You're not getting in the way of R&D and the scientists; they know data science, they don't really need your help. I mean, they need to innovate at their own pace, but the balance of the business really does need your innovation, and that's really where it seems like you're focused. You mentioned master data management, data lakes, data engineering, et cetera. So your responsibility is for that enterprise data lifecycle to support the business side of things, and I wonder if you could talk a little bit about that and how that's evolved. 
I mean a lot has changed from the old days of data warehouse and cumbersome ETL, and, as you say, data lakes, many of those have been challenging, expensive, slow, but now we're entering this era of cloud, real-time, a lot of machine intelligence, and I wonder if you could talk about the changes there and how you're looking at and thinking about the data lifecycle and accelerating the time to insights. >> Yeah, I think the way we think about it, we as an organization, in our strategy and tactics, think of this as a data supply chain. The supply chain of data to drive business value, whether it is through insights and analytics or through operational execution. When you think about it from that standpoint, then we need to get many elements of that into an effective stage. This could be the technologies that are part of that data supply chain, you referenced some of them: the master data management platforms, data lake platforms, the analytics and reporting capabilities and business intelligence capabilities that plug into a data backbone, which is, I would say, the technology swim lane that we need to get right. Along with that, what we also need to get right for that effective data supply chain is the data layer. That is, how do you make sure that there is the right data navigation capability, how do you make sure that we have the right ontology mapping and the understanding around the data. Data navigation is something that we have invested very heavily in. So imagine a new employee joining BMS; any organization our size has a pretty wide technology ecosystem and data ecosystem. How do you navigate that, how do we find the data? Data discovery has been a key focus for us. So for an effective data supply chain, we knew that, and we have instituted our roadmap to make sure that we have a robust technology orchestration of it, but equally important is an effective data operations orchestration. 
Both need to go hand in hand for us to be able to make sure that that supply chain is effective from a business use case and analytic use standpoint. So that has led us on a journey, from a cloud perspective, since you referenced that in your question: we have invested very heavily to move from a very disparate set of data ecosystems to a more converged cloud-based data backbone. That has been a big focus at BMS since 2016, whether it is from a research and development standpoint, or from commercialization, which is our word for sales and marketing, or manufacturing and supply chain and HR, et cetera. How do we create a converged data backbone that allows us to use that data as a resource to drive many different consumption patterns? Because when you imagine an enterprise of our size, we have many different consumers of the data. So those consumers have different consumption needs. You have a deep data science population who just needs access to the data, and they have data science platforms but they are programmers as well, to the other end of the spectrum where executives need pre-packaged KPIs. So the effective orchestration of the data ecosystem at BMS through a data supply chain and the data backbone does a couple of things for us. One, it drives productivity of our data consumers, the scientific researchers, analytic community or other operational staff. And second, in a world where we need to make sure that the data consumption upholds ethical standards as well as privacy and other regulatory expectations, we are able to build into our systems and processes the necessary controls to make sure that the consumption and the use of data meets our highest trust standards. >> That makes a lot of sense. I mean, converging your data like that, people always talk about stove pipes. I know it's kind of a bromide but it's true, and it allows you to sort of inject consistent policies. What about automation? 
How has that affected your data pipeline recently and on your journey, with things like data classification and the like? >> I think in pursuing a broad data automation journey, one of the things that we did was to operate at two different speed points. Historically, data organizations have been bundled with long-running data infrastructure programs. By the time you complete them, the business context has moved on and the organization leaders are also exhausted from having to wait for these massive programs to reach their full potential. So what we did very intentionally in our data automation journey is to organize ourselves in two speed dimensions. First, a concept called Rapid Data Lab. The idea is that, recognizing the reality that the data is not well automated and orchestrated today, we need a SWAT team of data engineers and data SMEs to partner with consumers of data to make sure that we can make effective data supply chain decisions here and now, and enable the business to answer the questions of today. Simultaneously, in a longer time horizon, we need to do the necessary work of moving the data automation to a better footprint. So enterprise data lake investments, where we built services based on the cloud; we had chosen AWS as the cloud backbone for data. So how do we use the AWS services? How do we wrap around them the necessary capabilities so that we have a consistent reference and technical architecture to drive the many different functional journeys? So we organized ourselves into two speed dimensions: the Rapid Data Lab teams focus on partnering with the consumers of data to help them with data automation needs here and now, and then a secondary team focused on the convergence of data into a better cloud-based data backbone. So that allowed us, one, to make an impact here and now and deliver value from data to the business here and now. 
Secondly, we also learned a lot from actually partnering with consumers of data on what needs to get adjusted over a period of time in our automation journey. >> It makes sense, I mean again, that whole notion of converged data, putting data at the core of your business. You brought up AWS, and I wonder if I could ask you a question. You don't have to comment on specific vendors, but there's a conversation we have in our community. You have AWS, a huge platform, tons of partners, a lot of innovation going on, and you see innovation in areas like the cloud data warehouse or data science tooling, et cetera, all components of that data pipeline. As well, you have AWS with its own tooling around there. So a question we often have in the community is, will technologists and technology buyers go for kind of best of breed and cobble together different services, or would they prefer to have sort of the convenience of a bundled service from an AWS or a Microsoft or Google, or maybe they even go best of breed across clouds. Can you comment on that, what's your thinking? >> I think, especially for organizations of our size and breadth, having a converged, convenient, all-of-the-above offering from a single provider does not seem practical and feasible, for a couple of reasons. One, the heterogeneity of the data, the heterogeneity of consumption of the data, and we are yet to find a single stack provider who can meet all of the different needs. So I am more in the best of breed camp, with a few caveats; a hybrid best of breed, if you will. It is important to have a converged data backbone for the enterprise. And so whether you invest in a singular cloud or private cloud or a combination, you need to have a clear, intentional strategy around where you are going to host the data and how the data is going to be organized. But you could have a lot more flexibility in the consumption of data. So once you have the data converged, in our case, we converged on an AWS-based backbone. 
We allow many different consumptions of the data, because I think at the analytic and insights layer, the data science community within R&D is different from the data science community in the supply chain context; we have business intelligence needs, we have catered needs, and then there are other data needs that need to be funneled into software as a service platforms like the Salesforces of the world, to be able to drive operational execution as well. So when you look at it from that context, having a hybrid model of best of breed, where you have a lot more convergence from a data backbone standpoint, but then allow for best of breed from an analytic and consumption of data standpoint, is more where my heart and my brain is. >> I know a lot of companies would be excited to hear that answer, but I love it because it fosters competition and innovation. I wish I could talk to you forever, but you made me think of another question, which is around self-serve. On your journey, are you at the point where you can deliver self-serve to the lines of business? Is that something that you're trying to get to? >> Yeah, I think so. Self-serve is an absolutely important point, because I think the traditional boundaries of what you consider the classical IT versus the classical business are gray. I think there is an important gray area in the middle where you have deep citizen data scientists in the business community who really need to be able to have access to the data and who have advanced data science and programming skills. So self-serve is important, but in that, companies need to be very intentional and very conscious of making sure that you're allowing that self-serve in a safe containment zone. Because at the end of the day, whether it is a cyber risk or data risk or technology risk, it's all real. 
So we need to have a balanced approach between promoting, whether you call it data democratization or whether you call it self-serve, but you need to balance that with making sure that you're meeting the right risk mitigation strategy standpoint. So then our focus is to say, how do we promote self-serve for the communities that need self-serve, where they have deeper levels of access? How do we set up the right safe zones with the appropriate mitigations from a cyber risk or data risk or technology risk standpoint? >> The security piece, again, you keep bringing up topics that I could talk to you forever on, but I heard on TV the other night, I heard somebody talking about how COVID has affected, because of remote access, affected security. And it's like hey, give everybody access. That was sort of the initial knee-jerk response, but the example they gave was, if your parents go out of town and the kid has a party, you may have some people show up that you don't want to show up. And so, same issue with remote working, work from home. Clearly you guys have had to pivot to support that, but where does the security organization fit? Does it report separately alongside the CIO? Does it report into the CIO? Are they sort of peers of yours, how does that all work? >> Yeah, I think at Bristol-Myers Squibb, we have a Chief Information Security Officer who is a peer of mine, who also reports to the global CIO. The CDO and the CISO are effective partners and are two sides of the coin in trying to advance a total risk mitigation strategy, whether it is from a cyber risk standpoint, which is the focus of the Chief Information Security Officer, or the general data consumption risk, which is the focus of the Chief Data Officer in the capacities that I have. And together, those are two sides of a coin that the CIO needs to be accountable for. 
So I think that's how we have orchestrated it, because I think it is important in these worlds where you want to be able to drive data-driven innovation, but you want to be able to do that in a way that doesn't open the company to unwanted risk exposures as well. And that is always a delicate balancing act, because if you index too much on risk, with high levels of security and control, then you could lose productivity. But if you index too much on productivity, collaboration and open access to data, it opens up the company to risks. So it is a delicate balance between the two. >> Increasingly, we're seeing that reporting structure evolve and coalesce, and I think it makes a lot of sense. I felt like at some point you had too many seats at the executive leadership table, too many kind of competing agendas. And now your structure, the CIO is obviously a very important position, I'm sure has a seat at the leadership table, but also has the responsibility for managing that sort of data as an asset versus a liability, which in my view has always been sort of the role of the Head of Information. I want to ask you, I want to hit the Escape key a little bit and ask you about data as a resource. You hear a lot of people talk about data as the new oil. We often say data is more valuable than oil because you can use it, it doesn't follow the laws of scarcity. You could use data in an infinite number of places. You can only put oil in your car or your house. How do you think about data as a resource today and going forward? >> Yeah, I think the data as the new oil paradigm, in my opinion, was an unhealthy one, and it prompts different types of conversations around that. I think for certain companies, data is indeed an asset. If you're a company that is focused on information products and data products and that is the core of your business, then of course there's monetization of data, and then data is an asset, just like any other assets on the company's balance sheet. 
But for many enterprises, to further their mission, considering data as a resource, I think, is a better focus. So as a vital resource for the company, you need to make sure that there is appropriate care and feeding for it, there is an appropriate management of the resource and an appropriate evolution of the resource. So that's how I would like to consider it; it is a personal, n-of-1 perspective, that data as a resource can power the mission of the company, the new products and services. I think that's a good, healthy way to look at it. At the center of it though, in a lot of strategies, whether people talk about a digital strategy or whether people talk about a data strategy, what is important is for a company to have a true north star around what is the core mission of the company and what is the core strategy of the company. For Bristol-Myers Squibb, we are about transforming patients' lives through science. And we think about digital and data as key value levers and drivers of that strategy. So digital for the sake of digital, or data strategy for the sake of data strategy, is meaningless in my opinion. We are focused on making sure that data and digital is an accelerant and a value lever for the company's mission and company strategy. So thinking about data as a resource, as a key resource for our scientific researchers or a key resource for our manufacturing team or a key resource for our sales and marketing, allows us to think about the actions and the strategies and tactics we need to deploy to make that effective. >> Yeah, that makes a lot of sense; you're constantly using that north star as your guideline and how data contributes to that mission. Krishna Cheriath, thanks so much for coming on the Cube and supporting the MIT Chief Data Officer community, it was a real pleasure having you. >> Thank you so much, Dave; hopefully you and the audience are safe and healthy during these times. 
>> Thank you for that and thank you for watching everybody. This is Vellante for the Cube's coverage of the MIT CDOIQ Conference 2020, gone virtual. Keep it right there, we'll be right back after this short break. (lively upbeat music)

Published Date : Sep 3 2020


Tom Davenport, Babson College - #MITCDOIQ - #theCUBE


 

>> In Cambridge, Massachusetts, it's the Cube, covering the MIT Chief Data Officer and Information Quality Symposium. Now here are your hosts, Stu Miniman and George Gilbert. >> You're watching the Cube, SiliconANGLE Media's flagship program. We go out to lots of technology shows and symposiums like this one here to help extract the signal from the noise. I'm Stu Miniman, joined by George Gilbert from the Wikibon research team, and really thrilled to have on the program the keynote speaker from this MIT event, Tom Davenport, who's a professor at Babson, author of some books including a new one that just came out. Thank you so much for joining us. >> My pleasure, great to be here. >> All right, so many things in your morning keynote that I know George and I want to dig into. I guess I'll start with: you talked about the four eras of, you called it information today; sorry, you said you started when it was three eras of analytics and now you've come to information. So I'm just curious, we get caught up sometimes on semantics, but is there a reason why you switched from analytics to information now? >> Well, I'm not sure it's a permanent switch, I just did it for this occasion. But I think that it's important for even people who don't have as their job doing something with analytics to realize that analytics are how we turn data into information. So kind of on a whim, I changed it from four eras of analytics to four eras of information, to kind of broaden it out in a sense and make people realize that the whole world is changing, it's not just about analytics. >> Ya know, it resonated with me because in the tech industry we get so caught up on the latest tool, George will be talking about how Hadoop is moving to Spark, and right, if we step back and look from a longitudinal view, data is something that's been around for a long time, but as you said, from Peter Drucker's quote, when we endow that 
with relevance and purpose, that's when we get information. >> Yeah, and that's why I got interested in analytics a year ago or so. It was because we weren't thinking enough about how we endowed data with relevance and purpose, turning it into knowledge. And knowledge management was one of those ways, and I did that for a long time, but the people who were doing stuff with analytics weren't really thinking about any of the human mechanisms for adding value to data. So that moved me in the analytics direction. >> Okay, so Tom, you've been at this event before, you've taught and written books about this whole space, so... >> I'm old. >> No, no, it's, you've got a great perspective. Okay, so bring us what's exciting you these days. What are some of the big challenges and big opportunities that we're facing as kind of humanity and an industry? >> Yeah, well, I think for me the most exciting thing is there are all these areas where there's just too much data and too much analysis for humans to do it anymore. When I first started working with analytics, the idea was some human analyst would have a hypothesis about what's going on in the data, and you'd gather some data and test that hypothesis and so on. It could take weeks if not months, and now we need to make decisions in milliseconds on way too much data for a human to absorb. Even in areas like health care, we have 400 different types of cancer, hundreds of genes that might be related to cancer, hundreds of drugs to administer; these decisions have to be made by technology now. And so it's very interesting to think about what's the remaining human role, how do we make sure those decisions are good, how do we review them and understand them. All sorts of fascinating new issues. >> I think along those lines, at a primitive level in the Big Data realm, the tools are kind of still emerging, and we want to keep track of every time 
someone's touched it or transformed it. But when you talk about something as serious as cancer, and let's say we're modeling how we get to a diagnosis, do we need a similar mechanism so that it's not either/or, either the doctor or some sort of machine learning model or cognitive model? Some way for the model to say, here's how I arrived at that conclusion, and then for the doctor to say to the patient, here's my thinking? >> Yeah, along those lines, Watson's being used for a lot of these oncology-oriented projects, and the good thing about Watson in that context is it does kind of presume a human asking a question in the first place, and then a human deciding whether to take the answer. The answers in most cases still have confidence levels associated with them. And in health care it's great that we have this electronic medical record, where the clinician's decision about how to treat that patient is recorded. In a lot of other areas of business we don't really have that kind of system of record to say what decision did we make, and why did we make it, and so on. So in a way I think health care, despite being very backward in a lot of areas, is kind of better off than a lot of areas of business. The other thing I often say about health care is, if they're treating you badly and you die, at least there will be a meeting about it in a healthcare institution. In business, we screw up a decision, we push it under the rug, and nobody ever considers it again. >> What about 30 years ago, I think it was, with Porter's second book and the concept of the value chain, sort of remaking the understanding of strategy? You're talking about the API economy and the data flows within that. Can you help tie your concept, the data flows, the data
value chain, and the APIs that connect them, with the Porter value chain across companies? >> Well, it's an interesting idea. I think companies are just starting to realize that we are in this API economy, that you don't have to do it all yourself. The smart ones have, without modeling it in any systematic way like the Porter value chain, said we need to have other people linking to our information through APIs. Google is fairly smart, I think, in saying we'll even allow that for free for a while, and if it looks like there's money to be made in it, we'll start charging for access to those APIs. So building the access, and then thinking about the revenue from it, is one of the new principles of this approach. But I haven't seen it done; I think it would be a great idea for a paper to say, how do we translate the sort of value chain ideas of Michael Porter, which were, I don't know, 30 years ago, into something for the API-oriented world that we live in today? >> Do you think that might be appropriate for the sort of platform economics model of thinking that's emerging? >> That's an interesting question. I mean, the platform people are quite interested in interorganizational connections. I don't hear them talking as much about the new rules of the API economy; it's more about how two-sided and multi-sided platforms work, and so on. Michael Porter was a sort of industrial economist, and a lot of those platform people are economists, so in that sense it's the same kind of overall thinking, but lots of opportunity there to exploit, I think. >> So, Tom, I want to bring it back to the chief data officer, one of the main themes of the symposium here. I really liked that you talked about how there needs to be a balance of offense and defense, because so much, at least in the last couple of years we've been covering this, governance seems to be kind of a central piece of it. It's such an exciting subject, but you
know, you put that purely on defense, and we get excited about the companies that are building new products, either saving or making more money with data. Can you talk a little bit about how this chief data officer needs to operate, and how that fits into your four eras? >> Yeah. Well, I don't know if I mentioned it in my talk, but I went back and confirmed my suspicion that Usama Fayyad was the world's first chief data officer, at Yahoo, and I looked at what Usama did at Yahoo, and it was very much data product and offense oriented; he established Yahoo Research Labs. Not everything worked out well at Yahoo in retrospect, but I think they were going in the direction of, what interesting data products can we create? And so I think we saw a lot of what I call 2.0 companies in the big data area in Silicon Valley saying it's not just about internal decisions from data, it's what can we provide to customers in terms of data, not just access, but things that really provide value, and that means data plus analytics. LinkedIn, they attribute about half of their membership to the People You May Know data product, and everybody else has a People You May Know now. Well, these companies haven't been that systematic about how you build them, and how do you know which one to actually take to market, and so on, but I think now more and more companies, even big industrial companies, are realizing that this is a distinct possibility, and we ought to look externally with our data for opportunities as much as supporting internal ones. >> You talk to companies like Yahoo and some of the big web companies, and the whole big data meme has been about allowing tools and processes to get to a broader piece of the economy. Counterbalance that a little bit with large public clouds and services: how much can a broad spectrum of
companies out there get the skill set and really take advantage of these tools, versus is it going to be something where I'm still going to need to go to some outside source for some of this? >> Well, I think it's all being democratized fairly rapidly. I read yesterday, for the first time, the quote "nobody ever got fired for choosing Amazon Web Services." That's a lot cheaper than the previous company in that role, which was IBM, where you had to build up all these internal capabilities. So I think the human side is being democratized; there are over 100 universities in the US alone that have analytics-oriented degree programs. So I think there's plenty of opportunity for existing companies to do this. It's just a matter of awareness on the part of the management team; I think that's what's lacking in most cases. They're not watching your shows, I guess. >> And along the lines of going back 30 years, we had a precedent where PC software just exploded onto the scene, and it was, I want control over my information, not just spreadsheets but creating my documents. At the same time, IT did not have the guardrails to keep people from falling off their bikes and getting injured. What tools and technologies do we have for both audiences today so that we don't repeat that mistake? >> Yeah, it's a very interesting question. Spreadsheets were great, the ultimate democratization tool, but depending on which study you believe, 20 to 80 percent of them had errors in them, and there were some pretty bad decisions made with them sometimes. We now have the tools so that we could tell people, that spreadsheet is not going to calculate the right value, or, you should not be using a pie chart for that visual display. I think vendors need to start building in those guardrails, as you put it, to say, here's how you use
this product effectively, in addition to just accomplishing your basic task. >> But you wouldn't see those guardrails extending all the way back to the data that's being provisioned for the users? >> Well, I think ultimately, if we got to the point of having better control over our data, of saying, you should not be using that data element, it's not the right one for representing customer address or something along those lines... we're not there yet in the vast majority of companies. I've seen a few that have kind of experimented with data watermarks or something to say, yes, this is the one that you're allowed to use, it has been certified as the right one for that purpose, but we need to do a lot more in that regard. >> All right, so Tom, you've got a new book that came out earlier this year, Only Humans Need Apply: Winners and Losers in the Age of Smart Machines. So I'll ask you the same question we asked Erik Brynjolfsson and Andy McAfee when they wrote The Second Machine Age: are we all out of jobs soon? >> Well, I think we've become a little more optimistic as we look in some depth at the data. I mean, one, there are a lot of jobs evolving working with these technologies. Somebody was telling me the other day, I was doing a radio interview for my book, and the host said, you know, I've made a big transition into podcasting, but the vast majority of people in radio have not been able to make that transition. So if you're willing to kind of go with the flow, learn about new technologies and how they work, I think there are plenty of opportunities. The other thing to think about is that these transitions tend to be rather slow. In the United States in 1980 we had about half a million bank tellers; since then we've had ATMs, online banking, etc. Guess how many bank tellers we have in 2016: about half a million. It's rather shocking, I think. I don't know exactly what they're all doing, but we're pretty slow in
making these transitions. So I think those of us sitting here today, or even watching, are probably okay. We'll see some job loss on the margins, but anybody who's willing to keep up with new technologies and add value to the smart machines that come into the workplace, I think, is likely to be okay. >> Okay. Do you have any advice for people that are looking at becoming chief data officers? >> Well, yeah. As I said, balance offense and defense. Defense is a very tricky area to inhabit as a CDO, because if you succeed and you prevent breaches and privacy problems and security issues and so on, nobody necessarily gives you any credit for it, or even knows that it's because of your work that you were successful. And if you fail, it's obviously very visible and bad for your career too. So I think you need to supplement defense with offense activities: analytics adding value, information digitization, data products, etc. And then I think it's very important that you make nice with all the other data-oriented C-level executives. You may not want to report to the CIO, but if you have a chief analytics officer, or a chief information security officer, chief digitization officer, chief digital officer, you've got to present a united front to your organization and figure out what's the division of labor, who's going to do what. In too many of these organizations, some of these people aren't even talking to each other, and it's crazy, really, and very confusing to the rest of the organization about who's doing what. >> Do you see the CDO role, say five years from now, being a standalone piece of the organization? And any guidance on where that should sit structurally, compared to, say, the CIO? >> Yeah. I've said that ideally you'd have a CIO, or somebody who all of these things reported to, who could kind of represent all these different interests to the rest of the organization. That doesn't mean that a CDO shouldn't engage with
the rest of the business; I think CIOs should be very engaged with the rest of the business. But I think this uncontrolled proliferation has not been a good thing. It does mean that information and data are really important to organizations, so we need multiple people to address it, but they need to be coordinated somehow, and a smart CEO would say, you guys get your act together and figure out who does what. In terms of structure, I think multiple different things can work; you can have it inside of IT or outside of IT, but you can at least be collaborating. >> Okay, the last question I've got: you talked about these eras, and that it's not that one dies and the next one comes, and we know how slow people especially are to change. So what happens to the companies that are still sitting in the 1.0 or 2.0 era as we see more 3.0 and 4.0 companies come along? >> Yeah, well, it's not a good place to be, in general, and I think what we've seen is that in many industries, the companies that are sophisticated with regard to IT are the ones that get more and more market share, and the late adopters end up ultimately going out of business. I mean, think about retail: who's still around? Walmart was the most aggressive company in terms of technology, and Walmart is the world's largest company. In moving packages around the world, FedEx was initially very aggressive with IT, UPS said we'd better get busy, and they did it too; there's not too much left of anybody else sending packages around the world. So I think in every industry, ultimately, the ones that embrace these ideas tend to be the ones who prosper. >> All right, well, Tom Davenport, we really appreciate this morning's keynote and sharing with our audience everything that's happening in the space. We'll be back with lots more coverage here from the MIT CDOIQ Symposium. You're watching theCUBE.
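Davenport's "guardrail" idea, a data element certified, or watermarked, as the one to use for a given purpose, can be sketched in a few lines of Python. This is a hypothetical illustration, not any vendor's actual API; the registry layout and element names are invented for the example.

```python
# Hypothetical sketch of a data-certification "guardrail": a central registry
# records which data element is certified for each purpose, and a check warns
# before an analyst uses an uncertified one. All names are illustrative.

CERTIFIED_ELEMENTS = {
    # purpose -> the single element certified ("watermarked") for that purpose
    "customer_address": "crm.customer.mailing_address",
    "revenue_reporting": "finance.ledger.net_revenue",
}

def check_element(purpose: str, element: str):
    """Return a warning string if `element` is not certified for `purpose`,
    or None if the element carries the certification for that purpose."""
    certified = CERTIFIED_ELEMENTS.get(purpose)
    if certified is None:
        return f"no certified element registered for purpose '{purpose}'"
    if element != certified:
        return (f"'{element}' is not certified for '{purpose}'; "
                f"the certified element is '{certified}'")
    return None

# An uncertified element triggers a warning; the certified one passes silently.
print(check_element("customer_address", "sales.leads.address"))
print(check_element("customer_address", "crm.customer.mailing_address"))
```

A real implementation would hang this off a data catalog or metadata store; the point is only that the certification lives centrally rather than in each analyst's head.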

Published Date : Jul 14 2016

**Summary and Sentiment Analysis are not shown because of an improper transcript**


Lisa Ehrlinger, Johannes Kepler University | MIT CDOIQ 2019


 

>> From Cambridge, Massachusetts, it's theCUBE, covering MIT Chief Data Officer and Information Quality Symposium 2019. Brought to you by SiliconANGLE Media. >> Hi, everybody, welcome back to Cambridge, Massachusetts. This is theCUBE, the leader in tech coverage. I'm Dave Vellante with my cohost, Paul Gillin, and we're here covering the MIT Chief Data Officer Information Quality Conference, #MITCDOIQ. Lisa Ehrlinger is here, she's the Senior Researcher at the Johannes Kepler University in Linz, Austria, and the Software Competence Center in Hagenberg. Lisa, thanks for coming on theCUBE, great to see you. >> Thanks for having me, it's great to be here. >> You're welcome. So Friday you're going to lay out the results of the study, and it's a study of Data Quality Tools. Kind of the long tail of tools, some of the ones that may not have made the Gartner Magic Quadrant and maybe other studies, but talk about the study and why it was initiated. >> Okay, so the main motivation for this study was actually a very practical one, because we have many company projects with companies from different domains, like the steel industry, the financial sector, and also a focus on the automotive industry, at our department at Johannes Kepler University in Linz. We have experience with these companies for more than 20 years, actually, in this department, and what reoccurred was the fact that we spent the majority of time in such big data projects on data quality measurement and improvement tasks. So at some point we thought, okay, what possibilities are there to automate these tasks, and what tools are out there on the market to automate these data quality tasks? So this was actually the motivation why we thought, okay, we'll look at those tools. Also, companies ask us, "Do you have any suggestions? Which tool performs best in this-and-this domain?" And I think this study answers some questions that have not been answered so far in this particular detail.
For example, the Gartner Magic Quadrant for Data Quality Tools is pretty interesting, but it's very high-level and focused on the big global vendors; it does not look at the specific measurement functionalities. >> Yeah, you have to have some certain number of whatever, customers or revenue, to get into the Magic Quadrant. So there's a long tail that they don't cover. But talk a little bit more about the methodology. Was it sort of, you got hands-on, or was it more just kind of investigating what the capabilities of the tools were, talking to customers? How did you come to the conclusions? >> We actually approached this from a very scientific side. We conducted a systematic search of which tools are out there on the market; not only industrial tools but also open-source tools were included. And I think this gives a really nice digest of the market from different perspectives, because we also include some tools that have not been investigated by Gartner, for example MobyDQ, or Apache Griffin, which has really nice monitoring capabilities but lacks some other features of these comprehensive tools, of course. >> So was the goal of the methodology largely to capture a feature function analysis, being able to compare that in terms of binary, did it have it or not, how robust is it, and try to develop a common taxonomy across all these tools? Is that what you did? >> So we came up with a very detailed requirements catalog, which is divided into three fields: the first focuses on data profiling, to get a first insight into data quality. The second is data quality management in terms of dimensions, metrics, and rules. And the third part is dedicated to data quality monitoring over time. For all those three categories, we came up with different case studies on a test database. And so we looked, okay, does this tool support this feature: yes, no, or partially? And when partially, to which extent?
So I think, especially on the partial assessment, we got a lot into detail in our survey, which is available on arXiv online already. So the preliminary results are already online. >> How do you find it? Where is it available? >> On arXiv. >> arXiv? >> Yes. >> What's the URL, sorry, arXiv.com, or .org, or-- >> arXiv.org, yeah. >> arXiv.org. >> But actually there is an ID I don't have with me currently, but I can send it to you afterwards, yeah. >> Yeah, maybe you can post that with the show notes. >> We can post it afterwards. >> I was amazed, you tested 667 tools. Now, I would've expected that there would be 30 or 40. Where are all of these, what do all of these long tail tools do? Are they specialized by industry or by function? >> Oh, sorry, I think we got some confusion here, because we identified 667 tools out there on the market, but we narrowed this down. Because, as you said, it's quite impossible to observe all those tools. >> But the question still stands, what is the difference, what are these very small, niche tools? What do they do? >> So most of them are domain-specific, and I think this really highlights that very early definition of data quality as fitness for use. We can pretty much see it here: we excluded the majority of these tools just because they assess some specific kind of data, and we really wanted to find tools that are generally applicable to different kinds of data, structured data, unstructured data, and so on. With most of these tools, someone came up with, we want to assess the quality of our, I don't know, geological data or something like that, yeah. >> To what extent did you consider other sort of non-technical factors? Did you do that at all? I mean, was there pricing or complexity of downloading or, you know, is there a free version available? Did you ignore those and just focus on the feature function, or did those play a role?
>> So basically the focus was on the feature function, but of course we had to contact the customer support. Especially with the commercial tools, we had to ask them to provide us with some trial licenses, and there we received different feedback from those companies, and I think the best comprehensive study here is definitely the Gartner Magic Quadrant for Data Quality Tools, because they give a broad assessment here. But what we also highlight in our study are companies that have very open support and are very willing to support you. For example, Informatica Data Quality, we had a really close interaction with them in terms of support, trial licenses, and also specific functionality. Also Experian, our contact from Experian in France was really helpful here. And other companies, like IBM, they focus on big vendors, and there we were not able to assess these tools, for example, yeah. >> Okay, but the other difference from the Magic Quadrant is you guys actually used the tools, played with them, experienced firsthand the customer experience. >> Exactly, yeah. >> Did you talk to customers as well, or, because you were the customer, you had that experience. >> Yes, I was the customer, but I was also happy to attend a data quality event in Vienna, and there I met some other customers who had experience with single tools. Not, of course, this wide range we observed, but it was interesting to get feedback on single tools and verify our results, and it matched pretty well. >> How large was the team that ran the study? >> Five people. >> Five people, and how long did it take you from start to finish? >> Actually, we performed it for one year, roughly. The assessment. And I think it's a pretty long time, especially when you see how quickly the market responds, especially in the open source field.
But nevertheless, you need to make some cut, and I think it's a very recent study now, and there is also the idea to publish the preliminary results now, and we are happy with that. >> Were there any surprises in the results? >> I think one of the surprises was that we think there is definitely more potential for automation, but not only for automation. I really enjoyed the keynote this morning, that we need more automation, but at the same time, we think that there is also the demand for more declaration. We observed some tools that say, yeah, we apply machine learning, and then you look into their documentation and find no information: which algorithm, which parameters, which thresholds. So I think this is important, especially if you want to assess the data quality; you really need to know what the algorithm is and how it's tuned, and give the user, which in most cases will be a technical person with a technical background, like some chief data officer... And he or she really needs to have the possibility to tune these algorithms to get reliable results and to know what's going on and why, which records are selected, for example. >> So now what? You're presenting the results, right? You're obviously here at this conference and other conferences, and so it's been what, a year, right? >> Yes. >> And so what's the next wave? What's next for you? >> The next wave, we're currently working on a project called Knowledge Graph for Data Quality Assessment, which should tackle two problems in one. The first is to come up with a semantic representation of your data landscape in your company, but not only the data landscape itself in terms of gathering metadata, but also to automatically improve or annotate this data schema with data profiles.
And I think what we've seen in the tools is we have a lot of capabilities for data profiling, but this is usually left to the user ad hoc, and here, we store it centrally and allow the user to continuously verify whether newly incoming data adheres to this standard data profile. And I think this is definitely one step on the way to more automation, and also I think the best thing with this approach would be to overcome this very arduous way of coming up with all the single rules within a team, but instead present the data profile, within your data quality project, to those people involved in the project, and then they can verify the profile and only update it and refine it, but they have some automated basis that is presented to them. >> Oh, great, same team or new team? >> Same team, yeah. >> Oh, great. >> We're continuing with it. >> Well, Lisa, thanks so much for coming to theCUBE and sharing the results of your study. Good luck with your talk on Friday. >> Thank you very much, thank you. >> All right, and thank you for watching. Keep it right there, everybody. We'll be back with our next guest right after this short break. From MIT CDOIQ, you're watching theCUBE. (upbeat music)
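The idea Ehrlinger closes on, deriving a data profile from reference data once and then continuously verifying newly incoming data against it, can be sketched roughly as follows. This is a minimal illustration of the concept, not code from her study; the profile fields (type, null rate, value range) and the drift tolerance are assumptions made for the example.

```python
# Minimal sketch of profile-based data quality monitoring: build a per-column
# profile from trusted reference data, store it, then flag incoming batches
# that drift from it.

def build_profile(rows, column):
    """Derive a simple per-column profile: observed type, null rate, value range."""
    values = [r[column] for r in rows if r[column] is not None]
    return {
        "type": type(values[0]).__name__ if values else None,
        "null_rate": 1 - len(values) / len(rows),
        "min": min(values) if values else None,
        "max": max(values) if values else None,
    }

def verify_batch(rows, column, profile, tolerance=0.05):
    """Return a list of issues where a new batch drifts from the stored profile."""
    current = build_profile(rows, column)
    issues = []
    if current["type"] != profile["type"]:
        issues.append(f"type changed: {profile['type']} -> {current['type']}")
    if current["null_rate"] > profile["null_rate"] + tolerance:
        issues.append(f"null rate rose to {current['null_rate']:.2f}")
    if (current["min"] is not None and profile["min"] is not None
            and current["min"] < profile["min"]):
        issues.append(f"value {current['min']} below profiled minimum {profile['min']}")
    return issues

# Profile trusted reference data once, then check a new batch against it.
reference = [{"age": 34}, {"age": 29}, {"age": 41}]
profile = build_profile(reference, "age")

incoming = [{"age": 30}, {"age": None}, {"age": -1}]
print(verify_batch(incoming, "age", profile))  # flags the null and the -1
```

In her proposal the stored profile would live in a central knowledge graph annotating the data schema, so each incoming batch is checked against a shared baseline instead of ad hoc rules written by hand.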

Published Date : Jul 31 2019


Mark Ramsey, Ramsey International LLC | MIT CDOIQ 2019


 

>> From Cambridge, Massachusetts. It's theCUBE, covering MIT Chief Data Officer and Information Quality Symposium 2019. Brought to you by SiliconANGLE Media. >> Welcome back to Cambridge, Massachusetts, everybody. We're here at MIT, sweltering Cambridge, Massachusetts. You're watching theCUBE, the leader in live tech coverage, my name is Dave Vellante. I'm here with my co-host, Paul Gillin. Special coverage of the MITCDOIQ. The Chief Data Officer event, this is the 13th year of the event, we started seven years ago covering it, Mark Ramsey is here. He's the Chief Data and Analytics Officer Advisor at Ramsey International, LLC and former Chief Data Officer of GlaxoSmithKline. Big pharma, Mark, thanks for coming onto theCUBE. >> Thanks for having me. >> You're very welcome, fresh off the keynote. Fascinating keynote this evening, or this morning. Lot of interest here, tons of questions. And we have some as well, but let's start with your history in data. I sat down after 10 years, but I could have stretched it to 20. I'll sit down with the young guns. But there were some folks in there with 30 plus year careers. How about you, what does your data journey look like? >> Well, my data journey, of course I was able to stand up for the whole time because I was in the front, but I actually started about 32, a little over 32 years ago and I was involved with building. What I always tell folks is that Data and Analytics has been a long journey, and the name has changed over the years, but we've been really trying to tackle the same problems of using data as a strategic asset. So when I started I was with an insurance and financial services company, building one of the first data warehouse environments in the insurance industry, and that was in the '87, '88 range, and then once I was able to deliver that, I ended up transitioning into consulting for IBM and basically spent 18 years with IBM in consulting and services.
When I joined, the name had evolved from Data Warehousing to Business Intelligence, and then over the years it was Master Data Management, Customer 360, Analytics and Optimization, Big Data. And then in 2013, I joined Samsung Mobile as their first Chief Data Officer. So, moving out of consulting, I really wanted to own the end-to-end delivery of advanced solutions in the Data Analytics space, and so that made the transition to Samsung quite interesting, very much into consumer electronics, mobile phones, tablets and things of that nature, and then in 2015 I joined GSK as their first Chief Data Officer to deliver a Data Analytics solution. >> So you have a long data history and Paul, Mark took us through it. And you're right, Mark-o, it's a lot of the same narrative, same wine, new bottle, but the technology's obviously changed. The opportunities are greater today. But you took us through Enterprise Data Warehouse, which was ETL and then MAP, and then Master Data Management, which is kind of this mapping and abstraction layer, then an Enterprise Data Model, top-down. And then that all failed, so we turned to Governance, which has been very, very difficult, and then you came up with another solution that we're going to dig into, but is it the same wine, new bottle from the industry?
Let's actually use Data Analytics on the data to make it available for these purposes, and I do think it's a different wine now and so I think it's just now a matter of if folks can really take off and head that direction. >> What struck me about, you were ticking off some of the issues that have failed like Data Warehouses, I was surprised to hear you say Data Governance really hasn't worked because there's a lot of talk around that right now, but all of those are top-down initiatives, and what you did at GSK was really invert that model and go from the bottom up. What were some of the barriers that you had to face organizationally to get the cooperation of all these people in this different approach? >> Yeah, I think it's still key. It's not a complete bottoms up because then you do end up really just doing data for the sake of data, which is also something that's been tried and does not work. I think it has to be a balance and that's really striking that right balance of really tackling the data at full perspective but also making sure that you have very definitive use cases to deliver value for the organization and then striking the balance of how you do that and I think one of the things that becomes a struggle is you're talking about very large breadth and any time you're covering multiple functions within a business it's getting the support of those different business functions and I think part of that is really around executive support and what that means, I did mention it in the session, that executive support to me is really stepping up and saying that the data across the organization is the organization's data. It isn't owned by a particular person or a particular scientist, and I think in a lot of organizations, that gatekeeper mentality really does put barriers up to really tackling the full breadth of the data. >> So I had a question around digital initiatives.
Everywhere you go, every C-level Executive is trying to get digital right, and a lot of this is top-down, a lot of it is big ideas and it's kind of the North Star. Do you think that that's the wrong approach? That maybe there should be a more tactical line of business alignment with that threaded leader as opposed to this big picture. We're going to change and transform our company, what are your thoughts? >> I think one of the struggles is just I'm not sure that organizations really have a good appreciation of what they mean when they talk about digital transformation. I think in most of the industries it is an initiative that's getting a lot of press within the organizations and folks want to go through digital transformation but in some cases that means having a more interactive experience with consumers and it's maybe through sensors or different ways to capture data but if they haven't solved the data problem it just becomes another source of data that we're going to mismanage and so I do think there's a risk that we're going to see the same outcome from digital that we have when folks have tried other approaches to integrate information, and if you don't solve the basic blocking and tackling, having data that has higher velocity and more granularity, if you're not able to solve that because you haven't tackled the bigger problem, I'm not sure it's going to have the impact that folks really expect. >> You mentioned that at GSK you collected 15 petabytes of data of which only one petabyte was structured. So you had to make sense of all that unstructured data. What did you learn about that process? About how to unlock value from unstructured data as a result of that? >> Yeah, and I think this is something.
I think it's extremely important in the unstructured data to apply advanced analytics against the data to go through a process of making sense of that information and a lot of folks talk about or have talked about historically around text mining of trying to extract an entity out of unstructured data and using that for the value. There's a few steps before you even get to that point, and first of all it's classifying the information to understand which documents do you care about and which documents do you not care about and I always use the story that in this vast amount of documents there's going to be, somebody has probably uploaded the cafeteria menu from 10 years ago. That has no scientific value, whereas a protocol document for a clinical trial has significant value, you don't want to look through manually a billion documents to separate those, so you have to apply the technology even in that first step of classification, and then there's a number of steps that ultimately lead you to understanding the relationship of the knowledge that's in the documents. >> Side question on that, so you had discussed okay, if it's a menu, get rid of it but there's certain restrictions where you got to keep data for decades. It struck me, what about work in process? Especially in the pharmaceutical industry. I mean, post Federal Rules of Civil Procedure was everybody looking for a smoking gun. So, how are organizations dealing with what to keep and what to get rid of? >> Yeah, and I think certainly the thinking has been to remove the excess and it's to your point, how do you draw the line as to what is excess, right, so you don't want to just keep every document because then if an organization is involved in any type of litigation and there's disclosure requirements, you don't want to have to have thousands of documents. At the same time, there are requirements and so it's like a lot of things. 
It's figuring out how do you abide by the requirements, but that is not an easy thing to do, and it really is another driver, certainly document retention has been a big thing over a number of years but I think people have not applied advanced analytics to the level that they can to really help support that. >> Another Einstein bromide, you know. Keep everything you must but no more. So, you put forth a proposal where you basically had this sort of three approaches, well, combined three approaches. The crawlers to go, the spiders to go out and do the discovery and I presume that's where the classification is done? >> That's really the identification of all of the source information. >> Okay, so find out what you got, okay. >> so that's kind of the start. Find out what you have. >> Step two is the data repository. Putting that in, I thought it was when I heard you I said okay it must be a logical data repository, but you said you basically told the CIO we're copying all the data and putting it into essentially one place. >> A physical location, yes. >> Okay, and then so I got another question about that and then use bots in the pipeline to move the data and then you sort of drew the diagram of the back end to all the databases. Unstructured, structured, and then all the fun stuff up front, visualization. >> Which people love to focus on the fun stuff, right? Especially, I can't tell you how many articles say you've got to apply deep learning and machine learning and that's where the answers are, but we have to have the data and that's the piece that people are missing. >> So, my question there is you had this tactical mindset, it seems like you picked a good workload, the clinical trials and you had at least conceptually a good chance of success. Is that a fair statement? >> Well, the clinical trials was one aspect. Again, we tackled the entire data landscape. So it was all of the data across all of R&D.
It wasn't limited to just, that's that top down and bottom up, so the bottom up is tackle everything in the landscape. The top down is what's important to the organization for decision making. >> So, that's actually the entire R&D application portfolio. >> Both internal and external. >> So my follow up question there is so that largely was kind of an inside the four walls of GSK, workload or not necessarily. My question was what about, you hear about these emerging Edge applications, and that's got to be a nightmare for what you described. In other words, putting all the data into one physical place, so it must be like a snake swallowing a basketball. Thoughts on that? >> I think some of it really does depend on you're always going to have these, IOT is another example where it's a large amount of streaming information, and so I'm not proposing that all data in every format in every location needs to be centralized and homogenized, I think you have to add some intelligence on top of that but certainly from an edge perspective or an IOT perspective or sensors. The data that you want to then make decisions around, so you're probably going to have a filter level that will impact those things coming in, then you filter it down to where you're going to really want to make decisions on that and then that comes together with the other-- >> So it's a prioritization exercise, and that presumably can be automated. >> Right, but I think we always have these cases where we can say well what about this case, and you know I guess what I'm saying is I've not seen organizations tackle their own data landscape challenges and really do it in an aggressive way to get value out of the data that's within their four walls. It's always like I mentioned in the keynote. It's always let's do a very small proof of concept, let's take a very narrow chunk. 
And what ultimately ends up happening is that becomes the only solution they build and then they go to another area and they build another solution and that's why we end up with 15 or 25-- (all talk over each other) >> The conventional wisdom is you start small. >> And fail. >> And you go on from there, you fail and that's not how you get big things done. >> Well that's not how you support analytic algorithms like machine learning and deep learning. You can't feed those just fragmented data of one aspect of your business and expect it to learn intelligent things to then make recommendations, you've got to have a much broader perspective. >> I want to ask you about one statistic you shared. You found 26 thousand relational database schemas for capturing experimental data and you standardized those into one. How? >> Yeah, I mean we took advantage of the Tamr technology that Michael Stonebraker created here at MIT a number of years ago which is really, again, it's applying advanced analytics to the data and using the content of the data and the characteristics of the data to go from dispersed schemas into a unified schema. So if you look across 26 thousand schemas using machine learning, you then can understand what's the consolidated view that gives you one perspective across all of those different schemas, 'cause ultimately when you give people flexibility they love to take advantage of it but it doesn't mean that they're actually doing things in an extremely different way, 'cause ultimately they're capturing the same kind of data. They're just calling things different names and they might be using different formats but in that particular case we used Tamr very heavily, and that again is back to my example of using advanced analytics on the data to make it available to do the fun stuff. The visualization and the advanced analytics.
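The schema unification Ramsey describes, using machine learning over the content and characteristics of the data to collapse 26 thousand schemas into one, can be made concrete with a deliberately simplified sketch. The greedy name-similarity clustering below, along with the made-up lab schemas, is an illustrative assumption, not Tamr's actual algorithm, which also learns from the data values themselves:

```python
import re
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Score two column names in [0, 1]: the better of character-level
    similarity and token overlap (tokens split on non-alphanumerics)."""
    char_sim = SequenceMatcher(None, a.lower(), b.lower()).ratio()
    ta = set(re.split(r"[^a-z0-9]+", a.lower())) - {""}
    tb = set(re.split(r"[^a-z0-9]+", b.lower())) - {""}
    token_sim = len(ta & tb) / len(ta | tb) if ta | tb else 0.0
    return max(char_sim, token_sim)


def unify_columns(schemas, threshold=0.6):
    """Greedily cluster column names from many schemas into unified fields.
    A column joins the best-scoring existing cluster above the threshold,
    otherwise it seeds a new cluster of its own."""
    clusters = {}
    for schema, columns in schemas.items():
        for col in columns:
            best_key, best_score = None, 0.0
            for key in clusters:
                score = similarity(col, key)
                if score > best_score:
                    best_key, best_score = key, score
            if best_key is not None and best_score >= threshold:
                clusters[best_key].append(f"{schema}.{col}")
            else:
                clusters[col] = [f"{schema}.{col}"]
    return clusters


# Hypothetical example: three labs captured the same experimental
# data under different column names.
schemas = {
    "lab_a": ["subject_id", "dose_mg", "visit_date"],
    "lab_b": ["subjectid", "dose_in_mg", "visitdate"],
    "lab_c": ["subj_id", "dosage_mg", "date_of_visit"],
}
unified = unify_columns(schemas)
for field, sources in unified.items():
    print(field, "<-", sources)
```

Here the nine columns fold into three unified fields. A production system would also compare value distributions across columns and put humans in the loop on low-confidence matches, which is closer to the approach Ramsey credits.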
>> So Mark, the last question is you well know that the CDO role emerged in these highly regulated industries and I guess in the case of pharma quasi-regulated industries but now it seems to be permeating all industries. We have Goka-lan from McDonald's and virtually every industry is at least thinking about this role or has some kind of de facto CDO, so if you were slotted in to a CDO role, let's make it generic. I know it depends on the industry but where do you start as a CDO for an organization large company that doesn't have a CDO. Even a mid-sized organization, where do you start? >> Yeah, I mean my approach is that a true CDO is maximizing the strategic value of data within the organization. It isn't a regulatory requirement. I know a lot of the banks started there 'cause they needed someone to be responsible for data quality and data privacy but for me the most critical thing is understanding the strategic objectives of the organization and how will data be used differently in the future to drive decisions and actions and the effectiveness of the business. In some cases, there was a lot of discussion around monetizing the value of data. People immediately took that to can we sell our data and make money as a different revenue stream, I'm not a proponent of that. It's internally monetizing your data. How do you triple the size of the business by using data as a strategic advantage and how do you change the executives so what is good enough today is not good enough tomorrow because they are really focused on using data as their decision making tool, and that to me is the difference that a CDO needs to make is really using data to drive those strategic decision points. >> And that nuance you mentioned I think is really important. 
Inderpal Bhandari, who is the Chief Data Officer of IBM often says how can you monetize the data and you're right, I don't think he means selling data, it's how does data contribute, if I could rephrase what you said, contribute to the value of the organization, that can be cutting costs, that can be driving new revenue streams, that could be saving lives if you're a hospital, improving productivity. >> Yeah, and I think what I've typically shared with executives when I've been in the CDO role is that they need to change their behavior, right? If a CDO comes into an organization and a year later, the executives are still making decisions on the same data PowerPoints with spinning logos and they said ooh, we've got to have 'em. If they're still making decisions that way then the CDO has not been successful. The executives have to change what their level of expectation is in order to make a decision. >> Change agents, top down, bottom up, last question. >> Going back to GSK, now that they've completed this massive data consolidation project how are things different for that business? >> Yeah, I mean you look at how Barron joined as the President of R&D about a year and a half ago and his primary focus is using data and analytics and machine learning to drive the decision making in the discovery of a new medicine and the environment that has been created is a key component to that strategic initiative and so they are actually completely changing the way they're selecting new targets for new medicines based on data and analytics. >> Mark, thanks so much for coming on theCUBE. >> Thanks for having me. >> Great keynote this morning, you're welcome. All right, keep it right there everybody. We'll be back with our next guest. This is theCUBE, Dave Vellante with Paul Gillin. Be right back from MIT. (upbeat music)

Published Date : Jul 31 2019



Keynote Analysis | MIT CDOIQ 2019


 

>> From Cambridge, Massachusetts, it's The Cube! Covering MIT Chief Data Officer and Information Quality Symposium 2019. Brought to you by SiliconANGLE Media. >> Welcome to Cambridge, Massachusetts everybody. You're watching The Cube, the leader in live tech coverage. My name is Dave Vellante and I'm here with my cohost Paul Gillin. And we're covering the 13th annual MIT CDOIQ conference. The Cube first started here in 2013 when the whole industry Paul, this segment of the industry was kind of moving out of the ashes of the compliance world and the data quality world and kind of that back office role, and it had this tailwind of the so called big data movement behind it. And the Chief Data Officer was emerging very strongly within as we've talked about many times in theCube, within highly regulated industries like financial services and government and healthcare and now we're seeing data professionals from all industries join this symposium at MIT as I say 13th year, and we're now seeing a lot of discussion about not only the role of the Chief Data Officer, but some of what we heard this morning from Mark Ramsey some of the failures along the way of all these north star data initiatives, and kind of what to do about it. So this conference brings together several hundred practitioners and we're going to be here for two days just unpacking all the discussions the major trends that touch on data. The data revolution, whether it's digital transformation, privacy, security, blockchain and the like. Now Paul, you've been involved in this conference for a number of years, and you've seen it evolve. You've seen that chief data officer role both emerge from the back office into a c-level executive role, and now spanning a very wide scope of responsibilities. Your thoughts? >> It's been like being part of a soap opera for the last eight years that I've been part of this conference because as you said Dave, we've gone through all of these transitions.
In the early days this conference actually started as an information quality symposium. It has evolved to become about the chief data officer and really about the data as an asset to the organization. And I thought that the presentation we saw this morning, Mark Ramsey's talk, we're going to have him on later, very interesting about what they did at GlaxoSmithKline to get their arms around all of the data within that organization. Now a project like that would've been unthinkable five years ago, but we've seen all of these new technologies come on board, essentially they've created a massive search engine for all of their data. We're seeing organizations beginning to get their arms around this massive problem. And along the way I say it's a soap opera because along the way we've seen failure after failure, we heard from Mark this morning that data governance is a failure too. That was news to me! All of these promising initiatives that have started and fallen flat or failed to live up to their potential, the chief data officer role has emerged out of that to finally try to get beyond these failures and really get their arms around that organizational data and it's a huge project, and it's something that we're beginning to see some organizations succeed at. >> So let's talk a little bit about the role. So the chief data officer in many ways has taken a lot of the heat off the chief information officer, right? It used to be CIO stood for career is over. Well, when you throw all the data problems at an individual c-level executive, that really is a huge challenge. And so, with the cloud it's created opportunities for CIOs to actually unburden themselves of some of the crapplications and actually focus on some of the mission critical stuff that they've always been really strong at and focus their budgets there. But the chief data officer has had somewhat of an unclear scope. Different organizations have different roles and responsibilities.
And there's overlap with the chief digital officer. There's a lot of emphasis on monetization whether that's increasing revenue or cutting costs. And as we heard today from the keynote speaker Mark Ramsey, a lot of the data initiatives have failed. So what's your take on that role and its viability and its longterm staying power? >> I think it's coming together. I think last year we saw the first evidence of that. I talked to a number of CDOs last year as well as some of the analysts who were at this conference, and there was pretty good clarity beginning to emerge about what the chief data officer role stood for. I think a lot of what has driven this is this digital transformation, the hot buzz word of 2019. The foundation of digital transformation is a data oriented culture. It's structuring the entire organization around data, and when you get to that point when an organization is ready to do that, then the role of the CDO I think becomes crystal clear. It's not so much just an extract transform load discipline. It's not just technology, it's not just governance. It really is getting that data, pulling that data together and putting it at the center of the organization. That's the value that the CDO can provide, I think organizations are coming around to that. >> Yeah and so we've seen over the last 10 years the decrease, the rapid decrease in cost, the cost of storage. Microprocessor performance we've talked about endlessly. And now you've got the machine intelligence piece layering in. In the early days Hadoop was the hot tech, and interesting now nobody talks even about Hadoop. Rarely. >> Yet it was discussed this morning. >> It was mentioned today. It is a fundamental component of infrastructures. >> Yeah. >> But what it did is it dramatically lowered the cost of storing data, and allowing people to leave data in place. The old adage of ship five megabytes of code to a petabyte of data versus the reverse.
Although we did hear today from Mark Ramsey that they copied all the data into a centralized location so I got some questions on that. But the point I want to make is that was really early days. We've now entered an era and it's underscored by if you look at the top five companies in terms of market cap in the US stock market, obviously Microsoft is now over a trillion. Microsoft, Apple, Amazon, Google and Facebook. Top five. They're data companies, their assets are all data driven. They've surpassed the banks, the energy companies, of course any manufacturing automobile companies, et cetera, et cetera. So they're data companies, and they're wrestling with big issues around security. You can't help but open the paper and see issues on security. Yesterday was the big Capital One breach. The Equifax issue was resolved in terms of the settlement this week, et cetera, et cetera. Facebook is struggling mightily with how to deal with fake news, how to deal with deep fakes. Recently it shut down likes for many Instagram accounts in some countries because they're trying to protect young people who are addicted to this. Well, they need to shut down likes for business accounts. So what kids are doing is they're moving over to the business Instagram accounts. Well when that happens, it exposes their emails automatically so they've got all kinds of privacy landmines and people don't know how to deal with them. So this data explosion, while there's a lot of energy and excitement around it, brings together a lot of really sticky issues. And that falls right in the lap of the chief data officer, doesn't it? >> We're in uncharted territory and all of the examples you used are problems that we couldn't have foreseen, those companies couldn't have foreseen. A problem may be created but then the person who suffers from that problem changes their behavior and it creates new problems as you point out with kids shifting where they're going to communicate with each other.
So these are all uncharted waters and I think it's got to be scary if you're a company that does have large amounts of consumer data in particular, consumer packaged goods companies for example, you're looking at what's happening to these big companies and these data breaches and you know that you're sitting on a lot of customer data yourself, and that's scary. So we may see some backlash to this from companies that were all bought in to the idea of the 360 degree customer view and having these robust data sources about each one of your customers. Turns out now that that's kind of a dangerous place to be. But to your point, these are data companies, the companies that business people look up to now, that they emulate, are companies that have data at their core. And that's not going to change, and that's certainly got to be good for the role of the CDO. >> I've often said that the enterprise data warehouse failed to live up to its expectations and its promises. And Sarbanes-Oxley basically saved EDW because reporting became a critical component post Enron. Mark Ramsey talked today about EDW failing, master data management failing as kind of a mapping and masking exercise. The enterprise data model which was a top down push for a sort of abstraction layer, that failed. You had all these failures and so we turned to governance. That failed. And so you've had this series of issues. >> Let me just point out, what do all those have in common? They're all top down. >> Right. >> All top down initiatives. And what Glaxo did is turn that model on its head and left the data where it was. Went and discovered it and figured it out without actually messing with the data. That may be the difference that changes the game. >> Yeah and its prescription was basically taking a tactical approach to that problem, start small, get quick hits. And then I think they selected a workload that was appropriate for solving this problem which was clinical trials.
And I have some questions for him. And one of the big things that struck me is the edge. So as you see new emerging data coming out of the edge, how are organizations going to deal with that? Because I think a lot of what he was talking about was a lot of legacy on-prem systems and data. Think about JEDI, a story we've been following on SiliconANGLE, the Joint Enterprise Defense Infrastructure. This is all about the DOD basically becoming cloud enabled. So getting data out into the field during wartime fast. We're talking about satellite data, you're talking about telemetry, analytics, AI data. A lot of distributed data at the edge bringing new challenges to how organizations are going to deal with data problems. It's a whole new realm of complexity. >> And you talk about security issues. When you have a lot of data at the edge and you're sending data to the edge, you're bringing it back in from the edge, every device in the middle, from the smart thermostat at the edge all the way up to the cloud, is a potential failure point, a potential vulnerability point. These are uncharted waters, right? We haven't had to do this on a large scale. Organizations like the DOD are going to be the ones that are going to be the leaders in figuring this out because they are so aggressive. They have such an aggressive infrastructure in place. >> The other question I had, striking question listening to Mark Ramsey this morning. Again Mark Ramsey was former data God at GSK, GlaxoSmithKline now a consultant. We're going to hear from a number of folks like him and chief data officers. But he basically kind of poopooed, he used the example of build it and they will come. You know the Kevin Costner movie Field of Dreams. Don't go after the field of dreams. So my question is, and I wonder if you can weigh in on this is, everywhere we go we hear about digital transformation. They have these big digital transformation projects, they generally are top down.
Every CEO wants to get digital right. Is that the wrong approach? I want to ask Mark Ramsey that. Are they doing field of dreams type stuff? Is it going to be yet another failure of traditional legacy systems to try to compete with cloud native and born in data era companies? >> Well he mentioned this morning that the research is already showing that most digital transformation initiatives are failing. Largely because of cultural reasons not technical reasons, and I think Ramsey underscored that point this morning. It's interesting that he led off by mentioning business process reengineering which you remember was a big fad in the 1990s, companies threw billions of dollars at trying to reinvent themselves and most of them failed. Is digital transformation headed down the same path? I think so. And not because the technology isn't there, it's because creating a culture where you can break down these silos and you can get everyone oriented around a single view of the organization's data. The bigger the organization the less likely that is to happen. So what does that mean for the CDO? Well, chief information officer at one point we said the CIO stood for career is over. I wonder if there'll be a corresponding analogy for the CDOs at some of these big organizations when it becomes obvious that pulling all that data together is just not feasible. It sounds like they've done something remarkable at GSK, maybe we'll learn from that example. But not all organizations have the executive support, which was critical to what they did, or just the organizational will to organize themselves around that central data store.
So my takeaway is this: if I were CDO, what I would be doing is trying to figure out, okay, how does data contribute to the monetization of my organization? Maybe not directly selling the data, but what data do I have that's valuable, and how can I monetize that in terms of either saving money, supply chain, logistics, et cetera, et cetera, or making money? Some kind of new revenue opportunity. And I would super glue myself to the line of business executive and go after small hits. You're talking about digital transformations being top down and largely failing. Shadow digital transformations are maybe the answer to that. Aligning with a line of business, focusing on a very narrow use case, and building successes up that way, using data as the ingredient to drive value. >> And big ideas. I recently wrote about Experian, which launched a service last year called Boost that enables consumers to actually impact their own credit scores by giving Experian access to their bank accounts to see that they are a better credit risk than maybe portrayed in their credit score. And something like 600,000 people signed up in the first six months of this service. That's an example, I think, of using inspiration, creating new ideas about how data can be applied. And in the process, by the way, Experian gains data that it can use in other contexts to better understand its consumer customers.
You're watching The Cube, we'll be right back after this short break. (upbeat music)

Published Date : Jul 31 2019


Dr Prakriteswar Santikary, ERT | MIT CDOIQ 2018


 

>> Live from the MIT campus in Cambridge, Massachusetts, it's the Cube, covering the 12th Annual MIT Chief Data Officer and Information Quality Symposium. Brought to you by SiliconANGLE Media. >> Welcome back to the Cube's coverage of MITCDOIQ here in Cambridge, Massachusetts. I'm your host, Rebecca Knight, along with my co-host, Peter Burris. We're joined by Dr. Santikary, he is the vice-president and chief data officer at ERT. Thanks so much for coming on the show. >> Thanks for inviting me. >> We're going to call you Santi, that's what you go by. So, start by telling our viewers a little bit about ERT. What you do, and what kind of products you deliver to clients. >> I'll be happy to do that. ERT is a clinical trial company, and we are a global data and technology company that minimizes risks and uncertainties within clinical trials for our customers. Our customers are top pharma companies, biotechnology companies, medical device companies, and they trust us to run their clinical trials so that they can bring their life-saving drugs to the market on time, every time. So we have a huge responsibility in that regard, because they put their trust in us. We serve as custodians of their data and processes, with the therapeutic expertise that we bring to the table as well as the compliance-related expertise that we have. So not only do we provide data and technology expertise, we also provide science expertise and regulatory expertise, and that's one of the reasons they trust us. And we have been around since 1977, over 40 years, so we have this collective wisdom that we have gathered over the years, and we have really earned trust in this space. Because we deal with the safety and efficacy of drugs, and these are the two big components that help the FDA, or any regulatory authority for that matter, to approve the drugs, we have a huge responsibility in this regard as well.
In terms of product, as I said, we are on the safety and efficacy side of the clinical trial process, and as part of that, we have multiple product lines. We have respiratory product lines, we have cardiac safety product lines, we have imaging. As you know, imaging is becoming more and more important for every clinical trial, and particularly in the oncology space for sure, to measure the growth of the tumor and that kind of thing. So we have a business that focuses exclusively on the imaging side. And then we have the data and analytics side of the house, because we provide real-time information about the trial itself, so that our customers can really measure risks and uncertainties before they become a problem. >> At this symposium, you're going to be giving a talk about clinical trials and the problems, the missteps, that can happen when the data is not accurate. Lay out the problem for our viewers, and then we're going to talk about the best practices that have emerged. >> I think the clinical trial space is very complex by its own nature, and the process itself is very lengthy. If you know one of the statistics, for example, it takes about 10 to 15 years to really develop and commercialize a drug. And it usually costs about $2.5 to 3 billion. Per drug. So think about the enormity of this. So the challenges are too many. One is data collection itself. Clinical trials are becoming more and more complex, becoming more and more global. Getting patients to the sites is another problem. Patient selection and retention, another one. Regulatory guidelines are another big issue, because not every regulatory authority follows the same sets of rules and regulations. And cost. Cost is a big imperative to the whole thing, because the development life-cycle of a drug is so lengthy. And as I said, it takes about $3 billion to commercialize a drug, and that cost comes down to the consumers. That means patients. So the cost of health care is sky-rocketing.
And in terms of data collection, there are lots of devices in the field, as you know. Wearables, mobile handhelds, so the data volume is a tremendous problem. And the vendors. Each pharmaceutical company uses so many vendors to run their trials. CROs, the clinical research organizations. They have EDC systems, they can have labs. You name it. So they outsource all these to different vendors. Now, how do you coordinate, and how do you get them to collaborate? And that's where the data plays a big role, because now the data is everywhere across different systems, and those systems don't talk to each other. So how do you really make real-time decisions when you don't know where your data is, and data is the primary ingredient that you use to make decisions? So that's where data and analytics, and bringing that data together in real-time, is a very, very critical service that we provide to our customers. >> When you look at medicine, obviously, the whole notion of evidence-based medicine has been around for 15 years now, and it's becoming a seminal feature of how we think about the process of delivering medical services and ultimately paying it forward to everything else, and partly that's because doctors are scientists and they have an affinity for data. But if we think about going forward, it seems to me as though learning more about the genome and genomics is catalyzing additional need for, and additional understanding of, the role that drugs play in the human body, and it almost becomes an information problem, where the drug, I don't want to say that a drug is software, but a drug is delivering something that, ultimately, is going to get known at a genomic level. So does that catalyze additional need for data? Is that changing the way we think about clinical trials? Especially when we think about, as you said, it's getting more complex, because we have to make sure that a drug has the desired effect with men and women, with people from here, people from there.
Are we going to push the data envelope even harder over the next few years? >> Oh, you bet. And that's where real world evidence is playing a big role. So, instead of patients coming to the clinical trials, the clinical trial is going to the patient. It is becoming more and more patient-centric. >> Interesting. >> And the early part of protocol design, for example, the study design, that is step one. So more and more, real world evidence data is being used to design the protocol, the very first stage of the clinical trial. Another thing that is pushing the envelope is artificial intelligence and other data mining techniques, which can now be used to really mine that data: the EMR data, prescription data, claims data. Those are real evidence data coming from real patients. So now you can use these artificial intelligence and machine learning techniques to mine that data to really design the protocol and the study design, instead of flipping through the EMR data manually. So patient recruitment, for example. No patients, no trials, right? So gathering patients, and the right set of patients, is one of the big problems. It takes a lot of time to bring in those patients, and even more troublesome is to retain those patients over time. These, too, are big, big things that take a long time, and site selection as well. Which site is going to really be able to bring the right patients for the right trials? >> So, two quick comments on that. One of the things, when you say the patients: when someone has a chronic problem, a chronic disease, and they start to feel better as a consequence of taking the drug, they tend to not take the drug anymore. And that creates this ongoing cycle. But going back to what you're saying, does it also mean that clinical trial processes, because we can gather data more successfully over time, it used to be really segmented. We did the clinical trial and it stopped.
Then the drug went into production and maybe we caught some data. But now, because we can do a better job with data, the clinical trial concept can be sustained a little bit more. That data becomes even more valuable over time, and we can add additional volumes of data back in to improve the process. >> Is that shortening clinical trials? Tell us a little bit about that. >> Yes, as I said, it takes 10 to 15 years if we follow the current process, like Phase One, Phase Two, Phase Three, and then post-marketing, that is Phase Four. I'm not taking the pre-clinical side of these trials into the picture. That's about 10 to 15 years, about $3 billion kind of thing. So when you use these kinds of AI techniques and the real world evidence data and all this, the projection is that it will reduce the cycle by 60 to 70%. >> Wow. >> The whole study, beginning to end time. >> So from 15 down to four or five? >> Exactly. So think about it, there are two advantages. One is, obviously, you are creating efficiency within the system, and the drug discovery industry is ripe for disruption, because it has been using that same process over and over for a long time. It's like, it is working, so why fix it? But unfortunately, it's not working, because the health care cost has sky-rocketed. So these inefficiencies are going to get solved when we bring real world evidence into the mixture. Real-time decision making. Risk analysis before they become risks. Instead of spending one year to recruit patients, you use AI techniques to get to the right patients in minutes, so think about the efficiency again. And also, the home monitoring, or mHealth, type of program, where the patients don't need to come to the clinical sites for check-ups anymore. You can wear wearables that are FDA regulated and approved, and they're going to do all the work from within the comfort of their home. So think about that.
And the other thing is terminally sick patients, for example. They don't have time, nor do they have the energy, to come to the clinical site for check-ups, because every day is important to them. So this is the paradigm shift that is going on. Instead of patients coming to the clinical trials, clinical trials are coming to the patients. And that's a paradigm shift, and it is happening because of these AI techniques. Blockchain. Precision medicine is another one. You don't run a big clinical trial anymore. You just go micro-trial, you group a small number of patients. You don't run a trial on breast cancer anymore, you just say, breast cancer for these patients, so it's micro-trials. And that needs -- >> Well that can still be aggregated. >> Exactly. It still needs to be aggregated, but you can get the RTDs quickly, so that you can decide whether you need to keep investing in that trial or not, instead of waiting 10 years only to find out that your trial is going to fail. So you are wasting not only your time, but also preventing patients from getting the right medicine on time. So you have that responsibility as a pharmaceutical company as well. So yes, it is a paradigm shift, and this whole industry is ripe for disruption, and ERT is right at the center. We have not only data and technology experience, but as I said, we have deep domain experience within the clinical domain, as well as regulatory and compliance experience. You need all these to navigate through the turbulent waters of clinical research. >> Revolutionary changes taking place. >> It is, and the satisfaction is, you are really helping the patients. You know? >> And helping the doctor. >> Helping the doctors. >> At the end of the day, the drug company does not supply the drug. >> Exactly. >> The doctor is prescribing, based on knowledge that she has about that patient and that drug and how they're going to work together.
>> And one of the good statistics: in 2017, just last year, 60% of the FDA approved drugs were supported through our platform. 60 percent. So there were, I think, 60 drugs approved, and I think 30 or 35 of them used our platform to run their clinical trials, so think about the satisfaction that we have. >> A job well done. >> Exactly. >> Well, thank you for coming on the show, Santi, it's been really great having you on. >> Thank you very much. >> Yes. >> Thank you. >> I'm Rebecca Knight. For Peter Burris, we will have more from MITCDOIQ, and the Cube's coverage of it, just after this. (techno music)

Published Date : Aug 15 2018


Kickoff | MIT CDOIQ 2018


 

>> Live from the MIT Campus in Cambridge, Massachusetts, it's theCUBE. Covering the 12th Annual MIT Chief Data Officer and Information Quality Symposium. Brought to you by SiliconANGLE Media. >> Welcome to theCUBE's coverage of MITCDOIQ here in Cambridge, Massachusetts on the MIT Campus. I'm your host, Rebecca Knight, along with my co-host, Peter Burris. Peter, it's a pleasure to be here with you. Thanks for joining me. >> Absolutely, good to see you again, Rebecca. >> These are my stomping grounds. >> Ha! >> So welcome to Massachusetts. >> It's an absolutely beautiful day in Cambridge. >> It is, it is, indeed. I'm so excited to be hosting this with you. What do you think? This is about chief data officers and information quality. We're really going to get inside the heads of these chief data officers, find out what's on their minds, what's keeping them up at night. How are they thinking about data, how are they pricing it, how are they keeping it clean, how are they optimizing it, exploiting it, how are they hiring for it? What do you think is the top issue of the day, in your mind? There's a lot to talk about here, what's number one? >> Well, I think the first thing, Rebecca, is that if you're going to have a chief in front of your name then, at least in my mind, that means the Board has directed you to generate some return on the assets that you've been entrusted with. I think the first thing that the CDO, the chief data officer, has to do is start to do a better job of pricing out the value of data, demonstrating how they're turning it into assets that can be utilized and exploited in a number of different ways to generate returns that are actually superior to some of the other assets in the business, because data is getting greater investment these days. So I think the first thing is, how are you turning your data into an asset? Because if you're not, why are you a chief of anything? >> (laughs) No, that's a very good point.
The other thing we were talking about before the cameras were rolling is the role of the CDO, chief data officer, and the role of the CIO, chief information officer, and how those roles differ. I mean, is that something that we're going to get into today? What do you think? >> I think it's something certainly to ask a lot of the chief data officers that are coming on. There's some confusion in the industry about what the relationship should be and how the roles are different. The chief data officer, as a concept, has been around for probably 10-12 years, something like that. I mean, the first time I heard it was probably 2007-2008. The CIO role has always been about information, but it ended up being more about the technology, and then the question was, what does a chief technology officer do? Well, the chief technology officer could have had a different role, but they also seem to be increasingly responsible for the technology. So if you look at a lot of organizations that have a CDO, the CIO looks more often to be the individual in charge of the IT assets, the technology officer tends to be in charge of the IT infrastructure, and the CDO tends to be more associated with, again, the role that the data plays, increasingly associated with analytics. But I think, over the next few years, that set of relationships is going to change, and new regimes will be put in place as businesses start to re-institutionalize their work around their data, and what it really means to have data as an asset. >> And the other role we've not mentioned is the CDO, Chief Digital Officer, which is the convergence of those two roles as well. How do you see, you started out by saying this is really about optimizing the data and finding a way to make money from it. >> Or generate a return. >> Generate a return, exactly! Find value in it, exactly.
>> One of the things about data, and one of the things about IT, historically, is that it often doesn't generate money directly, but rather indirectly, and that's one of the reasons why it has been difficult to sustain investment in. The costs are almost always direct, so if I invest in an IT project, for example, the costs show up immediately, but the benefits come through whatever function I just invested in the application to support. And the same thing exists with data. So if we take a look at the Chief Digital Officer, often that's a job that has been developed largely proximate to the COO, to better understand how operations are going to change as a consequence of an increasing use of data. So, the Chief Digital Officer is often an individual who is entrusted to think about, as we re-institutionalize work around data, what is that going to mean to our operations and our engagement models too? So, I think it's a combination of operations and engagement. The Chief Digital Officer is often very proximate to the COO, thinking about how data is going to change the way the organization works, change the way the organization engages, from a strategic standpoint first, but we're starting to see that role move more directly into operations. I don't want to say compete with the COO, but work much more closely with them at an operational level. >> Right, and of course, it depends, organization to organization. >> It's always different, and to what degree are your assets historically data-oriented? Like if you're a media company or if you're a financial services company, those are companies that have very strong lineages of data as an asset.
If you're a manufacturing company, and you're building digital twins, like a GE or something along those lines, then you might be a little bit newer to the game, but still you have to catch up, because data is going to mush a lot of industries together, and it's going to be hard to parse some of these industries in five to ten years. >> Well, precisely. One of the things you said was that the CDO, as a role, is really only 11-12 years old. In fact, this conference is in its 12th year, so really it started at the very beginning of the CDO journey itself, and we're now amidst the CDO movement. I mean, what do you think, how is the CDO thinking about his or her role within the larger AI revolution? >> Well, that's a great question, and it's one of the primary reasons why it's picking up pace. We've had a number of different technology introductions over the past 15-20 years that have brought us here. The notion of virtualizing machines changed, or broke, the relationship between applications and hardware. The idea of very high speed, very flexible, very easy to manage data center networking broke the way that we thought about how resources could be brought together. Very importantly, in the last six or seven years, the historical norm for storage was disk, which emphasized how do I persist the data that results from a transaction, and now we're moving to flash, and flash-based systems, which are more about how can I deliver data to new types of applications. That combination of things makes it possible to utilize a lot of these AI algorithms and a lot of these approaches to AI, many of which, the algorithms, have been around for 40-50 years. So we're catalyzing a new era in which we can think about delivering data faster, with higher fidelity, with lower administrative costs, because we're not copying everything and putting it in a lot of different places. That is making it possible to do these AI things.
That's precisely one of the factors that's really driving the need to look at data as an asset, because we can do more with it than we ever have before. You know, it's interesting, I have a little bromide. When people ask me what's really going on in the industry, what I like to say is, for the first 50 years of the industry, it was known process, unknown technology. We knew we were going to do accounting, we knew we were going to do HR, that was largely given to us by legal or regulatory or other types of considerations, but the unknown was, do we put it on a mainframe? Do we put it on a (mumbles) Do we use a database manager? How distributed is this going to be? We're now moving into an era where it's unknown process, because we're focused on engagement or the role that data can play in changing operations, but the technology is going to be relatively common. It's going to be Cloud or Cloud-like. So, it's not to say that technology questions go away entirely, they don't, but it's not as focused on the technology questions; we can focus more on the outcomes. But we have a hard time challenging those outcomes, or deciding what those outcomes are going to be, and that's one of the interesting things here. We're not only using data to deliver the outcomes, we're also using data to choose what outcomes to pursue. So it's an interesting recursive set of activities, where the CDO is responsible for helping the business decide what are we going to do, and also, how are we going to do it? >> Well, exactly. That's an excellent point, because there are so many. One of the things that we've heard about on the main stage this morning is the difficulty a lot of CDOs have with just buy-in, and really understanding, this is important, and this is not as important, or this is what we're going to do, this is what we're saying the data is telling us, and these are the actions we're going to take. How do you change a culture? How do you get people to embrace it?
>> Well, this is an adoption challenge, and adoption challenges are always met by showing returns quickly and sustainably. So one of the first things, which is why I said one of the first things a CDO has to do is show the organization how data can be thought of as an asset, because once you do that, now you can start to describe some concrete returns that you are able to help deliver as a consequence of your chief role. So that's probably the first thing. But, I think, one of the other things to do is to start doing things like demonstrating the role that information quality plays within an organization. Now, information quality is almost always measured in terms of the output or the outcomes that it supports, but there are questions of fidelity, there are questions of what data are we going to use, what data are we not going to use? How are we going to get rid of data? There are a lot of questions related to information quality that have process elements to them, and those processes are just now being introduced to the organization. Doing a good job of that, and acculturating people to understand the role that information quality plays, is another part of it. So I think you have to demonstrate that you have conceived and can execute on a regime of value, and at the same time you have to demonstrate that you have particular insight into some of those ongoing processes that are capable of sustaining that value. It's a combination of those two things that, I think, the chief data officer is going to have to do to demonstrate that they belong at the table, ongoing. >> Well, today we're going to be talking to an array of people, some from MIT who study this stuff. >> I hear they're smart people. >> Yeah, maybe. A little bit. We'll see, we'll see.
MIT, some people from the US Government, CDOs from the US Army and the Air Force, we've got people from industry too, and we've also got management consultants coming on to talk about some best practices, so it's going to be a great day. We're going to really dig in here. >> Looking forward to it. >> Yes. I'm Rebecca Knight, for Peter Burris, we will have more from MITCDOIQ in just a little bit. (techno music)

Published Date : Jul 18 2018
