Opening Session feat. Jon Ramsey, AWS | AWS Startup Showcase S2 E4 | Cybersecurity
>> Hello, everyone. Welcome to the AWS Startup Showcase. This is season two, episode four of the ongoing series covering exciting startups from the AWS ecosystem, here to talk about cybersecurity. I'm your host, John Furrier, and today I'm excited for this keynote presentation. I'm joined by Jon Ramsey, Vice President of Security at AWS. Jon, welcome to theCUBE's coverage of the startup community within AWS, and thanks for this keynote presentation.
>> Happy to be here.
>> So, Jon, what do you do at AWS? Take a minute to explain your role, because it's very comprehensive. We saw broad coverage of topics at the AWS re:Inforce event recently in Boston, from Stephen Schmidt, CJ Moses, and a variety of the executives. What's your role in particular at AWS?
>> If you look at AWS, there is a shared security responsibility model, and CJ Moses, the CISO for AWS, is responsible for securing the AWS portion of that model. Our customers are responsible for securing their part of it. For me, I provide services to those customers to help them secure their part of the model, and those services come in different categories. The first category is threat detection, with GuardDuty, which does real-time detection and alerting, and Detective, which is then used to investigate those alerts to determine if there is an incident. Then vulnerability management, which is Inspector, which looks for third-party vulnerabilities, and Security Hub, which looks for configuration vulnerabilities. And then Macie, which does sensitive data discovery. So I have those sets of services underneath me to help customers secure their part of the shared security responsibility model.
>> Okay, well, thanks for the callout there. I wanted to get that out there because I think it's important to note: everyone talks inside-out, outside-in, customer focus, and AWS has always been customer focused. We've been covering you guys for a long time, but you do have to secure the core cloud that you provide, and you've got great infrastructure, tools, and technology down to the chip level. So that's cool, and you're on the customer side. Right now we're seeing from these startups that are serving customers, the ones we've interviewed here at the showcase, that there's a huge security transformation going on within the security market. It's the plane at 35,000 feet having its engines pulled out and replaced, as they say. This is huge. What does it take for your customers, the enterprises out there trying to be more cyber resilient against threats while at the same time protecting what they've already got? They can't just do a wholesale change overnight; they've got to be reactive but also proactive. What do they need to do to be resilient? That's the question.
>> Yeah. So I think it's important to focus on spending your resources. Everyone has constrained security resources, and you have to focus those resources in the areas and the ways that reduce the greatest amount of risk. Risk really can be summed up as: assets that I have that are most valuable, that have a vulnerability, that a threat is going to attack. In that world, you want to mitigate the threat or mitigate the vulnerability to protect the asset. If you have an asset that's vulnerable but a threat isn't going to attack it, that's less risky. But that changes over time.
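That risk framing can be made concrete with a toy scoring function. This is a minimal sketch with invented weights and inputs, not an AWS formula; it only illustrates why a valuable asset under likely attack outranks a vulnerable asset nobody targets.

```python
# Toy risk prioritization: risk grows with asset value, vulnerability
# severity, and the likelihood a threat will actually exploit it.
# All factors are normalized 0..1 and entirely made up for illustration.

def risk_score(value: float, severity: float, likelihood: float) -> float:
    """Return a 0-100 score; any factor near zero drives risk down."""
    return 100.0 * value * severity * likelihood

assets = [
    {"name": "payments-db", "value": 1.0, "severity": 0.8, "likelihood": 0.9},
    # Vulnerable, but no threat is expected to attack it: lower priority.
    {"name": "test-host", "value": 0.2, "severity": 0.9, "likelihood": 0.1},
]

# Spend constrained security resources on the highest scores first,
# and recompute as the threat and vulnerability picture changes.
for a in sorted(assets, key=lambda x: -risk_score(x["value"], x["severity"], x["likelihood"])):
    print(f'{a["name"]}: {risk_score(a["value"], a["severity"], a["likelihood"]):.1f}')
```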
The threat and vulnerability windows are continuously evolving: threats develop tradecraft, vulnerabilities are discovered, new software is released. So it's a continuous picture, and an adaptive picture, where you have to continuously monitor what's happening. If you use the NIST Cybersecurity Framework, you identify what you have to protect; that's the asset part. Then you protect it: that's putting controls in place so that you don't have an incident. Then, from a threat perspective, you detect an incident or a breach or a compromise. And then you respond, and then you remediate, and you have to run that cycle continuously to be in a position to have cyber resiliency. One of the powers of the cloud is that if you're building your applications in a cloud-native form, your ability to respond can be very surgical, which is very important, because then you don't introduce risk when you're responding. And by design the cloud is architected to be more resilient, so being able to stay cyber resilient in a cloud-native architecture is an important characteristic.
>> Yeah, and it sounds so easy: just identify what's to be protected, monitor it, you're protected, you remediate. It sounds easy, but there's a lot of change going on, and you've got cloud scale. So you've got security, you've got cloud; there's a lot going on there. How do you think about security, and how does the cloud help customers? Because again, there are two things going on: there's a shared responsibility model, and at the end of the day the customer is responsible for their side. The cloud has some tools. How do you think about going about security, and where does the cloud help specifically?
>> Yeah. So really it's about a model called observe, orient, decide, and act: the OODA loop. It was created by John Boyd, a fighter pilot in the Korean War. He knew that if I can observe what the opponent is doing, orient myself to my goals and their goals, make a decision on what the next best action is, and then act, and then follow that OODA loop (also described as sensing, sense-making, deciding, and acting) faster than the enemy, then I will win every fight. So in the cyber world, you want to be in a position where you are observing, and that's where the cloud can really help you, because you can interrogate the infrastructure, look at what's happening, and build baselines from it, and then you can look at deviations from the norm. That's one way to observe. Then orient yourself: does this represent something that increases risk? If it does, what's the next best action I need to take? Make that decision, and then act. And that's also where the cloud is really powerful, because there's this huge control plane that lets you enable, disable, or reconfigure resources. If you're in a situation where you can do that continuously and very rapidly, you can outpace and outmaneuver the adversary.
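As a schematic, the loop he describes looks like the cycle below. Observe, orient, decide, and act here are stand-ins you would wire to real telemetry and a real control plane; this is a sketch of the control cycle, not any particular product.

```python
# Schematic OODA loop: the faster this cycle runs end to end, the faster
# you can out-maneuver an adversary. Every function body is a placeholder.
import time

def observe(telemetry):
    # e.g., pull new alerts, config changes, and flow logs from the cloud APIs
    return list(telemetry)

def orient(observations, baseline):
    # compare against a learned baseline; anything unexpected is a deviation
    return [o for o in observations if o not in baseline]

def decide(deviations):
    # choose the next best action for each deviation
    return [("isolate", d) for d in deviations]

def act(actions):
    for verb, target in actions:
        # in practice: call the control plane to quarantine, revoke, reconfigure
        print(f"{verb}: {target}")

baseline = {"nightly-backup-job"}
telemetry = ["nightly-backup-job", "odd-login-from-new-geo"]
for _ in range(3):  # in production this loop never stops
    act(decide(orient(observe(telemetry), baseline)))
    time.sleep(0.1)
```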
>> Yeah, you know, I remember I interviewed Stephen Schmidt in 2014, and at that time everybody was poo-pooing it: oh man, the cloud is so insecure. He made a statement to me, and we wrote about this: the cloud is more secure and will be more secure, because it can be complicated for the hacker but easy for provisioning. So he brought up this discussion around how cloud would be more secure, and it turns out he was right. Now people are saying the cloud's more secure than standalone. What's different now, Jon? Not even going back to 2014, just go back a few years. The cloud is helpful; there's more interrogation, as you mentioned, and this is important. What's changed in the cloud, in AWS, that enables customers, and also third parties who are trying to comply and manage risk, since you have this shared back and forth? What's different in the cloud now versus just a few years ago that's helping security?
>> Yeah. If you look at the parts of the shared responsibility model, the further up the stack you go, from plain infrastructure to platforms, say containers, up to serverless, the more of that stack we take responsibility for. And in the process we are investing resources and capabilities. For example, GuardDuty takes an EKS audit log feed for containers to be able to monitor what's happening from a container perspective. And in serverless, the majority of what needs to be defended is part of our responsibility model. That's an important shift, because in that world we have a very large team who knows the infrastructure, who knows the threat, and who knows how to protect customers all the way up to the boundary. That's a really important consideration when you think about how you design your applications: you want the developers to focus on the business logic, the business value, and still the security of the code they're writing, but let us take over the rest of it so you don't have to worry about it.
>> Great, good insight there. I want to get your thoughts on another trend here at the showcase. One of the things that's emerging, besides the normal threat landscape and the compliance work, is API protection. APIs are what made the cloud great, right? And they're not going away; it's only going to grow, because we live in an interconnected digital world. APIs are the lingua franca, as they say. Companies can't just sit back and expect third parties to comply with cyber regulations and best practices. So how do security organizations get proactive? APIs are just a signal, in my mind, of more connections: you've got shared responsibility, AWS, your customers, and your customers' partners and customers, all connection points. We live in an interconnected world. How do security teams and organizations get proactive on the cyber risk management piece?
>> Yeah. So when it comes to APIs, the thing you look for is the trust boundaries: where are the trust boundaries in the system, between the user and the machine, or between one machine and another machine on the network? The API is a trust boundary, and it is a place where you need to put some form of control in place, because the trust boundary can be used to attack. I trust that someone is going to give me something legitimate, but you don't know that that's actually true. You should assume that the other side of the trust boundary is malicious, and by default validate that what you're getting is actually trustworthy and valid. So think of an API as just a trust boundary, assume that whatever you receive at that boundary may not be legitimate, and validate the contents of whatever you receive.
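One hedged way to apply that advice at an API boundary is to fail closed unless the payload validates against an explicit schema. Here is a minimal sketch using Python's jsonschema library; the schema, field names, and payloads are invented for illustration.

```python
# Treat the API as a trust boundary: nothing crossing it is trusted until
# it validates against an explicit, restrictive schema.
from jsonschema import ValidationError, validate  # pip install jsonschema

ORDER_SCHEMA = {
    "type": "object",
    "properties": {
        "order_id": {"type": "string", "pattern": "^[A-Z0-9]{8}$"},
        "quantity": {"type": "integer", "minimum": 1, "maximum": 1000},
    },
    "required": ["order_id", "quantity"],
    "additionalProperties": False,  # reject anything you did not expect
}

def handle_request(payload: dict) -> dict:
    try:
        validate(instance=payload, schema=ORDER_SCHEMA)
    except ValidationError as err:
        # Fail closed: the far side of the boundary is presumed malicious.
        return {"status": 400, "error": err.message}
    return {"status": 200}

print(handle_request({"order_id": "AB12CD34", "quantity": 5}))            # accepted
print(handle_request({"order_id": "x", "quantity": "lots", "admin": 1}))  # rejected
```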
>> You know, I was noticing online that Mai-Lan, who runs S3 at AWS, was commenting on the birthday of S3, Amazon Simple Storage Service. A lot of customers are using S3 across all their applications; it's the file repository for their application workflows, ingesting literally trillions of objects. You guys have trillions of objects on S3 today, and it's a big part of the application workflow. Data security has come up as a big discussion item. Forget about the S3 bucket misconfigurations, which have been well reported; beyond that, as application workflows tap into S3, data becomes the conversation: securing data. How do you talk to customers about that? Because that's now part of scaling these modern cloud-native applications: managing data on-prem and across environments, in flight, at rest, in motion. What's your view on data security, Jon?
>> Yeah. Data security is also a trust boundary: the thing that's going to access the data, you have to validate it. The challenge with data security is that customers don't really know where all their data is, or even where their sensitive data is, and that continues to be a large problem. That's why we have services like Macie, whose job is to find, in S3, the data that you need to protect the most because it's sensitive. Least privilege has always been the goal when it comes to data security. The problem is that least privilege is really, really hard to achieve, because there are so many different combinations of roles and accounts and orgs. So there's also another technology we have, called Access Analyzer, that helps customers figure out: are the authorizations I have the ones that are intended for that user? And you have to continuously review that, as a means of getting as close to least privilege as you possibly can.
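A minimal sketch of what that continuous review can look like in practice, assuming boto3 credentials and an existing account-level analyzer. It pulls active IAM Access Analyzer findings so a human can compare granted access against intended access; the triage of each finding is left out.

```python
# List active IAM Access Analyzer findings: each one is a resource reachable
# from outside the zone of trust, i.e., a least-privilege gap to review.
import boto3

aa = boto3.client("accessanalyzer")

# Assumes at least one account analyzer already exists in this region.
analyzer_arn = aa.list_analyzers(type="ACCOUNT")["analyzers"][0]["arn"]

resp = aa.list_findings(
    analyzerArn=analyzer_arn,
    filter={"status": {"eq": ["ACTIVE"]}},
    maxResults=100,
)
for f in resp["findings"]:
    # Compare what IS authorized with what was INTENDED, then tighten policy.
    print(f["resourceType"], f.get("resource"), f.get("principal"))
```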
>> Well, one of the luxuries of having you here on theCUBE keynote for this showcase is that you have the internal view at AWS, but also the external view with customers. So I have to ask: as you talk to customers, obviously there are a lot of trends. We're seeing more managed services in areas where there are skill gaps, but teams are also overloaded. We're hearing stories about security teams overwhelmed by the solutions they have to deploy quickly, scale up quickly, and run cost-effectively; the need for instrumentation, sometimes intrusive, sometimes agentless; sensors; OT. It's getting crazy; at re:MARS we saw a bunch of stuff there. This is a reality, the teams aspect of it. Can you share your experiences and observations on how companies are organizing, how they're thinking about team formation, and how they're thinking about all these new things coming at them, new environments, new scale choices? What are you seeing on the customer side relative to security teams and their role and relationship to the cloud and the technologies?
>> Yeah, absolutely. We have to remember that at the end of the day, on one end of the wire is a black hat, and on the other end of the wire is a white hat. So you need people, and people are a critical component of being able to defend. In the context of security operations, alert fatigue is absolutely a problem: the number of alerts, the volume of alerts, is overwhelming. So you have to have a means to effectively triage them and get the ones into investigation that you think will be the most significant, going back to the risk equation: you find the alerts and events that could harm you the most. One common theme is threat hunting, and the concept behind threat hunting is that I don't wait for an alert; I lean in, I'm proactive instead of reactive. I find the system that I least want the hacker in, I go to that system, and I look for anomalies, for anything that might make me think there's a hacker there, or a compromise, or some unintended consequence. The reason you do that is that it reduces your dwell time, the time between when you get compromised and when you detect something, which might otherwise be months if no alert triggered. That's also a very important aspect for AWS and our security services. We have a strategy across all of the security services that we call end to end. They're all API-driven, but security buyers generally do not have a development team; they're security operators, and they want a solution. So we're moving from APIs to outcomes: how do we stitch all the services together so that the time an analyst spends, whether a SOC analyst, someone doing an investigation, or someone doing incident response, is the most valuable time? And in the process of stitching this all together and helping our customers with alert fatigue, we'll be doing things that use inference and machine learning to help prioritize the greatest risk for our customers.
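As one concrete, hedged example of triaging on risk rather than volume, the sketch below pulls only high-severity GuardDuty findings (severity 7 and up) through boto3. It assumes GuardDuty is already enabled in the account and region.

```python
# Fetch the highest-severity GuardDuty findings first to cut alert fatigue.
import boto3

gd = boto3.client("guardduty")

# Assumes one GuardDuty detector already exists in this region.
detector_id = gd.list_detectors()["DetectorIds"][0]

finding_ids = gd.list_findings(
    DetectorId=detector_id,
    FindingCriteria={"Criterion": {"severity": {"Gte": 7}}},
    SortCriteria={"AttributeName": "severity", "OrderBy": "DESC"},
)["FindingIds"]

# get_findings accepts at most 50 IDs per call.
for f in gd.get_findings(DetectorId=detector_id,
                         FindingIds=finding_ids[:50])["Findings"]:
    print(round(f["Severity"], 1), f["Type"], f["Title"])
```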
>> That's a great callout, and it brings up the point of the front line, so to speak, and the back-office side of this. The threats are out there, and there's a lot of leaning in, which is a great point; I think that's a good comment and insight. The question I have for you is that everyone always talks about that, but there's also the, I won't say boring, but important compliance aspect of things. This has become huge. There's a lot of blocking and tackling needed behind the scenes on the compliance side, as well as prevention. Can you take us through how customers are looking at the best strategies for compliance and security? There's a lot of work to get done, and you've got to lay out everything, as you mentioned, but compliance, specifically the reporting, is also a big thing.
>> Yeah. Compliance is interesting. I suggest taking a security approach to compliance instead of a compliance approach to security. If you're compliant, you may not be secure; but if you're secure, you'll be compliant. The really interesting thing about compliance is that as soon as a category of control is required by some compliance regime, the effectiveness of that control is reduced, because the threats say: I'm going to presume they have this control, because they're compliant, so I'm going to change my tactics to evade the control. If you only ever follow compliance, you're going to miss a whole set of tactics that threats have developed precisely because they presume you're compliant and have those controls in place. So you want to have something that's outside the realm of compliance, because that's the thing that will trip them up, the thing the threat is not expecting, and that's what will let you detect them.
>> Yeah, and it almost becomes a finger-pointing exercise: with compliance, you get complacent. I can see that. Can you give an example? I think that's something people will really want to know more about, because it's common sense. Can you give an example of security driving compliance?
>> Yeah, sure. As an example, multi-factor authentication was used by banks for high-risk transactions, really high-risk transactions. That was a security approach to compliance: we said, that's a high-net-worth individual, we're going to give them a token, and that's how they're going to authenticate. At the time, the FFIEC didn't say there needed to be multi-factor authentication. Then, after a period of time, when account takeover was on the rise, the FFIEC, the Federal Financial Institutions Examination Council, said: you need to do multi-factor authentication. Multi-factor authentication was now on every account. And then the threat moved to man-in-the-browser attacks, after the user authenticates, a new tactic that hadn't existed back when only those high-net-worth individuals had multi-factor, and now it became commonplace. That's an example of the full life cycle, and the important lesson there is that security controls have a diminishing half-life of effectiveness. They need to be continuous and adaptive, or else their value decreases over time.
>> Yeah, and I think that's a great callout, because agility and speed are big factors with emerging threats. It's not a stable, mature hacker market; they're evolving too. All right, great stuff. I know your time is very valuable, Jon, and I really appreciate you coming on theCUBE. A couple more questions for you. We have 10 amazing startups here in the AWS ecosystem, all private companies performing well, and they all have the same vibe of being onto something new: doing something new and clever, different from what was done 10 years ago, and this is where the cloud advantage comes in, cloud scale. You mentioned some of those things, like data, so you start to see new things emerge. How would you talk to CSOs or CXOs who are watching about how to evaluate startups like these? They're still somewhat small relative to some of the bigger players, but they have unique solutions and they do things a little differently. How should CISOs evaluate them, and how can startups work with the CISOs? What's your advice to both the buyer and the startup for bringing the product to market?
>> Yeah. So the first thing, when you talk to a CSO, is to be respectful of their time; they'll appreciate that. I remember when I had just started, I went to talk to the CISO of one of the five major banks. He sat me down, and I tried to tell him what I had, and he went through his book, and he had ten of every one thing that I had. I realized it, and I was grateful to him for giving me an explanation. I said to him: look, I'm sorry I wasted your time. I will not do that again; I apologize. If I can't bring any value, I won't come back. But if I think I can bring you something of value, now that I know what I know, will you please take the meeting? He said: of course. So be respectful of their time. They know what the problem is, they know what the threat is. And be specific about how you're different. Right now there is so much confusion in the market about what everyone does. If you really have something differentiated, be very, very specific about it, and don't be afraid of it; lean into it and explain the value. That will save a lot of time and make the meeting more valuable for the CSO.
>> And the CISOs, as they evaluate these startups, how should they look at them? What are some markers you'd say are good things to look for: size of the team, reviews, technology? Or does it not matter, because everyone's environment is different?
>> Yeah. For me, I always look first to the security value, because if there isn't security value, nothing else matters. Then I tend to look at the management team, quite frankly: what are their experiences, and what do they know that has led them to do something different that drives security value? After that, I look at whether this is someone I can have a long-term relationship with. If I have a problem and I call them, are they going to say: yes, we're in this together, we'll figure it out? And then finally, for AWS, scale is important, so we look at whether this is a solution that we can get to the scale we need.
>> Awesome. Jon Ramsey, Vice President of Security, here on theCUBE's keynote. Jon, thank you for your time; I really appreciate it, and I know how busy you are. For the next minute or so, share a little bit of what you're up to. What's on your plate? What are you thinking about as you go out to the marketplace and talk to customers? What's your talk track? Put a plug in for what you're up to.
>> Yeah. For the services I have, we are absolutely moving, as I mentioned earlier, from APIs to outcomes. We're moving up the stack to be able to defend both containers and serverless. We're moving out, in the sense that we want to get visibility and signal not just from what we see in AWS but from other places, to inform how we defend AWS. And then, across the NIST Cybersecurity Framework, we have amazing detection capability, and we have this infrastructure where we can respond, doing micro-responses to interdict the threat.
So we're moving across the NIST Cybersecurity Framework, from detect to respond.
>> All right, thanks for your insight and for sharing your time in this keynote. We've got 10 great, amazing startups. Congratulations on all your success at AWS; you guys are doing a great job. Shared responsibility, the threats out there, the changing landscape, the increasing scale, more data tsunamis coming every day, more integration, more interconnection: it's getting more complex, so you guys are doing a lot of great work. Thanks for your time, really appreciate it.
>> Thank you, John.
>> Okay, this is the AWS Startup Showcase, season two, episode four of the ongoing series covering the exciting startups coming out of the AWS ecosystem. This episode is about cybersecurity. I'm your host, John Furrier. Thanks for watching.
Mark Ramsey, Ramsey International LLC | MIT CDOIQ 2019
>> From Cambridge, Massachusetts, it's theCUBE, covering the MIT Chief Data Officer and Information Quality Symposium 2019. Brought to you by SiliconANGLE Media.
>> Welcome back to Cambridge, Massachusetts, everybody. We're here at MIT, in sweltering Cambridge, Massachusetts. You're watching theCUBE, the leader in live tech coverage. My name is Dave Vellante, and I'm here with my co-host, Paul Gillin, for special coverage of MITCDOIQ, the Chief Data Officer event. This is the 13th year of the event, and we started covering it seven years ago. Mark Ramsey is here; he's the Chief Data and Analytics Officer Advisor at Ramsey International, LLC and former Chief Data Officer of GlaxoSmithKline, big pharma. Mark, thanks for coming on theCUBE.
>> Thanks for having me.
>> You're very welcome, fresh off the keynote. Fascinating keynote this morning, with lots of interest here and tons of questions, and we have some as well. But let's start with your history in data. I sat down after 10 years, but I could have stretched it to 20; I'll sit with the young guns. There were folks in there with 30-plus-year careers. How about you? What does your data journey look like?
>> Well, in my data journey, of course, I was able to stand up the whole time because I was in the front. I actually started a little over 32 years ago, and what I always tell folks is that data and analytics has been a long journey. The name has changed over the years, but we've really been trying to tackle the same problem: using data as a strategic asset. When I started, I was with an insurance and financial services company, building one of the first data warehouse environments in the insurance industry; that was in the '87, '88 range. Once I delivered that, I transitioned into consulting for IBM, and I spent 18 years with IBM in consulting and services. When I joined, the name had evolved from Data Warehousing to Business Intelligence, and over the years it was Master Data Management, Customer 360, Analytics and Optimization, Big Data. Then in 2013, I joined Samsung Mobile as their first Chief Data Officer. Moving out of consulting, I really wanted to own the end-to-end delivery of advanced solutions in the data analytics space, and that made the transition to Samsung quite interesting: very much consumer electronics, mobile phones, tablets, and things of that nature. Then in 2015 I joined GSK as their first Chief Data Officer to deliver a data analytics solution.
>> So you have a long data history, and Paul, Mark took us through it. And you're right, Mark-o, it's a lot of the same narrative, same wine, new bottle, but the technology has obviously changed and the opportunities are greater today. You took us through the Enterprise Data Warehouse, which was ETL and map-and-move; then Master Data Management, which is this mapping and abstraction layer; then an Enterprise Data Model, top-down. And then that all failed, so we turned to governance, which has been very, very difficult. And then you came up with another solution that we're going to dig into. But is it the same wine, new bottle from the industry?
>> I think it has been over the last 20, 30 years, which is why I did the experiment at the beginning of how long folks have been in the industry.
Certainly the technology has advanced, moving to a reduction in the amount of schema required to move data, so you can move away from the map-and-move approach of a data warehouse. But it is tackling the same type of problems, and like I said in the session, it's a bit like Einstein's phrase: doing the same thing over and over again and expecting a different answer is the definition of insanity. What I really proposed in the session was, let's come at this from a very different perspective: let's actually use data analytics on the data to make it available for these purposes. I do think it's a different wine now, and it's just a matter of whether folks can really take off and head in that direction.
>> What struck me, as you were ticking off some of the initiatives that have failed, like data warehouses: I was surprised to hear you say data governance really hasn't worked, because there's a lot of talk around that right now. But all of those are top-down initiatives, and what you did at GSK really inverted that model, going from the bottom up. What were some of the barriers you faced organizationally in getting the cooperation of all these people in this different approach?
>> Yeah, and that balance is still key. It's not completely bottom-up, because then you end up just doing data for the sake of data, which is also something that's been tried and does not work. It has to be a balance: really tackling the data at full breadth, but also making sure you have very definitive use cases to deliver value for the organization, and then striking the balance of how you do that. One of the things that becomes a struggle is that you're talking about very large breadth: any time you're covering multiple functions within a business, you need the support of those different business functions, and part of that is executive support. What executive support means to me, and I mentioned this in the session, is really stepping up and saying that the data across the organization is the organization's data. It isn't owned by a particular person or a particular scientist, and in a lot of organizations that gatekeeper mentality really does put up barriers to tackling the full breadth of the data.
>> I have a question around digital initiatives. Everywhere you go, every C-level executive is trying to get digital right, and a lot of this is top-down, big ideas, the North Star. Do you think that's the wrong approach? Should there be a more tactical line-of-business alignment, with that threaded leader, as opposed to the big picture of "we're going to change and transform our company"? What are your thoughts?
>> I think one of the struggles is that I'm not sure organizations really have a good appreciation of what they mean when they talk about digital transformation.
In most industries it's an initiative that's getting a lot of press within organizations, and folks want to go through digital transformation, but in some cases that just means having a more interactive experience with consumers, maybe through sensors or other ways to capture data. If they haven't solved the data problem, it becomes just another source of data that they're going to mismanage. So I do think there's a risk that we'll see the same outcome from digital that we've seen when folks tried other approaches to integrating information. If you haven't solved the basic blocking and tackling, then data with higher velocity and more granularity won't have the impact that folks expect, because the bigger problem hasn't been tackled.
>> You mentioned that at GSK you collected 15 petabytes of data, of which only one petabyte was structured. So you had to make sense of all that unstructured data. What did you learn about that process, about how to unlock value from unstructured data?
>> Yeah, and I think this is extremely important: with unstructured data, you apply advanced analytics against the data to go through a process of making sense of that information. A lot of folks have talked historically about text mining, trying to extract an entity out of unstructured data and using that for the value. But there are a few steps before you even get to that point. First of all, it's classifying the information to understand which documents you care about and which documents you don't. I always use the story that somewhere in this vast set of documents, somebody has probably uploaded the cafeteria menu from 10 years ago. That has no scientific value, whereas a protocol document for a clinical trial has significant value. You don't want to look through a billion documents manually to separate those, so you have to apply the technology even in that first step of classification, and then there are a number of steps that ultimately lead you to understanding the relationships in the knowledge that's in the documents.
>> Side question on that. You said if it's a menu, get rid of it, but there are certain restrictions where you've got to keep data for decades. It struck me: what about work in process? Especially in the pharmaceutical industry; post Federal Rules of Civil Procedure, everybody was looking for a smoking gun. How are organizations dealing with what to keep and what to get rid of?
>> Yeah, certainly the thinking has been to remove the excess, and it's to your point: how do you draw the line as to what is excess? You don't want to just keep every document, because if the organization is involved in any type of litigation and there are disclosure requirements, you don't want thousands of superfluous documents in scope. At the same time, there are retention requirements, so it's like a lot of things: figuring out how to abide by the requirements, which is not an easy thing to do. It really is another driver. Document retention has certainly been a big topic over a number of years, but I think people have not applied advanced analytics to the level they could to really support it.
>> Another Einstein bromide, you know: keep everything you must, but no more.
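A hedged sketch of that first classification step, separating documents worth curating from noise with a simple TF-IDF classifier. The training examples are invented, and a production pipeline at pharma scale would be far more sophisticated; this only illustrates the shape of the approach.

```python
# Toy document triage: learn to separate documents with scientific value
# (e.g., trial protocols) from noise (e.g., cafeteria menus).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

docs = [
    "randomized double-blind protocol for a phase II clinical trial",
    "statistical analysis plan for study endpoints and adverse events",
    "cafeteria menu for the week of june 5th",
    "parking garage closed for maintenance on friday",
]
labels = ["keep", "keep", "discard", "discard"]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(docs, labels)

print(clf.predict(["protocol amendment: revised dosing schedule"]))  # likely "keep"
print(clf.predict(["holiday lunch menu posted in the lobby"]))       # likely "discard"
```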
>> So you put forth a proposal where you basically had three approaches, well, combined three approaches. The crawlers, the spiders, go out and do the discovery, and I presume that's where the classification is done?
>> That's really the identification of all of the source information.
>> Okay, so find out what you've got.
>> So that's kind of the start: find out what you have.
>> Step two is the data repository. When I heard you, I thought, okay, it must be a logical data repository, but you said you basically told the CIO: we're copying all the data and putting it into essentially one place.
>> A physical location, yes.
>> Okay, and I've got another question about that. And then you use bots in the pipeline to move the data, and you drew the diagram of the back end with all the databases, unstructured and structured, and then all the fun stuff up front: visualization.
>> Which people love to focus on, the fun stuff, right? You can't count how many articles say you've got to apply deep learning and machine learning and that's where the answers are. But we have to have the data, and that's the piece people are missing.
>> So my question is: you had this tactical mindset, and it seems like you picked a good workload, the clinical trials, where you had at least conceptually a good chance of success. Is that a fair statement?
>> Well, the clinical trials were one aspect. Again, we tackled the entire data landscape, all of the data across all of R&D; it wasn't limited to just that. That's the top-down and bottom-up: the bottom-up is tackling everything in the landscape, and the top-down is what's important to the organization for decision making.
>> So that's actually the entire R&D application portfolio.
>> Both internal and external.
>> So my follow-up question is: that was largely inside the four walls of GSK, or not necessarily. You hear about these emerging edge applications, and that's got to be a nightmare for what you described. In other words, putting all the data into one physical place must be like a snake swallowing a basketball. Thoughts on that?
>> Some of it really does depend. IoT is another example where there's a large amount of streaming information, and I'm not proposing that all data in every format in every location needs to be centralized and homogenized. I think you have to add some intelligence on top of that, certainly from an edge, IoT, or sensor perspective. For the data you want to make decisions around, you're probably going to have a filter level on those things coming in; you filter down to what you really want to make decisions on, and then that comes together with the rest.
>> So it's a prioritization exercise, and that presumably can be automated.
>> Right, but we always have these cases where we say, well, what about this case? I guess what I'm saying is that I've not seen organizations tackle their own data landscape challenges in an aggressive way to get value out of the data within their four walls. It's always, like I mentioned in the keynote, let's do a very small proof of concept, let's take a very narrow chunk.
And what ultimately happens is that that becomes the only solution they build, and then they go to another area and build another solution, and that's why we end up with 15 or 25... (all talk over each other)
>> The conventional wisdom is you start small.
>> And fail.
>> And you go on from there; you fail fast, and that's how you get big things done.
>> Well, that's not how you support analytic algorithms like machine learning and deep learning. You can't feed them fragmented data from one aspect of your business and expect them to learn intelligent things and then make recommendations. You've got to have a much broader perspective.
>> I want to ask you about one statistic you shared. You found 26 thousand relational database schemas for capturing experimental data, and you standardized those into one. How?
>> Yeah, we took advantage of the Tamr technology that Michael Stonebraker created here at MIT a number of years ago, which again is applying advanced analytics to the data: using the content of the data and the characteristics of the data to go from dispersed schemas into a unified schema. If you look across 26 thousand schemas using machine learning, you can understand the consolidated view that gives you one perspective across all of those different schemas. Ultimately, when you give people flexibility, they love to take advantage of it, but it doesn't mean they're actually doing things in extremely different ways; they're capturing the same kind of data, just calling things by different names and maybe using different formats. In that particular case we used Tamr very heavily, and that again is back to my example of using advanced analytics on the data to make it available for the fun stuff: the visualization and the advanced analytics.
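To illustrate the idea behind that kind of ML-assisted schema unification (a toy, not Tamr's actual algorithm), the sketch below scores source columns against a unified schema using column-name similarity plus value overlap; the weights and data are invented.

```python
# Toy schema matching: map source columns onto a unified schema by combining
# name similarity with overlap in the observed values.
from difflib import SequenceMatcher

unified = {"subject_id": {"1001", "1002"}, "dose_mg": {"5", "10"}}
source = {"SUBJ_ID": {"1002", "1003"}, "DOSE": {"10", "20"}}

def similarity(name_a, vals_a, name_b, vals_b):
    name_sim = SequenceMatcher(None, name_a.lower(), name_b.lower()).ratio()
    overlap = len(vals_a & vals_b) / max(len(vals_a | vals_b), 1)
    return 0.6 * name_sim + 0.4 * overlap  # weights are arbitrary

for src_name, src_vals in source.items():
    best = max(unified, key=lambda u: similarity(src_name, src_vals, u, unified[u]))
    score = similarity(src_name, src_vals, best, unified[best])
    print(f"{src_name} -> {best} (score {score:.2f})")
```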
>> So Mark, the last question. You know well that the CDO role emerged in highly regulated industries, and in the case of pharma, quasi-regulated industries, but now it seems to be permeating all industries. We have Goka-lan from McDonald's, and virtually every industry is at least thinking about this role or has some kind of de facto CDO. So if you were slotted into a CDO role, let's make it generic, I know it depends on the industry, but where do you start as a CDO for a large company that doesn't have one? Or even a mid-sized organization, where do you start?
>> Yeah. My approach is that a true CDO is maximizing the strategic value of data within the organization. It isn't a regulatory requirement. I know a lot of the banks started there, because they needed someone to be responsible for data quality and data privacy, but for me the most critical thing is understanding the strategic objectives of the organization and how data will be used differently in the future to drive decisions, actions, and the effectiveness of the business. In some cases there was a lot of discussion around monetizing the value of data, and people immediately took that to mean: can we sell our data and make money as a different revenue stream? I'm not a proponent of that. It's internally monetizing your data: how do you triple the size of the business by using data as a strategic advantage, and how do you change the executives so that what is good enough today is not good enough tomorrow, because they are really focused on using data as their decision-making tool? That, to me, is the difference a CDO needs to make: really using data to drive those strategic decision points.
>> And that nuance you mentioned I think is really important. Inderpal Bhandari, the Chief Data Officer of IBM, often says "how can you monetize the data," and you're right, I don't think he means selling data. It's how data contributes, if I can rephrase what you said, to the value of the organization: that can be cutting costs, driving new revenue streams, saving lives if you're a hospital, improving productivity.
>> Yeah, and what I've typically shared with executives when I've been in the CDO role is that they need to change their behavior, right? If a CDO comes into an organization, and a year later the executives are still making decisions on the same data PowerPoints with spinning logos that they ooh and aah over, if they're still making decisions that way, then the CDO has not been successful. The executives have to raise their level of expectation for what it takes to make a decision.
>> Change agents, top down, bottom up. Last question.
>> Going back to GSK: now that they've completed this massive data consolidation project, how are things different for that business?
>> Yeah, Hal Barron joined as the President of R&D about a year and a half ago, and his primary focus is using data, analytics, and machine learning to drive decision making in the discovery of new medicines. The environment that has been created is a key component of that strategic initiative, so they are actually completely changing the way they select new targets for new medicines, based on data and analytics.
>> Mark, thanks so much for coming on theCUBE.
>> Thanks for having me.
>> Great keynote this morning. You're welcome. All right, keep it right there, everybody. We'll be back with our next guest. This is theCUBE, Dave Vellante with Paul Gillin. Be right back from MIT. (upbeat music)
Dr. Mark Ramsey & Bruno Aziza | BigData NYC 2017
>> Live from Midtown Manhattan, it's theCUBE, covering BigData New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors.
>> Hey, welcome back, everyone. We're live here in New York City for theCUBE's special presentation of BigData NYC, here all week in conjunction with Strata Data happening around the corner. I'm John Furrier, the host, with James Kobielus. Our next two guests: Doctor Mark Ramsey, Chief Data Officer and Senior Vice President of R&D at GSK, GlaxoSmithKline, the pharma company, and Bruno Aziza, the CMO at AtScale. Both Cube alumni, welcome back.
>> Thanks for having us.
>> So Bruno, I want to start with you, because I think Doctor Mark has some great use cases I want to dig into and go deep on with Jim. But AtScale: give us the update on the company. You guys are doing well; what's happening? You had the vision of this data layer we talked about a couple of years ago. It's working, so give us the update.
>> A lot of things have happened since we talked last. You might have seen some of the news in terms of growth: 10x growth since we started, mainly driven by the customer use cases. That's why I'm excited to hear from Mark and share his stories with the rest of the audience here. We have a presentation at Strata tomorrow with Vivens; it's a great IoT use case as well. What we're seeing is that the industry is changing in terms of how it buys BI platforms. In the past, people would buy BI platforms vertically: they'd buy the visualization, they'd buy the semantics, and they'd buy the best-of-breed integration as one stack. We now live in a world where there's a multitude of BI tools, and the data platforms aren't standardized either. So the trend we're riding is this idea of the need for a universal semantic layer: the idea that you can have a universal set of semantics, in a dictionary or ontology, that can be shared across all types of business users and business use cases, and across any data. That's really the trend that's driving our growth, and you'll see it today at this show with the use cases and the customers, and of course some of the announcements we're making. We're announcing a new offer with Cloudera and Tableau, so we're really excited about how the space and the partner ecosystem is embracing our solution.
>> And you guys really have a Switzerland kind of strategy: you play neutral, play nicely with everybody, because your abstraction layer is really more on the data.
>> That's right. The whole value proposition is that you don't want to move your data, and you don't want to move your users away from the tools they already know, but you do want them to be able to take advantage of the data you store. This concept of a virtualized, universal semantic layer that enables the use cases to happen faster is a big value proposition for all of them.
>> Doctor Mark Ramsey, I want to get your quick thoughts on this. You're obviously a customer, so you're not unbiased, and you're under pressure every day. The competitive noise out there is high in this area, and you're a chief data officer. You run R&D data, so you've got that 20-mile stare into the future, and you've got experience running data at wide scale. There are a lot of other potential solutions out there. What made it attractive for you?
>> Well, it fills a need that we have around virtualization: we can leave the data in the format that it is on the platform.
And then we allow the users, like Bruno was mentioning, to use a number of standardized tools to access that information. It also gives us an ability to learn how folks are consuming the data. They'll use a variety of tools and interact with the data, and AtScale gives us a great capability to look under the covers and see how they're using the data, and, if we need to physicalize some of it for easier access in the long term, it gives us that.
>> It's really an agility model for data. You're kind of agile.
>> Yeah, it's a way to, if you're using a dashboarding tool, interact with the data, and then, as you see how folks are actually consuming the information, you can physicalize it and make that readily available. So it gives you those agile cycles to go through.
>> In your use of the solution, what have you seen in terms of usage patterns? What are your users using AtScale for? Have you been surprised by how they're using it, and where do you plan to go in terms of the use cases you're addressing with this technology going forward?
>> This technology allows us to give the users the ability to query the data. For example, we use standardized ontologies in several areas, and standardized ontologies are great because the data is in one format. However, that's not necessarily how the business would like to look at the data, so it gives us an ability to make the data appear the way the users would like to consume it. Then we understand which parts of the model they're actually flexing, and we can make the decision to physicalize those. Because again, it's a great technology, but with virtualization there is a cost: the machines have to create the illusion of the data being a certain way. If you know something is going to be used day in and day out, you can move it to a physicalized version.
>> Is there a specific threshold, when you're looking at the metrics of usage, where you know that particular data, particular views, need to be physicalized? What is that threshold, or what are those criteria?
>> It's normally a combination of the number of connections you have, so the joins of the data across the number of repositories, balanced with the volume of data. If you're dealing with thousands of rows versus billions of rows, that can lead you to make the decision faster. There isn't a defined metric that says: we have this number of rows, this many columns, and this size, so it leads you down that path. But the nice thing is you can experiment; it gives you the ability to prototype and see whether folks are consuming the data before you invest the energy to make it physical.
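A hedged sketch of that experiment-then-physicalize heuristic: watch how a virtual view is actually queried, then materialize it once joins, volume, and usage all cross a threshold. The log format and threshold numbers below are invented for illustration.

```python
# Decide which virtual views have earned materialization based on usage stats.
query_log = [
    {"view": "rx_claims_by_region", "joins": 6, "rows_scanned": 2_000_000_000, "hits_per_day": 140},
    {"view": "cafeteria_menu_vw", "joins": 1, "rows_scanned": 5_000, "hits_per_day": 2},
]

def should_materialize(stats, min_joins=4, min_rows=1_000_000_000, min_hits=50):
    # No single factor decides it; repeated, heavy, join-intensive use does.
    return (stats["joins"] >= min_joins
            and stats["rows_scanned"] >= min_rows
            and stats["hits_per_day"] >= min_hits)

for q in query_log:
    verdict = "materialize" if should_materialize(q) else "stay virtual"
    print(q["view"], "->", verdict)
```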
You know, federated, I'll use the word federated, but semantic virtualization layers have clearly been around for quite some time. A lot of solution providers offer them, and a lot of customers have used them for disparate use cases. One of the raps traditionally against semantic virtualization is that it's simply a stop gap between chaos on the one end, where you have dozens upon dozens of databases with no unified roll-up, and full centralization or migration to a big data hub on the other. Do you see semantic virtualization as being your target architecture for your operational BI and so forth? Or is it, on some level, simply a stop gap or transitional approach on the way to some more centralized environment? >> I think you're talking about two different scenarios here. On federated, I would agree: when folks attempted to use that to bring disparate data sources together to make them look consolidated, and they happened to be on different platforms, that was definitely a stop gap on a journey to really addressing the problem. The thing that's a little different here is that we're talking about this running on a standardized platform. It's not platform-disparate; the data is being accessed on the platform. It really gives us the flexibility to allow the consumer of the data to have a variety of views of the data without actually physicalizing each of them. So I don't know that it's on a journey, because we're never going to physicalize the data in every way people want to see it. But it's very different from ten, fifteen years ago, when folks were trying to solve disparate data sources using federation. >> Would it be fair to characterize what you do as agile virtualization of the data on a data lake platform? Is that essentially what it's about? >> Yeah, it certainly enables that. In our particular case we use the data lake as the foundation, then we actually curate the data into standardized ontologies, and the consumer access layer is where we're applying virtualization. In the creation of the environment that we have, we've integrated about a dozen different technologies, so one of the things we're focused on is trying to create an ecosystem, and AtScale is one of the components of that. It gives us flexibility so that we don't have to physicalize. >> So you don't have to stand up any costs. You have that flexibility with AtScale, am I getting this right? You get the data and people can play with it without actually provisioning. It's like, okay, save some cash, but then you also double down on the winners that come in. >> Right, the things that are winners, you check the box, you physicalize them, you provide that access. >> You get crowd-sourcing benefits going on in your environment. >> Exactly. >> The curation you mentioned: does the curation go on inside of AtScale? Are you using a different tool, or something you wrote in house? Essentially it's data governance and data cleansing. >> For that we use a technology called Tamr, a machine-learning-based data curation tool; that's one of our fundamental tools for curation. One of the things in the life sciences industry is that you tend to have several data sources that are slightly aligned but actually different, and machine learning is an excellent application for that. >> Let's get into the portfolio. Obviously as a CDO you've got to build a holistic view; you have a tool chest of tools and a platform. How do you look at the big picture? AtScale obviously makes a lot of sense, so good for those guys, but big picture, you've got to have a variety of things in your arsenal. How do you architect that tool shed, or your platform? Not everything's a hammer, not everything's a nail, but you've got all of them to build with. >> You bring up a great point, because unfortunately, a lot of times, to use your analogy of the tool shed: you don't want 12 lawnmowers in your tool shed, right? So one of the challenges is that a lot of the folks in this ecosystem...
They start with one area of focus, and then they try to grow into other areas of focus, which means that suddenly everybody starts to be a lawnmower, because they think that's... >> They start as a hammer and turn into a lawnmower. >> Right. >> How did that happen? That's called pivoting. >> You can mow your lawn with a hammer, but... So it's really that portfolio of tools that all together get the job done. Certainly there's a data acquisition component, there's the curation component, there's visualization, machine learning, and there's the foundational layer of the environment. Our approach has been to select the best-in-class tools in each of those areas and make them work together, and Bruno and the team at AtScale have been part of this. We've actually held partner summits on how we bring that ecosystem together. >> Is your stuff mostly on-prem? Obviously there's a lot of pharma IP there; you've got the whole patent side of things, which is well documented, so you've got to keep things confidential. What's the mix of cloud and on-prem? Is it 100 percent on-prem? Is there some bursting to the cloud? Is it a private cloud? How do you look at the cloud piece? >> Yeah, the majority of what we're doing is on-prem. The profile for us is that we persist the data. In some cases, when we're doing some of the more advanced analytics, we burst to the cloud for additional processors, but the model of persisting the data means that it's much more economical to have an on-prem instance of what we're doing. So it is a combination, but the majority of what we're doing is on-prem. >> Hold on Jim, one more question. Obviously everyone's knocking on your door; you know how to get into that account, they spend a lot of money. But it sounds like you're pretty disciplined: you don't want people to come in and turn into someone you don't want them to be. You also run R&D, so you have to understand the headroom. How do you look at the headroom of what you need down the road, in terms of how you interface with the suppliers that knock on your door, whether it's AtScale, currently working with you, or people just trying to get in there and sell you a hammer or a lawnmower? Whatever they have, they're going to try; you're dealing with vendor pressure. >> Right, well, a lot of that is around what problem we're trying to solve, and we drive all of that based on the use cases and the value to the business. If we identify gaps that we need to address, some of those are specific to life sciences types of challenges, where the tools are very specialized and the population of partners is quite small. And other things: we're building an actual production, operational environment, not a proof of concept, so security is extremely important. We're Kerberos-enabled end to end, with data protected at rest and in flight, which breaks some of the tools, so there are criteria that need to be in place in order to... >> So you're thinking about scale big time, not just putting a beachhead together but foundationally building out the platform, having tools that fit general purpose and also specialty. Scale's a big thing, right? >> Yes, and we're also addressing what we see as three different cohorts of consumers of the data. One is more in guided analytics, the more traditional dashboards and reports.
Another is more in computational notebooks, more of the scientific users working in R, Python, and other languages. The third is almost at the bare-metal level: machine learning, TensorFlow, a number of tools that people interact with directly. People don't necessarily fit neatly into those three cohorts, so we're also seeing a blend, and that's something that we're also... >> There's a fourth cohort. >> Yeah, you know, someone's using a computational notebook, but they want to draw upon a dashboard graphic, and then they want to run a predefined TensorFlow model and pull all of that together. >> And what you just said ties up the question I was going to ask, so it's perfect. One of my core focuses as a Wikibon analyst is on deep learning, on AI. In semantic data virtualization in a life sciences pharma context, you undoubtedly have a lot of image data, visual data. In terms of curating that and enabling virtualized access, to what extent are you using deep learning, TensorFlow, convolutional neural networks, to surface the visual patterns that can conceivably be searched using a variety of techniques? Is that part of your overall implementation of AtScale for your particular use cases currently, or do you plan to go there? >> We're active, very active, in deep learning, artificial intelligence, machine learning. Again, it depends on which problem you're trying to solve, so there are a number of components that come together when you're looking at image analytics versus using data to drive out certain decisions, but we're active in all of those areas. Our ultimate goal is to transform the way that R&D is done within a pharmaceutical company. Right now it takes somewhere between five and 15 years to develop a new medicine; the goal is to do a lot more analytics to shorten that time significantly. It helps the patients and gets the medicines to market faster. >> That's your end game: to create an architecture that enables the data to add value to the business. >> Right. >> Dr. Mark Ramsey, thanks so much for sharing the insight from your environment. Bruno, you've got something there to show us. What do you have there? He always brings a prop. >> A few years ago I think I had a tattoo on my neck or something like that. But I'm happy I brought this, because you can see how big Mark's vision is. The reason he's getting recognized by Cloudera on the data awards and so forth is that he's got a huge vision, and it's a great opportunity for a lot of CDOs out there. I think the average CDO has spent a hundred million dollars to deploy big data solutions over the last five years, but they're not able to consume all the data they produce. In your case, I think you consume about 100 percent of the data in your infrastructure, while the average in this space is being able to consume about one percent of the data. And this is essentially the analogy of what you're dealing with in the enterprise today: we've spent a lot of time putting data into large systems and so forth, but the tool set that we give data officers and their teams is a cocktail straw like this, in order to drink out of it. >> That's a data lake, actually. >> It's an actual lake. It's a Slurpee cup, multiple Slurpees with the same straw. >> Who has the Hudson River water here? >> I can't answer that question; I think I'd have to break a few things if I did. But the idea here is that it's not very satisfying.
Hence the frustration of business users and business units. What AtScale has done is build this: this is the straw you want. So I would help CDOs contemplate this idea of the Slurpee and the cocktail straw: how much money are you spending here, and how much money are you spending there? Because what matters is the speed at which you can get the insights to the business user. >> You've got to get that straw, you've got to break it down so it's available everywhere. I think that's a great innovation, and it makes me thirsty. >> You know what, you can have it. >> Bruno, thanks for coming, from AtScale. Dr. Mark Ramsey, good to see you again, great to have you come back. Again, anytime: we love having chief data officers on. It's really a pioneering position, a critical position in all organizations, and it will continue to be in the future. Thanks for sharing your insights. It's theCUBE, with more live coverage after this short break. (tech music)
AWS Startup Showcase S2S4 promo2
(dramatic whooshing) >> Hello, I'm John Furrier, host of theCUBE. Check out the upcoming Season 2, Episode 4 AWS Startup Showcase, featuring cybersecurity. We've got 10 hot, growing startups. We've got a keynote from Jon Ramsey, Vice President of AWS Security, as well as amazing AWS Cloud Heroes in security, like Liz Rice, and some amazing, talented people sharing their insights. Here on theCUBE, every episode is a new topic, and this topic is cybersecurity. Check it out. It's an ongoing series covering the hottest startups in the ecosystem of AWS, Amazon Web Services. It's theCUBE.
AWS Startup Showcase S2S4 promo1
(air whooshing) (cymbal crashing) >> Hello everybody, I'm John Furrier, host of theCUBE. Join us for season two, episode four of the ongoing series, the AWS Startup Showcase. This episode is all about cybersecurity: hackers, super hackers, supercloud. All 10 companies presenting are the latest, hottest startups in cybersecurity. Of course, Jon Ramsey will be keynoting; he's the vice president of the AWS security team. And we've got great expert panels with the heroes, from Liz Rice of the open source world, talking about security programming in the Linux kernel, to best practices for CSOs. If you're a CSO or a CXO, check it out.
4-video test
>> Okay, this is my presentation on coherent nonlinear dynamics and combinatorial optimization. This is going to be a talk to introduce an approach we're taking to the analysis of the performance of coherent Ising machines. Let me start with a brief introduction to Ising optimization. The Ising model represents a set of interacting magnetic moments, or spins, with total energy given by the expression shown at the bottom left of this slide (reconstructed below). Here, the sigma variables are spins taking binary values, the matrix element J_ij represents the interaction strength and sign between any pair of spins i and j, and h_i represents a possible local magnetic field acting on each spin. The Ising ground-state problem is to find an assignment of binary spin values that achieves the lowest possible value of the total energy, and an instance of the Ising problem is specified by giving numerical values for the matrix J and the vector h. Although the Ising model originates in physics, we understand the ground-state problem to correspond to what would be called quadratic binary optimization in the field of operations research, and in fact, in terms of computational complexity theory, it can be established that the Ising ground-state problem is NP-complete. Qualitatively speaking, this makes the Ising problem a representative sort of hard optimization problem, for which it is expected that the runtime required by any computational algorithm to find exact solutions should asymptotically scale exponentially with the number of spins, for worst-case instances at each N. Of course, there's no reason to believe that the problem instances that actually arise in practical optimization scenarios are going to be worst-case instances. And it's also not generally the case in practical optimization scenarios that we demand absolute optimum solutions. Usually we're more interested in just getting the best solution we can within an affordable cost, where cost may be measured in terms of time, service fees, and/or energy required for a computation. This focuses great interest on so-called heuristic algorithms for the Ising problem and other NP-complete problems, which generally find very good but not guaranteed-optimum solutions and run much faster than algorithms designed to find absolute optima. To get some feeling for present-day numbers, we can consider the famous traveling salesman problem, for which extensive compilations of benchmarking data may be found online. A recent study found that the best known TSP solver required median run times, across a library of problem instances, that scaled as a very steep root exponential for N up to approximately 4,500. This gives some indication of the change in runtime scaling for generic as opposed to worst-case problem instances. Some of the instances considered in this study were taken from a public library of TSPs derived from real-world VLSI design data. This VLSI TSP library includes instances with N ranging from 131 to 744,710. Instances from this library with N between 6,880 and 13,584 were first solved just a few years ago, in 2017, requiring days of run time on a 48-core two-gigahertz cluster, while instances with N greater than or equal to 14,233 remain unsolved exactly by any means.
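Before going further, it may help to pin down the two formulas referenced so far. Since the slides are not reproduced in the transcript, the following is a reconstruction using standard conventions: the textbook Ising energy, which the expression "at the bottom left of this slide" presumably shows in an equivalent form, and the root-exponential runtime form quoted for the TSP solver study.

```latex
% Ising energy for spins \sigma_i \in \{-1,+1\}, couplings J_{ij}, fields h_i
% (standard textbook form; the slide is assumed to use an equivalent convention)
\begin{align}
  H(\sigma) &= -\sum_{i<j} J_{ij}\,\sigma_i\sigma_j \;-\; \sum_i h_i\,\sigma_i,
  \qquad \sigma_i \in \{-1,+1\}, \\
  \sigma^\star &= \arg\min_{\sigma\in\{-1,+1\}^N} H(\sigma)
  \quad\text{(the NP-complete ground-state problem).}
\end{align}
% "Root-exponential" scaling of the median exact-solver runtime with instance
% size N, as quoted from the benchmarking study (c > 0 a fitted constant):
\begin{equation}
  T_{\text{median}}(N) \sim \exp\bigl(c\sqrt{N}\bigr).
\end{equation}
```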
Approximate solutions, however, have been found by heuristic methods for all instances in the VLSI TSP library, with, for example, a solution within 0.14% of a known lower bound having been discovered for an instance with N equal to 19,289, requiring approximately two days of run time on a single core at 2.4 gigahertz. Now, if we simple-mindedly extrapolate the root-exponential scaling from that study out to larger N, we might expect that an exact solver would require something more like a year of run time on the 48-core cluster used for the N-equals-13,584 instance, which shows how much a very small concession on the quality of the solution makes it possible to tackle much larger instances at much lower cost. At the extreme end, the largest TSP ever solved exactly has N equal to 85,900. This is an instance derived from 1980s VLSI design, and it required 136 CPU-years of computation, normalized to a single core at 2.4 gigahertz. But the much larger so-called World TSP benchmark instance, with N equal to 1,904,711, has been solved approximately, within an optimality gap bounded below 0.474%. Coming back to the general practical concerns of applied optimization, we may note that a recent meta-study analyzed the performance of no fewer than 37 heuristic algorithms for Max-Cut and quadratic binary optimization problems, and found that different heuristics work best for different problem instances selected from a large-scale heterogeneous test bed, with some evidence of cryptic structure in terms of what types of problem instances were best solved by any given heuristic. Indeed, there are reasons to believe that these results from Max-Cut and quadratic binary optimization reflect a general principle of performance complementarity among heuristic optimization algorithms. In the practice of solving hard optimization problems, there thus arises a critical pre-processing issue of trying to guess which of a number of available good heuristic algorithms should be chosen to tackle a given problem instance. Assuming that any one of them would incur high costs to run on a large problem instance, making an astute choice of heuristic is a crucial part of maximizing overall performance. Unfortunately, we still have very little conceptual insight about what makes a specific problem instance good or bad for any given heuristic optimization algorithm, and this has been pinpointed by researchers in the field as a circumstance that must be addressed. So, adding this all up, we see that a critical frontier for cutting-edge academic research involves both the development of novel heuristic algorithms that deliver better performance with lower cost on classes of problem instances that are underserved by existing approaches, and fundamental research to provide deep conceptual insight into what makes a given problem instance easy or hard for such algorithms. In fact, these days, as we talk about the end of Moore's law and speculate about a so-called second quantum revolution, it's natural to talk not only about novel algorithms for conventional CPUs but also about highly customized special-purpose hardware architectures on which we may run entirely unconventional algorithms for combinatorial optimization, such as the Ising problem. So, against that backdrop, I'd like to use my remaining time to introduce our work on the analysis of coherent Ising machine architectures and associated optimization algorithms.
These machines, in general, are a novel class of information processing architectures for solving combinatorial optimization problems by embedding them in the dynamics of analog, physical, or cyber-physical systems, in contrast both to more traditional engineering approaches that build Ising machines using conventional electronics and to more radical proposals that would require large-scale quantum entanglement. The emerging paradigm of coherent Ising machines leverages coherent nonlinear dynamics in photonic or opto-electronic platforms to enable near-term construction of large-scale prototypes that exploit post-CMOS information dynamics. The general structure of current CIM systems is shown in the figure on the right. The role of the Ising spins is played by a train of optical pulses circulating around a fiber-optic storage ring. A beam splitter inserted in the ring is used to periodically sample the amplitude of every optical pulse, and the measurement results are continually read into an FPGA, which uses them to compute perturbations to be applied to each pulse by synchronized optical injection. These perturbations are engineered to implement the spin-spin coupling and local magnetic field terms of the Ising Hamiltonian, corresponding to the linear part of the CIM dynamics. A synchronously pumped parametric amplifier, denoted here as a PPLN waveguide, adds a crucial nonlinear component to the CIM dynamics as well. In the basic CIM algorithm, the pump power starts very low and is gradually increased. At low pump powers, the amplitudes of the Ising spin pulses behave as continuous complex variables, whose real parts, which can be positive or negative, play the role of soft, or perhaps mean-field, spins. Once the pump power crosses the threshold for parametric self-oscillation in the optical fiber ring, however, the amplitudes of the Ising spin pulses become effectively quantized into binary values. While the pump power is being ramped up, the FPGA subsystem continuously applies its measurement-based feedback implementation of the Ising Hamiltonian terms. The interplay of the linearized Ising dynamics implemented by the FPGA and the threshold quantization dynamics provided by the synchronously pumped parametric amplifier results in a final state of the optical pulse amplitudes, at the end of the pump ramp, that can be read out as a binary string giving a proposed solution of the Ising ground-state problem.
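As a concrete illustration of the pump-ramp schedule just described, here is a minimal classical caricature in Python. It is a sketch under stated assumptions, not the group's model or code: the soft-spin equation, the linear ramp, the step size, and the toy problem instance are all illustrative choices, and the real machine's measurement noise, quantization, and optical details are omitted.

```python
import numpy as np

def cim_pump_ramp(J, steps=2000, dt=0.01, p_max=2.0, seed=0):
    """Toy mean-field caricature of the basic CIM algorithm described above:
    soft-spin amplitudes x_i evolve under parametric gain (p - 1), cubic
    saturation, and Ising feedback J @ x while the pump p is ramped up.
    Illustrative only; not the actual hardware or simulation code."""
    rng = np.random.default_rng(seed)
    n = J.shape[0]
    x = 1e-3 * rng.standard_normal(n)        # near-vacuum initial amplitudes
    for k in range(steps):
        p = p_max * k / steps                # linear pump ramp from 0 to p_max
        dx = (p - 1.0) * x - x**3 + J @ x    # gain/loss, saturation, coupling
        x += dt * dx
    return np.sign(x)                        # read out binary spins at ramp end

# Tiny antiferromagnetic triangle (a frustrated instance, couplings = -1).
J = -(np.ones((3, 3)) - np.eye(3))
print(cim_pump_ramp(J))  # e.g. [ 1. -1. -1.]: two of the three bonds satisfied
```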
This method of solving the Ising problem seems quite different from a conventional algorithm that runs entirely on a digital computer, as a crucial aspect of the computation is performed physically by the analog, continuous, coherent, nonlinear dynamics of the optical degrees of freedom. In our efforts to analyze CIM performance, we have therefore turned to the tools of dynamical systems theory: namely, a study of bifurcations, the evolution of critical points, and the topologies of heteroclinic orbits and basins of attraction. We conjecture that such analysis can provide fundamental insight into what makes certain optimization instances hard or easy for coherent Ising machines, and we hope that our approach can lead both to improvements of the core CIM algorithm and to a pre-processing rubric for rapidly assessing the CIM suitability of new instances. Okay, to provide a bit of intuition about how this all works, it may help to consider the threshold dynamics of just one or two optical parametric oscillators in the CIM architecture just described. We can think of each of the pulse time slots circulating around the fiber ring as representing an independent OPO. We can think of a single OPO degree of freedom as a single resonant optical mode that experiences linear dissipation, due to out-coupling loss, and gain in a pumped nonlinear crystal, as shown in the diagram on the upper left of this slide. As the pump power is increased from zero, as in the CIM algorithm, the nonlinear gain is initially too low to overcome the linear dissipation, and the OPO field remains in a near-vacuum state. At a critical threshold value, gain equals dissipation, and the OPO undergoes a sort of lasing transition; the steady states of the OPO above this threshold are essentially coherent states. There are actually two possible values of the OPO coherent amplitude at any given above-threshold pump power, which are equal in magnitude but opposite in phase. When the OPO crosses this threshold it essentially chooses one of the two possible phases randomly, resulting in the generation of a single bit of information. If we consider two uncoupled OPOs, as shown in the upper-right diagram, pumped at exactly the same power at all times, then as the pump power is increased through threshold, each OPO will independently choose a phase, and thus two random bits are generated. For any number of uncoupled OPOs, the threshold power per OPO is unchanged from the single-OPO case. Now, however, consider a scenario in which the two OPOs are coupled to each other by mutual injection of their out-coupled fields, as shown in the diagram on the lower right. One can imagine that, depending on the sign of the coupling parameter alpha, when one OPO is lasing it will inject a perturbation into the other that may interfere either constructively or destructively with the field that OPO is trying to generate by its own lasing process. As a result, it can easily be shown that for alpha positive there is an effective ferromagnetic coupling between the two OPO fields, and their collective oscillation threshold is lowered from that of the independent-OPO case, but only for the two collective oscillation modes in which the two OPO phases are the same. For alpha negative, the collective oscillation threshold is lowered only for the configurations in which the OPO phases are opposite. So then, looking at how alpha is related to the J_ij matrix of the Ising spin-coupling Hamiltonian, it follows that we could use this simplistic two-OPO CIM to solve the ground-state problem of a ferromagnetic or antiferromagnetic N-equals-two Ising model, simply by increasing the pump power from zero and observing what phase relation occurs as the two OPOs first start to lase.
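In equations, the two-OPO picture can be written as follows. This is a standard mean-field form supplied for orientation, with the loss rate normalized to one; the talk's own notation is not shown, so the symbols here (amplitudes a_1, a_2, pump p, injection coupling alpha) are assumptions.

```latex
% Mean-field amplitude equations for two coupled degenerate OPOs
% (standard form supplied for orientation; notation assumed, not from the slide)
\begin{align}
  \dot{a}_1 &= (p - 1)\,a_1 - a_1^{3} + \alpha\, a_2 ,\\
  \dot{a}_2 &= (p - 1)\,a_2 - a_2^{3} + \alpha\, a_1 .
\end{align}
% The symmetric and antisymmetric collective modes a_{\pm} = (a_1 \pm a_2)/\sqrt{2}
% have linearized growth rates (p - 1) \pm \alpha, hence oscillation thresholds
\begin{equation}
  p_{\mathrm{th}}^{\pm} = 1 \mp \alpha ,
\end{equation}
% so for \alpha > 0 the in-phase (ferromagnetic) mode reaches threshold first,
% and for \alpha < 0 the out-of-phase (antiferromagnetic) mode does.
```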
Clearly, we can imagine generalizing this story to larger N; however, the story doesn't stay this clean and simple for all larger problem instances. To find a more complicated example, we only need to go to N equals four. For some choices of J at N equals four, the story remains simple, like the N-equals-two case. The figure on the upper left of this slide shows the energy of various critical points for a non-frustrated N-equals-four instance, in which the first bifurcated critical point, that is, the one that bifurcates at the lowest pump value, flows asymptotically into the lowest-energy Ising solution. In the figure on the upper right, however, the first bifurcated critical point flows to a very good but suboptimal minimum at large pump power. The global minimum is actually given by a distinct critical point that first appears at a higher pump power and is not adiabatically connected to the origin. The basic CIM algorithm is thus not able to find this global minimum. Such non-ideal behaviors seem to become more common at larger N, as for the N-equals-20 instance shown in the lower plots, where the lower-right plot is just a zoom into a region of the lower-left plot. It can be seen that the global minimum corresponds to a critical point that first appears at a pump parameter around 0.16, at some distance from the adiabatic trajectory of the origin. It's curious to note that in both of these small-N examples, however, the critical point corresponding to the global minimum appears relatively close to the adiabatic trajectory of the origin as compared to most of the other local minima that appear. We're currently working to characterize the phase portrait topology between the global minimum and the adiabatic trajectory of the origin, seeking clues as to how the basic CIM algorithm could be generalized to search for non-adiabatic trajectories that jump to the global minimum during the pump ramp. Of course, N equals 20 is still too small to be of interest for practical optimization applications, but the advantage of beginning with the study of small instances is that we're able reliably to determine their global minima and to see how they relate to the adiabatic trajectory of the origin in the basic CIM algorithm. In the small-N limit, we can also analyze fully quantum-mechanical models of CIM dynamics, but that's a topic for future talks. Existing large-scale prototypes are pushing into the range of N equals 10^4 to 10^5 to 10^6, so our ultimate objective in theoretical analysis really has to be to try to say something about CIM dynamics in the regime of much larger N. Our initial approach to characterizing CIM behavior in the large-N regime relies on the use of random matrix theory, and this connects to prior research on spin glasses, SK models, the TAP equations, et cetera. At present, we're focusing on statistical characterization of the CIM gradient-descent landscape, including the evolution of critical points and their eigenvalue spectra as the pump power is gradually increased. We're investigating, for example, whether there could be some way to exploit differences in the relative stability of the global minimum versus the other local minima. We're also working to understand the deleterious, or potentially beneficial, effects of non-idealities, such as asymmetry in the implemented Ising couplings. Looking one step ahead, we plan to move next in the direction of considering more realistic classes of problem instances, such as quadratic binary optimization with constraints. So, in closing, I should acknowledge the people who did the hard work on the things I've shown: my group, including graduate students Edwin Ng, Daniel Wennberg, Tatsuya Nagamoto, and Atsushi Yamamura, has been working in close collaboration with Surya Ganguli, Marty Fejer, and Amir Safavi-Naeini, all of us within the Department of Applied Physics at Stanford University, and also in collaboration with Yoshihisa Yamamoto over at NTT PHI research labs. I should also acknowledge funding support from the NSF, through the Coherent Ising Machines Expedition in Computing, and also from NTT PHI research labs, the Army Research Office, and ExxonMobil. That's it. Thanks very much.
>> I would like to thank NTT Research and Yoshi for putting together this program, and also for the opportunity to speak here. My name is Alireza Marandi, I'm from Caltech, and today I'm going to tell you about the work that we have been doing on networks of optical parametric oscillators: how we have been using them as Ising machines, and how we're pushing them toward quantum photonics. I want to acknowledge my team at Caltech, which is now eight graduate students and five researchers and postdocs, as well as collaborators from all over the world, including NTT Research, and also the funding from different places, including NTT. So this talk is primarily about networks of resonators, and these networks are everywhere, from nature, for instance the brain, which is a network of oscillators, all the way to optics and photonics, where some of the biggest examples are metamaterials, which are arrays of small resonators. More recently there is the field of topological photonics, which is trying to implement in photonics a lot of the topological behaviors of models from condensed-matter physics. And if you want to extend it even further, some of the implementations of quantum computing are technically networks of quantum oscillators. So we started thinking about these things in the context of Ising machines, which are based on the Ising problem, which is based on the Ising model: the simple summation over the spins, where the spins can be either up or down, and the couplings are given by the J_ij. The Ising problem is: if you know the J_ij, what is the spin configuration that gives you the ground state? This problem has been shown to be an NP-hard problem. So it's computationally important, because it's a representative of the NP problems, and NP problems are important because, first, they're hard on standard computers if you use brute-force algorithms, and second, they're everywhere on the application side. That's why there is this demand for making a machine that can target these problems and hopefully provide some meaningful computational benefit compared to standard digital computers. So I've been building these Ising machines based on this building block, which is a degenerate optical parametric oscillator. What it is is a resonator with nonlinearity in it: we pump these resonators and we generate a signal at half the frequency of the pump. One photon of the pump splits into two identical photons of signal, and they have some very interesting phase- and frequency-locking behaviors. If you look at the phase-locking behavior, you realize that you can actually have two possible phase states as the oscillation result of these OPOs, which are off by pi, and that's one of their important characteristics. I want to emphasize that a little more, and I have this mechanical analogy, which is basically two simple pendula. But they are parametric oscillators, because I'm going to modulate a parameter of them in this video, which is the length of the string, and by that modulation I make a pump; that will make an oscillation, a signal, which is at half the frequency of the pump. And I have two of them, to show you that they can acquire these phase states: they're still phase- and frequency-locked to the pump, but they can land in either the zero or the pi phase state. The idea is to use this binary phase to represent the binary Ising spin, so each OPO is going to represent a spin, which can be either zero or pi, up or down.
And to implement the network of these resonators, we use a time-multiplexed scheme, and the idea is that we put pulses in the cavity. These pulses are separated by the repetition period, T_R, and you can think about these pulses in one resonator as temporally separated synthetic resonators. If you want to couple these resonators to each other, you can introduce delays, each of which is a multiple of T_R. If you look at the shortest delay, it couples resonator 1 to 2, 2 to 3, and so on. If you look at the second delay, which is two times the repetition period, it couples 1 to 3, and so on. And if you have N-minus-one delay lines, then you can have any potential couplings among these synthetic resonators; the index bookkeeping is sketched in the code below. If I can introduce modulators in those delay lines, so that I can control the strength and the phase of these couplings at the right time, then I have a programmable, all-to-all connected network in this time-multiplexed scheme, and the whole physical size of the system scales linearly with the number of pulses. So the idea of the OPO-based Ising machine is this: I have these OPOs, each of which can be either zero or pi, and I can arbitrarily connect them to each other. I start by programming this machine to a given Ising problem, just by setting the couplings through the controllers in each of those delay lines. Now I have a network that represents an Ising problem. The Ising problem then maps to finding the phase state that satisfies the maximum number of coupling constraints, and the way it happens is that the Ising Hamiltonian maps to the linear loss of the network. If I start adding gain, by just putting pump into the network, then the OPOs are expected to oscillate in the lowest-loss state. We have been doing this over the past six or seven years, and I'm just going to quickly show you the transitions, especially what happened in the first implementation, which used a free-space optical system, then the guided-wave implementation in 2016, and the measurement-feedback idea, which led to increasing the size and doing actual computation with these machines. I just want to make the distinction here that the first implementation was all-optical; we also had an all-optical N-equals-16 implementation, and then we transitioned to this measurement-feedback idea, which I'll tell you quickly what it is. There's still a lot of ongoing work, especially on the NTT side, to make larger machines using measurement feedback, but I'm going to focus mostly on the all-optical networks: how we're using them to go beyond simulation of the Ising Hamiltonian, on both the linear and the nonlinear side, and also how we're working on miniaturization of these OPO networks. So the first experiment, which was the four-OPO machine, was a free-space implementation, and this is the actual picture of the machine. We implemented a small N-equals-four Max-Cut problem on the machine, one problem for one experiment, and we ran the machine 1,000 times; we looked at the state, and we always saw it oscillate in one of the ground states of the Ising Hamiltonian. Then the measurement-feedback idea was to replace those couplings and the controller with a simulator: we basically simulate all those coherent interactions on an FPGA, replicate the coherent pulses in accordance with all those measurements, and inject them back into the cavity. The nonlinearity still remains, so it is still a nonlinear dynamical system, but the linear side is all simulated.
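Backing up to the delay-line coupling scheme described a moment ago, here is the index bookkeeping it implies, as a minimal Python sketch: a delay line k repetition periods long couples pulse i to pulse i+k on each round trip, so modulators on delay lines k = 1 through N-1 can schedule an arbitrary coupling matrix. The function name and the per-pulse tap list are illustrative assumptions, not the lab's control code.

```python
import numpy as np

def delay_line_schedule(J):
    """For each delay line k (k round-trip periods long), list the modulator
    settings J[i, i+k] to apply as pulse i passes, so that pulse i is coupled
    to pulse i+k. Illustrative bookkeeping only, not actual control code."""
    n = J.shape[0]
    schedule = {}
    for k in range(1, n):                 # N-1 delay lines give all-to-all reach
        schedule[k] = [(i, i + k, J[i, i + k]) for i in range(n - k)]
    return schedule

# A 4-pulse example: delay 1 couples (0,1),(1,2),(2,3); delay 3 couples (0,3).
J = np.arange(16, dtype=float).reshape(4, 4)
for k, taps in delay_line_schedule(J).items():
    print(f"delay {k}*T_R:", taps)
```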
So there are lots of questions about whether this system preserves the important information or not, and whether it behaves better computationally; that is still the subject of a lot of ongoing studies. Nevertheless, the reason this implementation is very interesting is that you don't need the N-minus-one delay lines; you can just use one. Then you can implement a large machine, run several thousands of problems on it, and compare the performance from a computational perspective. So I'm going to split this idea of the OPO-based Ising machine into two parts. One is the linear part: if you take the nonlinearity out of the resonator and just think about the connections, you can think about this as a simple matrix-multiplication scheme, and that's basically what gives you the Ising Hamiltonian modeling; the optical loss of this network corresponds to the Ising Hamiltonian. If I just show you the example of the N-equals-four experiment, with all those phase states and the histogram that we saw, you can actually calculate the loss of each of those states, because all those interferences in the beam splitters and the delay lines are going to give you different losses, and you will see that the ground states correspond to the lowest loss of the actual optical network. If you add the nonlinearity, the simple way of thinking about what the nonlinearity does is that it provides the gain. You start bringing up the gain so that it hits the loss; then you go through the gain saturation, or the threshold, which is going to give you this phase bifurcation, so you go to either the zero or the pi phase state. The expectation is that the network oscillates in the lowest possible loss state. There are some challenges associated with this intensity-driven phase transition, which I'm going to briefly talk about, and I'm also going to tell you about other types of nonlinear dynamics that we're looking at on the nonlinear side of these networks. So if you just think about the linear network, we're actually interested in looking at some topological behaviors in these networks. The difference between looking at topological behaviors and at the Ising machine is that now, first of all, we're looking at types of Hamiltonians that are a little different from the Ising Hamiltonian. One of the biggest differences is that most of these topological Hamiltonians require breaking time-reversal symmetry, meaning that if you go from one spin on one side to the other side you get one phase, and if you go back you get a different phase. The other thing is that we're not just interested in finding the ground state; we're now interested in looking at all sorts of states, and at the dynamics and behaviors of all these states in the network. So we started with the simplest implementation, of course, which is a one-dimensional chain of these resonators, which corresponds to the so-called SSH model in the topological-physics literature. We get a similar energy-to-loss mapping, and now we can actually look at the band structure. This is an actual measurement that we get with this SSH model, and you see how reasonably well it actually follows the prediction and the theory.
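For reference, the SSH (Su-Schrieffer-Heeger) model that the pulse network maps onto is the standard two-site-per-cell tight-binding chain. The form below is the textbook one, quoted here because the slide is not reproduced; t_1 and t_2 are the alternating hopping amplitudes.

```latex
% Standard SSH tight-binding Hamiltonian (textbook form, quoted for reference):
% alternating intra-cell (t_1) and inter-cell (t_2) hoppings on a 1-D chain.
\begin{equation}
  H_{\mathrm{SSH}} = \sum_{n} \left( t_1\, c_{B,n}^{\dagger} c_{A,n}
    + t_2\, c_{A,n+1}^{\dagger} c_{B,n} + \mathrm{h.c.} \right),
\end{equation}
% with two bands
\begin{equation}
  E_{\pm}(k) = \pm \left| t_1 + t_2\, e^{ik} \right| ,
\end{equation}
% and a topologically nontrivial phase, hosting edge states, for t_2 > t_1.
```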
One of the interesting things about the time-multiplexed implementation is that you have the flexibility of changing the network as you are running the machine, and that's something unique about this implementation, so we can actually look at the dynamics. One example that we have looked at is that we can go through the transition from the topological to the trivial behavior of the network. You can then look at the edge states, and you can see both the trivial end states and the topological edge states actually showing up in this network. We have just recently implemented a two-dimensional network with the Harper-Hofstadter model; I don't have the results here, but one of the other important characteristics of time multiplexing is that you can go to higher and higher dimensions while keeping that flexibility and those dynamics. We can also think about adding nonlinearity, in both the classical and quantum regimes, which is going to give us a lot of exotic classical and quantum nonlinear behaviors in these networks. So I've told you mostly about the linear side; let me switch gears and talk about the nonlinear side of the network. The biggest thing I've talked about so far in the Ising machine is this phase transition at threshold. Below threshold we have squeezed states in these OPOs; if you increase the pump, we go through this intensity-driven phase transition, and then we get the phase states above threshold. This is basically the mechanism of the computation in these OPOs: this phase transition from below to above threshold. One of the characteristics of this phase transition is that below threshold you expect to see quantum states, and above threshold you expect to see more classical states, or coherent states, and that basically corresponds to the intensity of the driving pump. So it's really hard to imagine this phase transition happening entirely in the quantum regime. There are also some challenges associated with the intensity homogeneity of the network: for example, if one OPO starts oscillating and its intensity goes really high, it's going to ruin the collective decision-making of the network, because of the intensity-driven nature of the phase transition. So the question is: can we look at other phase transitions, can we utilize them for computing, and can we bring them to the quantum regime? I'm going to specifically talk about a phase transition in the spectral domain, which is the transition from the so-called degenerate regime, which is what I've mostly talked about, to the non-degenerate regime, which happens by just tuning the phase of the cavity. What is interesting is that this phase transition corresponds to a distinct phase-noise behavior. In the degenerate regime, which we call the ordered state, the phase is locked to the phase of the pump, as I talked about. In the non-degenerate regime, however, the phase is mostly dominated by the quantum diffusion of the phase, which is limited by the so-called Schawlow-Townes limit. You can see that transition from the degenerate to the non-degenerate regime, which also has distinct symmetry differences, and this transition corresponds to a symmetry breaking. In the non-degenerate case, the signal can acquire any of the phases on the circle, so it has a U(1) symmetry.
Okay, and if you go to the degenerate case, that symmetry is broken and you only have the zero and pi phase states. So now the question is: can we utilize this phase transition, which is a phase-driven phase transition, and can we use it for a similar computational scheme? That's one of the questions we're also thinking about. And this phase transition is not just important for computing; it's also interesting for its sensing potential, and you can easily bring it below threshold and operate in the quantum regime, either Gaussian or non-Gaussian. If you make a network of OPOs, you can now see all sorts of more complicated and more interesting phase transitions in the spectral domain. One of them is a first-order phase transition, which you get by just coupling two OPOs; that's a very abrupt phase transition compared to the single-OPO phase transition. And if you do the couplings right, you can actually get a lot of non-Hermitian dynamics and exceptional points, which are very interesting to explore in both the classical and quantum regimes. I should also mention that you can think about the couplings being nonlinear couplings as well, and that's another behavior you can see, especially in the non-degenerate regime. So with that, I have basically told you about these OPO networks, how we can think about the linear scheme and the linear behaviors, and how we can think about the rich nonlinear dynamics and nonlinear behaviors, in both the classical and quantum regimes. I want to switch gears and tell you a little bit about the miniaturization of these OPO networks. Of course, the motivation is that if you look at electronics, and what we had 60 or 70 years ago with the vacuum tube, we transitioned from relatively small-scale computers on the order of thousands of nonlinear elements to the billions of nonlinear elements where we are now. Where we are with optics is probably very similar to 70 years ago: a table-top implementation. So the question is, how can we utilize nanophotonics? I'm going to just briefly show you the two directions we're working on. One is based on lithium niobate, and the other is based on even smaller resonators. The work on nanophotonic lithium niobate was started in collaboration with Marko Loncar at Harvard, and also Marty Fejer at Stanford, and we could show that you can do periodic poling in thin-film lithium niobate and get all sorts of very highly nonlinear processes happening in this nanophotonic, periodically poled lithium niobate. Now we're working on building OPOs based on that kind of thin-film lithium niobate photonics, and these are some examples of the devices that we have been building in the past few months, which I'm not going to tell you more about; but the OPOs and the OPO networks are in the works. And that's not the only way of making large networks. I also want to point out that the reason these nanophotonic platforms are exciting is not just that you can make large networks compact, in a small footprint; they also provide some opportunities in terms of the operation regime. One of them is about making cat states in an OPO: can we have the quantum superposition of the zero and pi phase states that I talked about?
The nanophotonic lithium niobate platform provides some opportunities to actually get closer to that regime, because of the spatio-temporal confinement that you can get in these waveguides. We're doing some theory on that, and we're confident that the nonlinearity-to-loss ratios you can get with these platforms are actually much higher than what you can get with the existing platforms. And to go even smaller, we have been asking the question of what the smallest possible OPO is that you can make. You can think about really wavelength-scale resonators, add the chi-two nonlinearity, and see how and when you can get the OPO to operate. Recently, in collaboration with USC and CREOL, we have demonstrated that you can use nanolasers and get some spin-Hamiltonian implementations on those networks. So if we can build the OPOs, we know that there is a path for implementing OPO networks at such a nanoscale. We have looked at these calculations and tried to estimate the threshold of OPOs, say for a wavelength-scale resonator, and it turns out that it can actually be even lower than that of the type of bulk PPLN OPOs that we have been building over the past 50 years or so. So we're working on the experiments, and we're hoping that we can make larger and larger scale OPO networks. Let me summarize the talk. I told you about the OPO networks and our work on Ising machines and measurement feedback, about the ongoing work on the all-optical implementations, both on the linear side and on the nonlinear behaviors, and also a little bit about the efforts on miniaturization and going to the nanoscale. With that, I would like to thank you. >> Hello, I'm Timothée Leleu from the University of Tokyo. Before I start, I would like to thank Yoshi and all the staff of NTT for the invitation and the organization of this online meeting, and I would also like to say that it has been very exciting to see the growth of this new PHI lab. I'm happy to share with you today some of the recent works that have been done either by me or by colleagues in the group. The title of my talk is "A neuromorphic in silico simulator for the coherent Ising machine," and here is the outline. I would like to make the case that simulation, in digital electronics, of the CIM can be useful for better understanding or improving its function principles, by introducing some ideas from neural networks. This is what I will discuss in the first part; then I will show some proof of concept of the gain in performance that can be obtained using this simulation in the second part, and projections of the performance that can be achieved using a very large-scale simulator in the third part, and finally I will talk about future plans. So first, let me start by comparing recently proposed Ising machines, using this table, which is adapted from a recent Nature Electronics paper. This comparison shows that there is always a trade-off between energy efficiency, speed, and scalability that depends on the physical implementation; in red here are the limitations of each of these hardware platforms. Interestingly, the FPGA-based systems, such as the Fujitsu Digital Annealer, the Toshiba bifurcation machine, or a restricted Boltzmann machine on FPGA recently proposed by a group in Berkeley, offer a good compromise between speed and scalability.
And this is why, despite the unique advantages that some of the other hardware platforms have, such as the quantum superposition in flux qubits or the energy efficiency of memristors, FPGAs are still an attractive platform for building large-scale Ising machines in the near future. The reason for the good performance of FPGAs is not so much that they operate at high frequency, nor that they are particularly energy efficient, but rather that the physical wiring of their elements can be reconfigured in a way that limits the von Neumann bottleneck, large fan-ins and fan-outs, and the long propagation delay of information within the system. In this respect, FPGAs are interesting from the perspective of the physics of complex systems, rather than the physics of the electrons or the photons. To put the performance of these various hardware platforms in perspective, we can look at the computing abilities of the brain: the brain computes using billions of neurons, using only 20 watts of power, and operates, so to speak, at a very slow clock rate. These impressive characteristics motivate us to investigate what kinds of neuro-inspired principles could be useful for designing better Ising machines. The idea of this research project, and of the future collaboration, is to temporarily alleviate the limitations that are intrinsic to the realization of an optical coherent Ising machine, shown in the top panel here, by designing a large-scale simulator in silico, shown at the bottom here, that can be used for investigating better organization principles for the CIM. In this talk, I will discuss three neuro-inspired principles: the asymmetry of connections, neural dynamics that are often chaotic because of this asymmetry, and hierarchical connectivity. On the micro-structure of connectivity: neural networks are not composed of the repetition of always the same type of neuron; there is a local structure that is repeated, and here is the schematic of the micro-column in the cortex. And lastly, the hierarchical organization of connectivity: connectivity is organized in a tree structure in the brain, and here you see a representation of the hierarchical organization of the monkey cerebral cortex. So how can these principles be used to improve the performance of Ising machines and their in silico simulation? First, about the two principles of asymmetry and chaotic dynamics. We know the classical approximation of the coherent Ising machine, which is analogous to rate-based neural networks; in the case of the Ising machine, this classical approximation can be obtained using, for example, the truncated Wigner approximation. The dynamics of both of these systems can then be described by the following ordinary differential equations (reconstructed below), in which, in the case of the CIM, the x_i represent the in-phase components of the DOPOs, the function f represents the nonlinear optical part, that is, the degenerate optical parametric amplification, and the sum of the omega_ij x_j represents the coupling, which is done, in the case of the measurement-feedback CIM, using homodyne detection and an FPGA, and then injection of the coupling term. These dynamics, in both the case of the CIM and that of neural networks, can be written as gradient descent of a potential function V, written here, and this potential function includes the Ising Hamiltonian.
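The equations themselves did not survive transcription, so here is the mean-field form I believe the speaker is describing, written out. The symbols (soft spins x_i, couplings omega_ij, pump parameter p) are assumed from context rather than read off the slide.

```latex
% Mean-field CIM / analog-network dynamics as gradient descent (reconstructed;
% notation assumed from the talk's description, with pump parameter p):
\begin{align}
  \frac{dx_i}{dt} &= f(x_i) + \sum_j \omega_{ij}\, x_j ,
  \qquad f(x) = (-1 + p - x^{2})\,x ,\\
  \frac{dx_i}{dt} &= -\frac{\partial V}{\partial x_i},
  \qquad
  V(x) = \sum_i \left( \frac{1-p}{2}\,x_i^{2} + \frac{x_i^{4}}{4} \right)
         - \frac{1}{2}\sum_{i,j} \omega_{ij}\, x_i x_j ,
\end{align}
% which is well defined only when \omega_{ij} = \omega_{ji}, as the speaker
% notes next; the coupling term of V is proportional to the Ising Hamiltonian
% evaluated on the soft spins x_i.
```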
So this is why it's natural to use this type of, uh, dynamics to solve the Ising problem, in which the ω_ij are the Ising couplings and the h is the external field of the Ising Hamiltonian. Note that this potential function can only be defined if the ω_ij are symmetric. So, the well-known problem of this approach is that this potential function V that we obtain is very non-convex at low temperature, and so one strategy is to gradually deform this landscape, using an annealing process. But there is, unfortunately, no theorem that guarantees convergence to the global minimum of the Ising Hamiltonian using this approach. And so this is why we propose, uh, to introduce a microstructure in the system, where one analog spin, or one DOPO, is replaced by a pair of one analog spin and one error-correcting variable. And the addition of this microstructure introduces asymmetry in the system, which in turn induces chaotic dynamics — a chaotic search rather than an annealing process — for searching for the ground state of the Ising Hamiltonian. Within this microstructure, the role of the error variable is to control the amplitude of the analog spins, to force the amplitude of the x_i to become equal to a certain target amplitude a. And, uh, this is done by modulating the strength of the Ising coupling — you see the error variable e_i multiplies the Ising coupling here in the dynamics of each DOPO. And then the whole dynamics is described by these coupled equations. Because the e_i do not necessarily take the same value for the different i, this introduces asymmetry in the system, which in turn creates chaotic dynamics, which I show here for solving a certain problem size of, um, SK problem, in which the x_i are shown here, the e_i are shown here, and the value of the Ising energy is shown in the bottom plot. You see this chaotic search that visits various local minima of the Ising Hamiltonian and eventually finds the global minimum. Um, it can be shown that this modulation of the target amplitude can be used to destabilize all the local minima of the Ising Hamiltonian, so that the dynamics do not get stuck in any of them. Moreover, the other types of attractors that can eventually appear, such as limit-cycle attractors or chaotic attractors, can also be destabilized using the modulation of the target amplitude. And so we have proposed in the past two different modulations of the target amplitude. The first one is a modulation that ensures the, uh, entropy production rate of the system to become positive, and this forbids the creation of any nontrivial attractors. But in this work I will talk about another modulation, a restricted modulation, which is given here, that works, uh, as well as this first modulation, but is easier to implement on FPGA. So these coupled equations, that represent the classical simulation of the coherent Ising machine with some error correction, can be implemented especially efficiently on an FPGA. And here I show the time that it takes to simulate the system — and also, in red, you see the time that it takes to simulate the x_i term, the e_i term, the dot product, and the Ising Hamiltonian — for a system with 500 spins and error variables, equivalent to 500 DOPOs. So in FPGA, the nonlinear dynamics, which corresponds to the degenerate optical parametric amplification, the OPA, of the CIM, can be computed in only 13 clock cycles at 300 megahertz, so which corresponds to about 0.1 microseconds.
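A minimal sketch of the error-variable idea just described — again an editorial illustration with assumed parameter names (target amplitude a, relaxation rate beta) rather than the speaker's implementation. The point is only that each spin's coupling is scaled by its own e_i, which breaks the gradient structure:

```python
import numpy as np

def cac_step(x, e, J, p=1.1, eps=0.05, beta=0.3, a=1.0, dt=0.01):
    """One Euler step of the paired analog-spin / error-variable dynamics."""
    dx = (p - 1.0) * x - x**3 + eps * e * (J @ x)  # e_i scales the Ising coupling
    de = -beta * e * (x**2 - a)                    # drives x_i**2 toward target a
    return x + dt * dx, e + dt * de
```

Because the e_i differ from spin to spin, the Jacobian of the coupled system is asymmetric, which is exactly what permits the chaotic search instead of a plain gradient descent into the nearest local minimum.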
And this is to be compared, uh, to what can be achieved in the measurement-feedback CIM, in which, if we want to get 500 time-multiplexed DOPOs with a 1 gigahertz repetition rate, uh, then we would require 0.5 microseconds to do this. So the simulation in FPGA can be at least as fast as, ah, a 1 GHz repetition rate pulsed-laser CIM. Um, then the dot product that appears in this differential equation can be computed in 43 clock cycles, that is to say, about 0.14 microseconds. So at least for problem sizes that are larger than 500 spins, the dot product becomes clearly the bottleneck, and this can be seen by looking at the scaling of the time — the number of clock cycles it takes to compute either the nonlinear optical part or the dot product — with respect to the problem size. And if we had an infinite amount of resources on the FPGA to simulate the dynamics, then the nonlinear optical part could be done in O(1), and the matrix-vector product could be done in O(log N) — it scales as the logarithm of N — because computing the dot product involves summing all the terms in the product, which is done on the FPGA by an adder tree, whose height scales logarithmically with the size of the system. But this is in the case where we have an infinite amount of resources on the FPGA. For dealing with larger problems, of more than 100 spins, usually we need to decompose the matrix into, ah, smaller blocks, with a block size that is noted ν here. And then the scaling becomes, for the nonlinear part, linear in N over ν, and for the dot product, N squared over ν. So typically, for low-end FPGAs, the block size ν of this matrix is typically about 100. So clearly we want to make ν as large as possible, in order to maintain this scaling in log N for the number of clock cycles needed to compute the dot product, rather than this N squared that occurs if we decompose the matrix into smaller blocks. But the difficulty in, uh, having these larger blocks is that having a very large adder tree introduces large fan-in and fan-out and long-distance data paths within the FPGA. So the solution, to get higher performance for a simulator of the coherent Ising machine, is to get rid of this bottleneck for the dot product by increasing the size of this adder tree. And this can be done by organizing hierarchically the electrical components within the FPGA, which is shown here in this, uh, right panel, in order to minimize the fan-in and fan-out of the system and to minimize the long-distance data paths in the FPGA. So I'm not going into the details of how this is implemented on the FPGA, but just to give you an idea of why the hierarchical organization of the system becomes extremely important to get good performance when simulating Ising machines. So instead of getting into the details of the FPGA implementation, I would like to give a few benchmark results from this simulator, uh, that was used as a proof of concept for this idea, which can be found in this arXiv paper. And here I show results for solving SK problems.
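Before the benchmarks, the cost argument above can be made concrete with a toy cycle-count model. The constants and the exact blocked-dot-product cost here are editorial assumptions for intuition only, not the real implementation's numbers:

```python
import math

def cycles(n, nu, c_nl=1, c_mac=1):
    """Toy clock-cycle model: pointwise nonlinearity vs. blocked dot product."""
    nonlinear = c_nl * math.ceil(n / nu)                                    # ~N/nu
    dot = c_mac * math.ceil(n / nu) * n + math.ceil(math.log2(max(nu, 2)))  # ~N^2/nu plus adder-tree depth
    return nonlinear, dot

for n in (100, 500, 2000):
    print(n, cycles(n, nu=100))
```

Running it shows the dot product overtaking the nonlinear part as N grows past the block size, which is the bottleneck the hierarchical adder-tree layout is designed to push back.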
So, these SK problems are fully connected, random plus-one/minus-one spin-glass problems, and we use as a metric the number of matrix-vector products — since it's the bottleneck of the computation — needed to get the optimal solution of this SK problem with 99% success probability, against the problem size here. And in red here is this proposed FPGA implementation, and in, ah, blue is the number of matrix-vector products that are necessary for the CIM without error correction to solve these SK problems, and in green here, for noisy mean-field annealing, which has, uh, a behavior similar to the coherent Ising machine. And so clearly you see that the number of matrix-vector products necessary to solve this problem scales with a better exponent than these other approaches. So that's an interesting feature of the system. And next we can see what is the real time-to-solution to solve these SK instances. So in this last plot: the time-to-solution, in seconds, to find the ground state of SK instances with 99% probability, for different state-of-the-art hardware. So in red is the FPGA implementation proposed in this paper, and then the other curves represent, ah, breakout local search, in orange, and simulated annealing, in purple, for example. And so you see that the scaling of this proposed simulator is rather good, and that for larger problem sizes we can get orders of magnitude faster than the state-of-the-art approaches. Moreover, the relatively good scaling of the time-to-solution with respect to problem size indicates that the FPGA implementation would be faster than other recently proposed Ising machines, such as the Hopfield neural network implemented on memristors — which is very fast for small problem sizes, in blue here, but whose scaling is not good — and the same thing for the restricted Boltzmann machine implemented on FPGA, proposed by the group in Berkeley recently, again, which is very fast for small problem sizes but whose scaling is bad, so that it is worse than the proposed approach. So we can expect that for problem sizes larger than 1000 spins, the proposed approach would be the faster one. Let me jump to this other slide, and another confirmation that the scheme scales well: you can find maximum-cut values on benchmark sets, the G-sets, better than the values that have been previously found by any other algorithms — so they are the best-known cut values, to the best of our knowledge — which is shown in this table here. In particular, for instances, uh, 14 and 15 of this G-set, we can find better cut values than previously known, and we can find these cut values 100 times faster than the state-of-the-art algorithm used to do this, which is a very common benchmark. Note that getting these good results on the G-sets does not require, ah, particular hard tuning of the parameters. So the tuning used here is very simple: it just depends on the degree of connectivity within each graph. And so these good results on the G-set indicate that the proposed approach would be good not only at solving SK problems, but all types of graph Ising problems, and max-cut problems in particular.
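For readers unfamiliar with the "99% success" style of metric used above, here is one standard way such a cost is computed; the formulation is an editorial sketch, not necessarily the exact one used in the paper:

```python
import math

def cost_to_99(cost_per_run, p_succ, target=0.99):
    """Expected cost (e.g. matrix-vector products) to succeed with 99% confidence."""
    if p_succ >= target:
        return cost_per_run                  # a single run already suffices
    return cost_per_run * math.log(1.0 - target) / math.log(1.0 - p_succ)

print(cost_to_99(cost_per_run=1e4, p_succ=0.2))  # about 2.1e5 products
```

The multiplier ln(1 - 0.99) / ln(1 - p_succ) is the usual number of independent restarts needed before at least one run finds the ground state with the target confidence.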
So, given that the performance of the design depends on the height of this adder tree, we can try to maximize the height of this adder tree on a large FPGA, carefully routing the components within the FPGA, and we can draw some projections of what type of performance we can achieve in the near future, based on the implementation that we are currently working on. So here you see projections for the time-to-solution, with 99% success probability, for solving these SK problems, with respect to the problem size, and here compared to different existing Ising machines — particularly the Digital Annealer, which is shown in the green here, the green line. And we show two different, uh, hypotheses for these projections: either that the time-to-solution scales as an exponential of N, or that the time-to-solution scales as an exponential of the square root of N. So it seems, according to the data, that the time-to-solution scales more as an exponential of the square root of N. And these projections show that we probably can solve SK problems of size 2000 spins — to find the real ground state of this problem with 99% success probability — in about 10 seconds, which is much faster than all the other proposed approaches. So, some of the future plans for this coherent Ising machine simulator. The first thing is that we would like to make the simulation closer to the real, uh, DOPO optical system — in particular, as a first step, to get closer to the system of the measurement-feedback CIM. And to do this, what is, uh, simulatable on the FPGA is this quantum, uh, Gaussian model that is described in this paper, proposed by people in the NTT group. And so the idea of this model is that, instead of having the very simple ODEs that I have shown previously, it includes paired ODEs that take into account not only the mean of the, uh, in-phase component, but also its variance, so that we can take into account more quantum effects of the DOPO, such as the squeezing. And then we plan to make the simulator open access, for the members to run their instances on the system. There will be a first version in September that will be just based on simple command-line access to the simulator, and which will have just the classical approximation of the system, with binary weights and no Zeeman term. But then we will propose a second version that will extend the current Ising machine to a rack of FPGAs, in which we will add the more refined models — the truncated Wigner and the quantum Gaussian model I just talked about — and the support of real-valued weights for the Ising problems, and support for the Zeeman term. So we will announce later when this is available. >> I come from the University of Notre Dame physics department, and I'd like to thank the organizers for their kind invitation to participate in this very interesting and promising workshop. I'd also like to say that I look forward to collaborations with the PHI Lab and Yoshi and collaborators on the topics of this workshop. So today I'll briefly talk about our attempt to understand the fundamental limits of analog continuous-time computing, at least from the point of view of Boolean satisfiability problem solving, using ordinary differential equations. But I think the issues that we raise, um, on this occasion actually apply to other analog approaches as well, and to other problems as well.
I think everyone here knows what Boolean satisfiability problems are. Um, you have Boolean variables, you have M clauses. Each clause is a disjunction of literals; a literal is a variable or its, uh, negation. And the goal is to find an assignment to the variables such that all clauses are true. This is a decision-type problem from the NP class, which means you can check in polynomial time the satisfiability of any assignment. And 3-SAT is NP-complete — for k of 3 or larger — which means an efficient 3-SAT solver, uh, implies an efficient solver for all the problems in the NP class, because all the problems in the NP class can be reduced in polynomial time to 3-SAT. As a matter of fact, you can reduce the NP-complete problems into each other: you can go from 3-SAT to set packing, or to maximum independent set — which is set packing in graph-theoretic notions or terms — to the Ising spin-glass problem's decision version. This is useful when you're comparing different approaches working on different kinds of problems. When not all the clauses can be satisfied, you're looking at the optimization version of SAT, uh, called MaxSAT, and the goal here is to find an assignment that satisfies the maximum number of clauses; and this is from the NP-hard class. In terms of applications: if we had an efficient SAT solver, or NP-complete-problem solver, it would literally, positively influence thousands of problems and applications in industry and in science. I'm not going to read this, but this, of course, gives a strong motivation to work on this kind of problem. Now, our approach to SAT solving involves embedding the problem in a continuous space, and we use ODEs to do that. So instead of working with zeros and ones, we work with minus ones and plus ones, and we allow the corresponding variables to change continuously between the two bounds. We formulate the problem with the help of a clause matrix: if a clause, uh, does not contain a variable or its negation, the corresponding matrix element is zero; if it contains the variable in positive form, it's one; if it contains the variable in negated form, it's negative one. And then we use this to formulate these products, called clause violation functions — one for every clause — which vary continuously between zero and one, and they're zero if and only if the clause itself is true. Uh, then, in order to define the dynamics in this N-dimensional hypercube, where the search happens — and, if solutions exist, they're sitting in some of the corners of this hypercube — we define this, uh, energy potential, or landscape function, shown here, in a way that it is zero if and only if all the clauses — all the K_m's — are zero, or the clauses are satisfied, keeping these auxiliary variables, the a_m's, always positive. And therefore, what you do here is a dynamics that is, uh, essentially a gradient descent on this potential energy landscape. If you were to keep all the a_m's constant, it would get stuck in some local minimum. However, what we do here is we couple it with a dynamics — we couple it to the clause violation functions, as shown here. And if you didn't have this a_m here — just the K_m's, for example — you'd have essentially, in that case, positive feedback: you have an increasing variable. But in that case, you would still get stuck.
So this is better than the constant version, but it would still get stuck. Only when you put here this a_m, which makes the dynamics in this variable exponential-like — uh, only then does it keep searching until it finds a solution, and there is a reason for that which I'm not going to talk about here, but essentially it boils down to performing a gradient descent on a globally time-varying landscape. And this is what works. Now I'm going to talk about the good, the bad, and maybe the ugly. Uh, what's good is that it's a hyperbolic dynamical system, which means that if you take any domain in the search space that doesn't have a solution in it, then the number of trajectories in it decays exponentially quickly, and the decay rate is a characteristic invariant of the dynamics itself — in dynamical systems it's called the escape rate. The inverse of that is the time scale on which you find solutions with this dynamical system, and you can see here some sample trajectories that are chaotic — because it's nonlinear — but transiently chaotic; they're transient, of course, because eventually they converge to the solution. Now, in terms of performance: here, what we show, for a bunch of, um, constraint densities — defined by M over N, the ratio between clauses and variables, for random SAT problems; these are random 3-SAT problems — as a function of N, we monitor the wall time, the wall clock time, and it behaves quite well, it behaves, uh, polynomially, until you actually reach the SAT/unSAT transition, where the hardest problems are found. But what's more interesting is if you monitor the continuous time t — the performance in terms of the analog, continuous time t — because that seems to be polynomial. And the way we show that is: we consider, uh, random 3-SAT for a fixed constraint density. And what we show here is, to the right of the threshold, where it's really hard, uh, we monitor the fraction of problems that we have not been able to solve. We select thousands of problems at that constraint ratio and solve them with our algorithm, and we monitor the fraction of problems that have not yet been solved by continuous time t. And this, as you see, decays exponentially, with different decay rates for different system sizes, and this plot shows that the decay rate behaves polynomially — or, actually, as a power law. So if you combine these two, you find that the time needed to solve all problems — except maybe a small fraction of them — scales polynomially with the problem size. So you have polynomial continuous-time complexity. And this is also true for other types of very hard constraint satisfaction problems, such as exact cover — because you can always transform them into 3-SAT, as we discussed before — Ramsey coloring — and on these problems even algorithms like survey propagation will, will fail. But this doesn't mean that P equals NP, because, first of all, if you were to implement these equations in a device whose behavior is described by these, uh, ODEs, then, of course, t, the continuous-time variable, becomes a physical wall clock time, and that will have polynomial scaling; but you have other variables — the auxiliary variables — which grow in an exponential manner. So if they represent currents or voltages in your realization, then it would be an exponential cost altogether.
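Here is a compact sketch of the continuous-time dynamics described above, in the published clause-matrix form as I understand it; the tiny instance, step size, and clipping are illustrative assumptions. Note how the a_m grow exponentially while their clauses stay violated — exactly the auxiliary-variable growth just identified as the hidden exponential cost:

```python
import numpy as np

def clause_K(c, s):
    """K_m = 2**(-k_m) * prod over clause m's literals of (1 - c[m,i]*s[i])."""
    k = (c != 0).sum(axis=1)
    factors = np.where(c != 0, 1.0 - c * s, 1.0)
    return factors.prod(axis=1) / 2.0**k

def ctds_step(c, s, a, dt=0.01):
    Km = clause_K(c, s)
    denom = np.where(c != 0, 1.0 - c * s, 1.0)
    Kmi = Km[:, None] / denom                              # K_m with literal i's factor removed
    ds = 2.0 * ((a * Km)[:, None] * c * Kmi).sum(axis=0)   # -dV/ds_i for V = sum_m a_m * K_m**2
    da = a * Km                                            # exponential growth on violated clauses
    s = np.clip(s + dt * ds, -0.999, 0.999)                # stay strictly inside the hypercube
    return s, a + dt * da

# tiny instance: (x1 or x2 or x3) and (not x1 or x2 or not x3)
c = np.array([[1.0, 1.0, 1.0], [-1.0, 1.0, -1.0]])
s, a = np.full(3, 0.1), np.ones(2)
for _ in range(3000):
    s, a = ctds_step(c, s, a)
print(np.sign(s), clause_K(c, np.sign(s)))                 # all-zero K_m means satisfied
```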
But this is some kind of trade-off between time and energy, and while I don't know how to generate time, I do know how to generate energy, so that trade could be used. But there are other issues as well, especially if you're trying to do this on a digital machine — and other problems appear on physical devices as well, as we discuss later. So if you implement this on a GPU, you can then get on the order of two orders of magnitude speedup, and you can also modify this to solve MaxSAT problems, uh, quite efficiently — you are competitive with the best heuristic solvers; these are the winners of the 2016 MaxSAT competition. So, so this is, this is definitely — this seems like a good approach, but there are of course interesting limitations. I would say interesting, because it kind of makes you think about what it means, and how you can exploit these observations in understanding better analog continuous-time complexity. If you monitor the discrete number — the number of discrete steps done by the Runge-Kutta integrator, when you solve this on a digital machine you're using some kind of integrator — um, and you're using the same approach, but now you measure the number of problems you haven't solved by a given number of discrete, uh, steps taken by the integrator, you find out you have exponential discrete-time complexity. And, of course, this is a problem. And if you look closely at what happens: even though the analog mathematical trajectory — that's the red curve here — if you monitor what happens in discrete time, uh, the trajectory fluctuates very little — so this is like, you know, third- or fourth-digit precision — but the step size fluctuates like crazy. So it really is like the integration freezes out. And this is because of the phenomenon of stiffness, that I'll talk a little bit more about a little bit later. You know, it might look like an integration issue on digital machines — one that you could improve, and you could definitely improve — but actually the issue is bigger than that. It's deeper than that, because on a digital machine there is no time-energy conversion. So the auxiliary variables are efficiently represented on a digital machine — there's no exponentially fluctuating current or voltage in your computer when you do this. So if P is not equal to NP, then the exponential time complexity, or exponential cost complexity, has to hit you somewhere. And this is how. Um, but, you know, one would be tempted to think maybe this wouldn't be an issue in an analog device, and to some extent that's true — analog devices can be orders of magnitude faster — but they also suffer from their own problems, because they're not going to be perfect; that affects this class of solvers as well. So, indeed, if you look at other systems — like the measurement-feedback Ising machine, polariton graphs, or related networks — they all hinge on some kind of ability to control your variables with arbitrarily high precision. In certain networks you want to read out precise frequencies; in the case of CIMs, you require identical pulses, which is hard to keep — they kind of fluctuate away from one another, shift away from one another — and if you can control that, of course, then you can control the performance. So actually, one can ask whether or not this is a universal bottleneck, and it seems so, as I will argue next. Um, we can recall a fundamental result by, uh, Schönhage, from 1978,
who showed — it's a purely computer-science proof — that if you are able to compute the addition, multiplication, and division of real variables with infinite precision, then you could solve NP-complete problems in polynomial time. He doesn't actually propose a solver; he just showed mathematically that this would be the case. Now, of course, in the real world, you have loss of precision. So the next question is: how does that affect the computation of such problems? This is what we're after. Loss of precision means information loss, or entropy production. So what you're really looking at is the relationship between the hardness and the cost of computing a problem. Uh, and according to Schönhage, there's this left branch, which in principle could be polynomial time, but the question is whether or not this is achievable. Perhaps that is not achievable, but something more achievable is on the right-hand side: there's always going to be some information loss, some entropy generation, that could keep you away, possibly, from polynomial time. So this is what we'd like to understand. And the source of this information loss, I will argue, is not just noise, uh, in any physical system, but it's also of algorithmic nature, so that is a question of the algorithm or approach. But Schönhage's result is purely theoretical — no actual solver is proposed — so we can ask, you know, just theoretically, out of curiosity: would there in principle be such solvers? Because he is not proposing a solver with such properties. In principle, if you look mathematically, precisely, would what the solver does have the right properties? And I argue yes — I don't have a mathematical proof, but I have some arguments that that would be the case. And this is the case for our SAT solver: if you could calculate its trajectory in a lossless way, then it would, uh, solve NP-complete problems in polynomial continuous time. Now, as a matter of fact, this is a bit more difficult question, because time in ODEs can be rescaled however you want. So, what Bournez says is that you actually have to measure the length of the trajectory, which is an invariant of the dynamical system — a property of the dynamical system, not of its parametrization. And we did that. So my student did that: first improving on the stiffness of the problem of the integrations, using implicit solvers and some smart tricks, such that you actually are closer to the actual trajectory; and, using the same approach — you know, what fraction of problems you can solve — but now by a given length of the trajectory, you find that it is polynomially scaling with the problem size. So we have polynomial length complexity. That means that our solver is both poly-length and — as it is defined, as an analog solver — also poly-time. But if you look at it as a discrete algorithm, if you measure the discrete steps on a digital machine, it is an exponential solver. And the reason is because of all this stiffness: every integrator has to truncate — digitize, truncate the equations — and what it has to do is to keep the integration within the so-called stability region for that scheme, and you have to keep this product — of the eigenvalues of the Jacobian and the step size — within this region. If you use explicit methods, you want to stay within this region.
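The stability-region constraint just mentioned is a standard numerical-analysis fact, and a one-line toy calculation shows the freeze-out; this example is an editorial sketch, not the speaker's:

```python
def euler_stable_dt(lam):
    """Explicit Euler on dy/dt = lam*y is stable only if |1 + lam*dt| <= 1."""
    return 2.0 / abs(lam)

for lam in (-1.0, -1e3, -1e6):      # Jacobian eigenvalues of increasing stiffness
    print(lam, euler_stable_dt(lam))
```

A single stiff eigendirection caps the step size for the whole coupled system — the step shrinks by the same factor the stiffest eigenvalue grows — which is the freeze-out described above.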
Uh, but what happens is that some of the eigenvalues grow fast for stiff problems, and then you're forced to reduce the delta t so the product stays in this bounded domain — which means that now you're forced to take smaller and smaller time steps, so you're freezing out the integration, and what I showed you is that that's the case. Now, you can move to implicit solvers, which is, which is a trick — in this case the stable domain is actually on the outside. But what happens in this case is that some of the eigenvalues of the Jacobian, also, for stiff systems, start to move to zero. As they're moving to zero, they're going to enter this instability region, so your solver is going to try to keep them out, so it's going to increase the delta t. But if you increase the delta t, you increase the truncation errors, so you get randomized, uh, in the large search space — so it's, it's really not, uh, not going to work out. Now, one can sort of introduce a theory, or a language, to discuss computational — analog computational — complexity, using the language from dynamical systems theory. Basically, I don't have time to go into this, but for hard problems you have this object, the chaotic saddle, in the middle of the search space somewhere, and that dictates how the dynamics happens; and the invariant properties of the dynamics, of course, of that saddle, are what dictate the performance and many things. So an important measure that we find is also helpful in describing this analog complexity is the so-called Kolmogorov, or metric, entropy. And basically what this does, in an intuitive way, is, uh, to describe the rate at which the uncertainty contained in the insignificant digits of a trajectory flows towards the significant ones, as you lose information — because errors are being, uh, grown, or are developed into, larger errors at an exponential rate, because you have positive Lyapunov exponents. But this is an invariant property — it's a property of the saddle, not of how you compute it — and it's really the intrinsic rate of accuracy loss of a dynamical system. As I said, in such a high-dimensional dynamical system you have positive and negative Lyapunov exponents — as many in total as the dimension of the space — and u is the number of unstable manifold dimensions and s the number of stable manifold directions. And there's an interesting and, I think, important equality — an equality called the Pesin equality — that connects the information-theoretic aspect, the rate of information loss, with the geometric rate at which trajectories separate, minus kappa, which is the escape rate that I already talked about. Now, one can actually prove simple theorems, like back-of-the-envelope calculations. The idea here is that you know the rate at which closely started trajectories separate from one another. So now you can say that, uh, that is fine, as long as my trajectory finds the solution before the trajectories separate too quickly. In that case, I can have the hope that if I start from some region of the phase space, several closely started trajectories kind of go into the same solution often. And that's, that's this upper bound, this limit, and it is really showing that it has to be an exponentially small number. What it depends on is the N-dependence of the exponent right here, which combines the information-loss rate and the solution-time performance.
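As a sketch of the quantities just described — the Pesin-type identity and the back-of-the-envelope bound on how closely trajectories must start — here is my rendering of the argument in standard notation, with Δ an assumed size of a solution's basin:

```latex
% Pesin-type identity for an open (leaky) chaotic system, and the
% resulting bound on initial-condition spacing.
\begin{align}
  h_{\mathrm{KS}} &= \sum_{\lambda_i > 0} \lambda_i \;-\; \kappa
  && \text{(metric entropy = positive Lyapunov sum minus escape rate)} \\
  \delta(t) &\sim \delta_0\, e^{\lambda_1 t}, \qquad
  t_{\mathrm{sol}} \sim \kappa^{-1}
  \;\;\Longrightarrow\;\;
  \delta_0 \;\lesssim\; \Delta\, e^{-\lambda_1/\kappa}
\end{align}
```

The exponent λ₁/κ is the combination of information-loss rate and solution-time performance referred to above; its dependence on N is what decides whether the required initial precision grows exponentially.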
So, if this exponent here has a large N-dependence — or even a linear N-dependence — then you really have to start, uh, trajectories exponentially closer to one another in order to end up in the same solution. So this is sort of like the direction that you're going in, and this formulation is applicable to all, uh, deterministic dynamical systems. And I think we can expand this further, because, uh, there is, ah, a way of getting the expression for the escape rate in terms of N, the number of variables, from cycle expansions, that I don't have time to talk about. It's kind of like a program that you can try to pursue, and this is it. So, the conclusions, I think, are self-explanatory. I think there is a lot of future in, uh, analog continuous-time computing. It can be more efficient, by orders of magnitude, than digital machines in solving NP-hard problems, because, first of all, many of these systems avoid the von Neumann bottleneck, there's parallelism involved, and you can also have a larger spectrum of continuous-time dynamical algorithms than discrete ones. But, you know, we also have to be mindful of what the possibilities and what the limits are. And one open question — a very important open question — is, you know, what are these limits? Is there some kind of no-go theorem that tells you that you can never perform better than this limit or that limit? And I think that's the exciting part, to derive these limits.
SUMMARY :
bifurcated critical point, that is, the one that bifurcates at the lowest pump value; the chi-2 nonlinearity, and see how and when you can get the OPO; note that the classical approximation of the coherent Ising machine, which is analogous to rate-based neural networks; than the state-of-the-art algorithm used to do this, which is a very common benchmark; right, the inverse of that is the time scale in which you find solutions; first of all, many of these systems avoid the von Neumann bottleneck.
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Exxon Mobil | ORGANIZATION | 0.99+ |
Andy | PERSON | 0.99+ |
Sean Hagar | PERSON | 0.99+ |
Daniel Wennberg | PERSON | 0.99+ |
Chris | PERSON | 0.99+ |
USC | ORGANIZATION | 0.99+ |
Caltech | ORGANIZATION | 0.99+ |
2016 | DATE | 0.99+ |
100 times | QUANTITY | 0.99+ |
Berkeley | LOCATION | 0.99+ |
Tatsuya Nagamoto | PERSON | 0.99+ |
two | QUANTITY | 0.99+ |
1978 | DATE | 0.99+ |
Fox | ORGANIZATION | 0.99+ |
six systems | QUANTITY | 0.99+ |
Harvard | ORGANIZATION | 0.99+ |
Al Qaeda | ORGANIZATION | 0.99+ |
September | DATE | 0.99+ |
second version | QUANTITY | 0.99+ |
CIA | ORGANIZATION | 0.99+ |
India | LOCATION | 0.99+ |
300 yards | QUANTITY | 0.99+ |
University of Tokyo | ORGANIZATION | 0.99+ |
today | DATE | 0.99+ |
Burns | PERSON | 0.99+ |
Atsushi Yamamura | PERSON | 0.99+ |
0.14% | QUANTITY | 0.99+ |
48 core | QUANTITY | 0.99+ |
0.5 microseconds | QUANTITY | 0.99+ |
NSF | ORGANIZATION | 0.99+ |
15 years | QUANTITY | 0.99+ |
CBS | ORGANIZATION | 0.99+ |
NTT | ORGANIZATION | 0.99+ |
first implementation | QUANTITY | 0.99+ |
first experiment | QUANTITY | 0.99+ |
123 | QUANTITY | 0.99+ |
Army Research Office | ORGANIZATION | 0.99+ |
first | QUANTITY | 0.99+ |
1,904,711 | QUANTITY | 0.99+ |
one | QUANTITY | 0.99+ |
six | QUANTITY | 0.99+ |
first version | QUANTITY | 0.99+ |
Steve | PERSON | 0.99+ |
2000 spins | QUANTITY | 0.99+ |
five researcher | QUANTITY | 0.99+ |
Creole | ORGANIZATION | 0.99+ |
three set | QUANTITY | 0.99+ |
second part | QUANTITY | 0.99+ |
third part | QUANTITY | 0.99+ |
Department of Applied Physics | ORGANIZATION | 0.99+ |
10 | QUANTITY | 0.99+ |
each | QUANTITY | 0.99+ |
85,900 | QUANTITY | 0.99+ |
One | QUANTITY | 0.99+ |
one problem | QUANTITY | 0.99+ |
136 CPU | QUANTITY | 0.99+ |
Toshiba | ORGANIZATION | 0.99+ |
Scott | PERSON | 0.99+ |
2.4 gigahertz | QUANTITY | 0.99+ |
1000 times | QUANTITY | 0.99+ |
two times | QUANTITY | 0.99+ |
two parts | QUANTITY | 0.99+ |
131 | QUANTITY | 0.99+ |
14,233 | QUANTITY | 0.99+ |
more than 100 spins | QUANTITY | 0.99+ |
two possible phases | QUANTITY | 0.99+ |
13,580 | QUANTITY | 0.99+ |
5 | QUANTITY | 0.99+ |
4 | QUANTITY | 0.99+ |
one microseconds | QUANTITY | 0.99+ |
first step | QUANTITY | 0.99+ |
first part | QUANTITY | 0.99+ |
500 spins | QUANTITY | 0.99+ |
two identical photons | QUANTITY | 0.99+ |
3 | QUANTITY | 0.99+ |
70 years ago | DATE | 0.99+ |
Iraq | LOCATION | 0.99+ |
one experiment | QUANTITY | 0.99+ |
zero | QUANTITY | 0.99+ |
Amir Safarini Nini | PERSON | 0.99+ |
Saddam | PERSON | 0.99+ |
Susan Wilson, Informatica & Blake Andrews, New York Life | MIT CDOIQ 2019
(techno music) >> From Cambridge, Massachusetts, it's theCUBE. Covering MIT Chief Data Officer and Information Quality Symposium 2019. Brought to you by SiliconANGLE Media. >> Welcome back to Cambridge, Massachusetts everybody, we're here with theCUBE at the MIT Chief Data Officer Information Quality Conference. I'm Dave Vellante with my co-host Paul Gillin. Susan Wilson is here, she's the vice president of data governance and she's the leader at Informatica. Blake Andrews is the corporate vice president of data governance at New York Life. Folks, welcome to theCUBE, thanks for coming on. >> Thank you. >> Thank you. >> So, Susan, interesting title; VP, data governance leader, Informatica. So, what are you leading at Informatica? >> We're helping our customers realize their business outcomes and objectives. Prior to joining Informatica about seven years ago, I was actually a customer myself, and so oftentimes I'm working with our customers to understand where they are, where they're going, and how to best help them; because we recognize data governance is more than just a tool, it's a capability that represents people, the processes, the culture, as well as the technology. >> Yeah so you've walked the walk, and you can empathize with what your customers are going through. And Blake, your role, as the corporate VP, but more specifically the data governance lead. >> Right, so I lead the data governance capabilities and execution group at New York Life. We're focused on providing skills and tools that enable governance activities across the enterprise at the company. >> How long has that function been in place? >> We've been in place for about two and a half years now. >> So, I don't know if you guys heard Mark Ramsey this morning, the keynote, but basically he said, okay, we started with enterprise data warehouse, we went to master data management, then we kind of did this top-down enterprise data model; that all failed. So we said, all right, let's punt it to governance. Here you go guys, you fix our corporate data problem. Now, right tool for the right job, but, and so, we were sort of joking, did data governance fail? No, you always have to have data governance. It's like brushing your teeth. But so, like I said, I don't know if you heard that, but what are your thoughts on that sort of evolution that he described? As sort of, failures of things like EDW to live up to expectations and then, okay guys, over to you. Is that a common theme? >> It is a common theme, and what we're finding with many of our customers is that they had tried many of the, if you will, the methodologies around data governance, right? Around policies and structures. And we describe this as the Data 1.0 journey, which was more application-centric reporting, to Data 2.0, to data warehousing, to now Data 3.0, where we look at the explosion of data, the volumes of data, the number of data consumers, the expectations of the chief data officer to solve business outcomes; crushing under the scale of, I can't fit all of this into a centralized data repository, I need something that will help me scale and to become more agile. And so, that message does resonate with us, but we're not saying data warehouses don't exist. They absolutely do for trusted data sources, but the ability to be agile and to address many of your organization's needs and to be able to service multiple consumers is top-of-mind for many of our customers.
>> And the mindset from 1.0 to 2.0 to 3.0 has changed. From, you know, data as a liability, to now data as this massive asset. It's sort of-- >> Value, yeah. >> Yeah, and the pendulum has swung. It's almost like a see-saw. Where, and I'm not sure it's ever going to flip back, but it is to a certain extent; people are starting to realize, wow, we have to be careful about what we do with our data. But still, it's go, go, go. But, what's the experience at New York Life? I mean, you know. A company that's been around for a long time, conservative, wants to make sure, risk averse, obviously. >> Right. >> But at the same time, you want to keep moving as the market moves. >> Right, and we look at data governance as really an enabler and a value-add activity. We're not a governance practice for the sake of governance. We're not there to create a lot of policies and restrictions. We're there to add value and to enable innovation in our business and really drive that execution, that efficiency. >> So how do you do that? Square that circle for me, because a lot of people think, when people think security and governance and compliance they think, oh, that stifles innovation. How do you make governance an engine of innovation? >> You provide transparency around your data. So, it's transparency around, what does the data mean? What data assets do we have? Where can I find that? Where are my most trusted sources of data? What does the quality of that data look like? So all those things together really enable your data consumers to take that information and create new value for the company. So it's really about enabling your value creators throughout the organization. >> So data is an ingredient. I can tell you where it is, I can give you some kind of rating as to the quality of that data and its usefulness. And then you can take it and do what you need to do with it in your specific line of business. >> That's right. >> Now you said you've been at this two and a half years, so what stages have you gone through since you first began the data governance initiative? >> Sure, so our first year, year and a half, was really focused on building the foundations, establishing the playbook for data governance and building our processes, and understanding how data governance needed to be implemented to fit New York Life and the culture of the company. The last twelve months or so have really been focused on operationalizing governance. So we've got the foundations in place, now it's about implementing tools to further augment those capabilities and help assist our data stewards and give them a better skill set and a better tool set to do their jobs. >> Are you, sort of, crowdsourcing the process? I mean, you have a defined set of people who are responsible for governance, or is everyone taking a role? >> So, it is a two-pronged approach: we do have dedicated data stewards. There's approximately 15 across various lines of business throughout the company. But we are building towards a data democratization aspect. So, we want people to be self-sufficient in finding the data that they need and understanding the data. And then, when they have questions, relying on our stewards as a network of subject matter experts who also have some authorizations to make changes and adapt the data as needed.
They're sort of either hidden, maybe not even hidden, but they're in the line of business, they're moving. You know, there's a mentality of move fast and break things. The challenge with AI is, if you start operationalizing AI and you're breaking things without data quality, without data governance, you can really affect lives. We've seen it. In one of these unintended consequences. I mean, Facebook is the obvious example and there are many, many others. But, are you seeing that? How are you seeing organizations dealing with that problem? >> As Blake was mentioning often times what it is about, you've got to start with transparency, and you got to start with collaborating across your lines of businesses, including the data scientists, and including in terms of what they are doing. And actually provide that level of transparency, provide a level of collaboration. And a lot of that is through the use of our technology enablers to basically go out and find where the data is and what people are using and to be able to provide a mechanism for them to collaborate in terms of, hey, how do I get access to that? I didn't realize you were the SME for that particular component. And then also, did you realize that there is a policy associated to the data that you're managing and it can't be shared externally or with certain consumer data sets. So, the objective really is around how to create a platform to ensure that any one in your organization, whether I'm in the line of business, that I don't have a technical background, or someone who does have a technical background, they can come and access and understand that information and connect with their peers. >> So you're helping them to discover the data. What do you do at that stage? >> What we do at that stage is, creating insights for anyone in the organization to understand it from an impact analysis perspective. So, for example, if I'm going to make changes, to as well as discovery. Where exactly is my information? And so we have-- >> Right. How do you help your customers discover that data? >> Through machine learning and artificial intelligence capabilities of our, specifically, our data catalog, that allows us to do that. So we use such things like similarity based matching which help us to identify. It doesn't have to be named, in miscellaneous text one, it could be named in that particular column name. But, in our ability to scan and discover we can identify in that column what is potentially social security number. It might have resided over years of having this data, but you may not realize that it's still stored there. Our ability to identify that and report that out to the data stewards as well as the data analysts, as well as to the privacy individuals is critical. So, with that being said, then they can actually identify the appropriate policies that need to be adhered to, alongside with it in terms of quality, in terms of, is there something that we need to archive. So that's where we're helping our customers in that aspect. >> So you can infer from the data, the meta data, and then, with a fair degree of accuracy, categorize it and automate that. >> Exactly. We've got a customer that actually ran this and they said that, you know, we took three people, three months to actually physically tag where all this information existed across something like 7,000 critical data elements. And, basically, after the set up and the scanning procedures, within seconds we were able to get within 90% precision. 
Because, again, we've dealt a lot with metadata. It's core to our artificial intelligence and machine learning. And it's core to how we built out our platforms to share that metadata, to do something with that metadata. It's not just about sharing the glossary and the definition information. We also want to automate and reduce the manual burden. Because we recognize, at that scale, manual documentation, manual cataloging and tagging just-- >> It doesn't work. >> It doesn't work. It doesn't scale. >> Humans are bad at it. >> They're horrible at it. >> So I presume you have a chief data officer at New York Life, is that correct? >> We have a chief data and analytics officer, yes. >> Okay, and you work within that group? >> Yes, that is correct. >> Do you report in to that? >> Yes, so-- >> And that individual, yeah, describe the organization.
>> Have you identified automation tools that can help to bring structure to that unstructured data? >> Yes, we have. And there are several tools out there that we're continuing to investigate and look at. But, that's one of the key things that we're trying to achieve through this process is bringing structure to unstructured content. >> So, the conference. First year at the conference. >> Yes. >> Kind of key take aways, things that interesting to you, learnings? >> Oh, yes, well the number of CDO's that are here and what's top of mind for them. I mean, it ranges from, how do I stand up my operating model? We just had a session just about 30 minutes ago. A lot of questions around, how do I set up my organization structure? How do I stand up my operating model so that I could be flexible? To, right, the data scientists, to the folks that are more traditional in structured and trusted data. So, still these things are top-of-mind and because they're recognizing the market is also changing too. And the growing amount of expectations, not only solving business outcomes, but also regulatory compliance, privacy is also top-of-mind for a lot of customers. In terms of, how would I get started? And what's the appropriate structure and mechanism for doing so? So we're getting a lot of those types of questions as well. So, the good thing is many of us have had years of experience in this phase and the convergence of us being able to support our customers, not only in our principles around how we implement the framework, but also the technology is really coming together very nicely. >> Anything you'd add, Blake? >> I think it's really impressive to see the level of engagement with thought leaders and decision makers in the data space. You know, as Susan mentioned, we just got out of our session and really, by the end of it, it turned into more of an open discussion. There was just this kind of back and forth between the participants. And so it's really engaging to see that level of passion from such a distinguished group of individuals who are all kind of here to share thoughts and ideas. >> Well anytime you come to a conference, it's sort of any open forum like this, you learn a lot. When you're at MIT, it's like super-charged. With the big brains. >> Exactly, you feel it when you come on the campus. >> You feel smarter when you walk out of here. >> Exactly, I know. >> Well, guys, thanks so much for coming to theCUBE. It was great to have you. >> Thank you for having us. We appreciate it, thank you. >> You're welcome. All right, keep it right there everybody. Paul and I will be back with our next guest. You're watching theCUBE from MIT in Cambridge. We'll be right back. (techno music)
SUMMARY :
Brought to you by SiliconANGLE Media. Susan Wilson is here, she's the vice president So, what are you leading at Informatica? and how to best help them; but more specifically the data governance lead. Right, so I lead the data governance capabilities and then, okay guys over to you. And a lot of the failed attempts, if you will, And the mind set from 1.0 to 2.0 to 3.0 has changed. Where, and I'm not sure it's ever going to flip back, But at the same time, Right, and we look at data governance So how do you do that? What does the quality of that data look like? and do what you need to do with it so what stages have you gone through in the culture of the company. in finding the data that they need is that the chief data officers often times and to be able to provide a mechanism What do you do at that stage? So, for example, if I'm going to make changes, How do you help your customers discover that data? and report that out to the data stewards and then, with a fair degree of accuracy, categorize it And it's core to how we built out our platforms It doesn't work. And that individual, And then, our early 2018 we actually re-orged is the data team, the chief data officer's team and build that network of data consumers but in fact, you got to make the case and show the success and to align with what you're working on. Sometimes getting that organization right but one of the things that we have started doing is bringing structure to unstructured content. So, the conference. And the growing amount of expectations, and decision makers in the data space. it's sort of any open forum like this, you learn a lot. when you come on the campus. Well, guys, thanks so much for coming to theCUBE. Thank you for having us. Paul and I will be back with our next guest.
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Paul Gillin | PERSON | 0.99+ |
Susan | PERSON | 0.99+ |
Dave Vellante | PERSON | 0.99+ |
Paul | PERSON | 0.99+ |
Susan Wilson | PERSON | 0.99+ |
Blake | PERSON | 0.99+ |
Informatica | ORGANIZATION | 0.99+ |
Cambridge | LOCATION | 0.99+ |
Mark Ramsey | PERSON | 0.99+ |
Blake Anders | PERSON | 0.99+ |
three months | QUANTITY | 0.99+ |
three people | QUANTITY | 0.99+ |
ORGANIZATION | 0.99+ | |
New York Life | ORGANIZATION | 0.99+ |
early 2018 | DATE | 0.99+ |
Cambridge, Massachusetts | LOCATION | 0.99+ |
SiliconANGLE Media | ORGANIZATION | 0.99+ |
First year | QUANTITY | 0.99+ |
one | QUANTITY | 0.99+ |
90% | QUANTITY | 0.99+ |
two and half years | QUANTITY | 0.98+ |
first | QUANTITY | 0.98+ |
approximately 15 | QUANTITY | 0.98+ |
7,000 critical data elements | QUANTITY | 0.97+ |
about two and half years | QUANTITY | 0.97+ |
first year | QUANTITY | 0.96+ |
two | QUANTITY | 0.96+ |
about 30 minutes ago | DATE | 0.96+ |
theCUBE | ORGANIZATION | 0.95+ |
Blake Andrews | PERSON | 0.95+ |
MIT Chief Data Officer and | EVENT | 0.93+ |
MIT Chief Data Officer Information Quality Conference | EVENT | 0.91+ |
EDW | ORGANIZATION | 0.86+ |
last twelve months | DATE | 0.86+ |
skunkworks | ORGANIZATION | 0.85+ |
CDAO | ORGANIZATION | 0.85+ |
this morning | DATE | 0.83+ |
MIT | ORGANIZATION | 0.83+ |
7 years ago | DATE | 0.78+ |
year | QUANTITY | 0.75+ |
Information Quality Symposium 2019 | EVENT | 0.74+ |
3.0 | OTHER | 0.66+ |
York Life | ORGANIZATION | 0.66+ |
2.0 | OTHER | 0.59+ |
MIT CDOIQ 2019 | EVENT | 0.58+ |
half | QUANTITY | 0.52+ |
Data 2.0 | OTHER | 0.52+ |
Data 3.0 | TITLE | 0.45+ |
1.0 | OTHER | 0.43+ |
Data | OTHER | 0.21+ |
Stewart Bond, IDC | MIT CDOIQ 2019
>> From Cambridge, Massachusetts, it's theCUBE. Covering MIT Chief Data Officer and Information Quality Symposium 2019. Brought to you by SiliconANGLE Media. >> Welcome back to MIT CDOIQ everybody, you're watching theCUBE. We go out to the events, we extract the signal from the noise. This is day one of this conference, the Chief Data Officer event. I'm Dave, along with my co-host, Paul Gillin. Stewart Bond is here, he's a research director at International Data Corporation, IDC. Stewart, welcome to theCUBE, thanks for coming on. >> Thank you for having me. >> You're very welcome. So, your space, data intelligence — tell us about your swim lane. >> Sure.
So you would say that his vision, actually what he implemented, was visionary — they skated to the puck, so to speak — and we're going to see more of that? >> We are seeing more of that. That's why we've seen such a jump in the number of vendors that are providing data catalog solutions. IDC has this work product called a Market Glance. I did one at the beginning of 2018, and I just did it again in the middle of this year. The number of vendors that offer data catalog solutions has significantly increased — a 240% increase in the number of vendors that offer that now, albeit off of a small base. These are not exhaustive studies; it may be that I didn't know about all those data catalog vendors a year and a half ago, but it may also be that people are now saying they've got a data catalog. You've really got to peel back the layers a little bit and understand what these different data catalogs are and what they're doing, because not all of them are created equal. >> Well, if it's under the radar and you don't know about it, 99% of the world doesn't. Mark talked this morning about some interesting new technologies they were using: spidering to find the data, bots to classify the data, tools to wrangle the data. I mean, there's a lot of new technology being applied to this area. Which of those technologies do you think has the greatest promise right now? And how automated can this process become? >> It's the spidering, and it's the cataloging of the data, understanding what you've got out there. That is growing like crazy. I just started to track it, and it's growing a lot; that has the most promise. And as I said, I think that's going to be the data platform of the future. It's the intelligence, knowing about where your data is, so you can then go and get it. It's not a matter of all the data being in one place anymore. Data is everywhere. Data is in hybrid cloud, it's on premises, it's in private cloud, it's hosted. It's everywhere. I just did a survey — I got the results back in June 2019, just a month ago — and the data is all over the place. So really having that knowledge, having that intelligence about where your data is, that has the most promise. As far as the automation is concerned, the next step there is not just about collecting the information about where your data is, it's actually applying the analytics, the machine learning and the artificial intelligence to that metadata collection that you've got, so that you can then start to create those bots, to create those pipelines, to start to automate those tasks. We're starting to see some vendors moving in that direction. There's a lot of promise there. >> You guys at IDC have a pretty robust software taxonomy, and I'm sure it's evolved over the years. So how do you define your space? I'm interested in how big that space is in terms of market size, is it growing, and where do you see it going? >> Right. So my coverage of data integration and data intelligence is fairly small; it's a small little market at IDC. I'm part of a larger team that looks at data management, analytics and information management. We've got people on our team like Dan Vesset, who covers the analytics, the advanced analytics, and Carl Olofson — he's been on theCUBE — who covers innovative technologies. As for those sizes, I apologize, I don't have that number off the top of my head. >> Okay, no. But your space — that software market is so fragmented, and what IDC has always done well is you put people on those fragments and you go deep in there. So how have you been able to not make your eyes bleed when you pull all that data together? How big is it? >> It's important. The data integration market is about six and a half billion dollars. >> Substantial size. >> Yeah, but again, a lot of vendors — a growing number of vendors — and the market's growing. The market continues to grow as the data is becoming more distributed, more dispersed. There's a continuing need to integrate that data. There's also that growing need for that data intelligence. You know, we've had a lot of inquiries lately about data being fed into machine learning and artificial intelligence, and people realizing our data isn't clean, we have to clean up our data, because garbage in, garbage out is probably more important now than ever before. You don't have someone saying, I don't think that data is right; you've got machines looking at the data instead. The problem with data quality is not a new problem — it's the same problem we've had for years, and all of the technology is there to clean that data up. That's a part of what I follow. I look at the data quality vendors — Experian is here, Syncsort — and all of the other data quality capabilities that you get from Informatica, from Talend, or from Qlik's Podium. So that part is growing, and there's a lot more interest in that data quality and that data intelligence side, again so the right data can be used, good data can be used, there's trust in that data, and it can increasingly be used for the right reasons as well. That's adding that context, understanding the semantics, having all that metadata that goes around that data so that it can be used properly. >> It is one of those markets that may be relatively small — it's not a hundred billion — but it enables a lot of larger markets. So okay, it's six and a half billion and it's growing. Is it growing single digits, double digits? >> It's hovering around double digits. >> Okay, so around 10%. And then who are the big players? Who's driving the share there? Is there a dominant player? >> A bunch. So Informatica is number one in the market, followed by IBM, and SAP is right up there. SAS is there. Talend is making a nice move as well. But there's a number of different players; there's a lot of different players in that market. >> And the leading market share player has what, 10%? 15%? 50%? Is it a dominant share? >> That's tough to say. They're big — it's over a billion dollars, right? So they've got maybe a sixth of the market. >> Okay, so it's not like Cisco with two thirds of the networking market or anything like that. And what about the cloud guys? Are they participating in this? >> The cloud guys, yeah. So there are some pure cloud solutions. There's Reltio, for example — pure cloud MDM, master data management. I'd say there's less pure cloud than there used to be. But, you know, someone like Informatica is really pushing that presence in the cloud, >> running these tools, this tooling, in the cloud. But are the cloud guys directly competing at this point? >> Amazon, Google? Yes, those cloud guys, yes. Google announced Dataflow back in... sorry, Data Fusion. Google.
>> And so there they've got an e t l two on the cloud now. Ah, Amazon has blue yet which is both a catalog and an e t l tool. Microsoft course has data factory in azure. >> So those guys are coming on. I'm guessing if you talk to in dramatic and they said, Well, they're not as robust as we are. And we got a big install base and we go multi cloud is that kind of posturing of the incumbents or yeah, that's posturing. And maybe that's I don't mean it is a pejorative. If I were, those guys would be doing the same thing. You know, we were talking earlier about how the cloud guys essentially killed the Duke. All right, do you Do you see the same thing happening here, or is it well, the will the tool vendors be able to stay ahead in your view, >> depends on how they execute. If they're there and they're available in the cloud along with along with those clapper viers, they're able to provide solutions in the same same way the same elasticity, the same type of consumption based pricing models that pod vendors air offering. They can compete with that. They still have a better solution. Easton What >> in multi cloud in hybrid is a big part of their value problems that the cloud guys aren't really going hard after. I mean, this sort of dangling your toe in the water, some of them some of the >> cloud guys they have. They have the hybrid capabilities because they've got some of what they're what they built comes from on premises, worlds as well. So they've got that ability. Microsoft in particular >> on Google, >> Google that the data fusion came out of >> You're saying, But it's part of the Antos initiative. Er, >> um, I apologize. Folks are watching, >> but soup of acronyms notices We're starting a little bit. What tools have you seen or technology? Have you seen making governance of unstructured data? That looks promising? Uh, so I don't really cover >> the instructor data space that much. What I can say is Justus in the structure data world. It's about the metadata. It's about having the proper tags about that unstructured data. It's about getting the information of that unstructured data so that it can then be governed appropriately, making structure out of that, that is, I can't really say, because I don't cover that market explicitly. But I think again it comes back to the same type of data intelligence having that intelligence about that data by understanding what's in there. >> What advice are you giving to, you know, the buyers in your community and the sellers in your community, >> So the buyer's within the market. I talk a lot about that. The need for that data intelligence, so data governance to me is not a technology you can't go by data governance data governance is an organizational disappoint. Technology is a part of that. To me, the data intelligence technology is a part of that. So, really, organizations, if they really want a good handle, get a good handle on what data they have, how to use that, how to be enabled by that data. They need to have that date intelligence into go look for solutions that can help him pull that data intelligence out. But the other part of that is measurement. It's critical to measure because you can't improve what you're not measuring. So you know that type of approach to it is critical Eve, and you've got to be able to have people in the organization. You've got to be able to have cooperation collaboration across the business. I t. The the gifted office chief Officer office. You've gotta have that collaboration. 
You've gotta have accountability in order for that to really be successful. For the vendors in the space, hybrid is the new reality. My survey data shows clearly that hybrid is where things are. It's not just cloud, and it's not just on premises — hybrid is where the future is. They've got to have solutions that work in that environment, that work in that hybrid cloud reality. And they've got to have solutions that can be purchased and consumed in the same sort of elastic way that customers are able to get services from other vendors in that same vein. >> Hey, we've got to run. Thank you so much for sharing your insights and your data. I know I was firing a lot of questions at you, and you did pretty well not having the report in front of you — I know what that's like. So thank you for sharing, and good luck with your challenges in the future. You've got a lot of data to collect and a lot of fast-moving markets. So come back any time. >> I will, thank you. >> All right, and thank you for watching. Paul and I will be back with our next guest right after this short break, from MIT CDOIQ. Right back.
SUMMARY :
Brought to you by Silicon Angle Media. We go out to the events we extract the signal from the noise is day one of this conference. It's all of the So what was your reaction to that you were You can look at any of the major dtl vendors and there's new ones coming to the market, the information but natives you can find out where the data is. So you would say that his vision, actually what he implemented in the number of vendors that air providing data catalogue solutions. significantly interest 240% increase in the number of vendors that offer that now peel back the layers a little bit. 99% of the world mark It's not just about collecting the information about where your data is, but it's actually applying the I'm sure it's evolved over the years. I don't have that number off the top. that, challenging so the data and put it all together. It's important. number of events in the markets growing, the market continues to grow as the data is becoming more distributed, need for that date intelligence. All of the technology is there to clean that data up, and that's a part of what I saw. It's hovering around the double dip double. There's There's a lot of different players in that market. And in the leading market share player has what, 10%? Yeah, the ClA got so there are some pure cloud solutions. Google announced data flow back in our And so there they've got an e t l two on the cloud now. of the incumbents or yeah, that's posturing. They can compete with that. I mean, this sort of dangling your toe in the water, some of them some of the They have the hybrid capabilities because they've got some You're saying, But it's part of the Antos initiative. Folks are watching, What tools have you seen or technology? It's about getting the information of that So the buyer's within the market. not having the report in front of me.
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Paul Gillen | PERSON | 0.99+ |
June 2019 | DATE | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
50% | QUANTITY | 0.99+ |
Cisco | ORGANIZATION | 0.99+ |
10% | QUANTITY | 0.99+ |
15% | QUANTITY | 0.99+ |
Microsoft | ORGANIZATION | 0.99+ |
ORGANIZATION | 0.99+ | |
Mark | PERSON | 0.99+ |
Mark Ramsey | PERSON | 0.99+ |
Samantha | PERSON | 0.99+ |
IBM | ORGANIZATION | 0.99+ |
Stuart Bond | PERSON | 0.99+ |
Attica | ORGANIZATION | 0.99+ |
66 | QUANTITY | 0.99+ |
Silicon Angle Media | ORGANIZATION | 0.99+ |
International Data Corporation | ORGANIZATION | 0.99+ |
240% | QUANTITY | 0.99+ |
Dave | PERSON | 0.99+ |
Cambridge, Massachusetts | LOCATION | 0.99+ |
99% | QUANTITY | 0.99+ |
a month ago | DATE | 0.99+ |
1/6 | QUANTITY | 0.99+ |
both | QUANTITY | 0.99+ |
100,000,000,000 | QUANTITY | 0.98+ |
MGM | ORGANIZATION | 0.98+ |
one | QUANTITY | 0.98+ |
Duke | ORGANIZATION | 0.98+ |
a year | DATE | 0.97+ |
Paul | PERSON | 0.95+ |
today | DATE | 0.94+ |
Eve | PERSON | 0.94+ |
M I. T. CDO | PERSON | 0.94+ |
Nautical Palo Carlson | ORGANIZATION | 0.93+ |
this morning | DATE | 0.92+ |
ClA | ORGANIZATION | 0.92+ |
Easton | PERSON | 0.92+ |
1/2 ago | DATE | 0.9+ |
about 66 | QUANTITY | 0.89+ |
2/3 | QUANTITY | 0.88+ |
1/2 1,000,000,000 | QUANTITY | 0.88+ |
Cory | ORGANIZATION | 0.86+ |
one place | QUANTITY | 0.86+ |
single | QUANTITY | 0.85+ |
Stewart Bond | PERSON | 0.84+ |
day one | QUANTITY | 0.83+ |
I DC | ORGANIZATION | 0.83+ |
I. D. C | PERSON | 0.82+ |
this year | DATE | 0.81+ |
over 1,000,000,000,000,000,000 | QUANTITY | 0.8+ |
years | QUANTITY | 0.79+ |
Stewart | PERSON | 0.78+ |
middle | DATE | 0.75+ |
Spider Ring | COMMERCIAL_ITEM | 0.74+ |
beginning | DATE | 0.72+ |
2018 | DATE | 0.72+ |
Radar | ORGANIZATION | 0.72+ |
double | QUANTITY | 0.68+ |
2019 | DATE | 0.68+ |
MIT | ORGANIZATION | 0.68+ |
Tiebreak | ORGANIZATION | 0.64+ |
three | QUANTITY | 0.64+ |
Tahoe | LOCATION | 0.63+ |
M. I T. | PERSON | 0.59+ |
IDC | ORGANIZATION | 0.54+ |
Cube | COMMERCIAL_ITEM | 0.54+ |
those | QUANTITY | 0.51+ |
Justus | PERSON | 0.5+ |
Antos | ORGANIZATION | 0.48+ |
CDOIQ | EVENT | 0.34+ |
Michael Conlin, US Department of Defense | MIT CDOIQ 2019
(upbeat music) >> From Cambridge, Massachusetts, it's the CUBE. Covering MIT Chief Data Officer and Information Quality Symposium 2019. Brought to you by SiliconANGLE Media. (upbeat music) >> Welcome back to MIT in Cambridge Massachusetts everybody, you're watching the CUBE, the leader in live tech coverage. We go out to the events and extract the signal from the noise, and we're here at the MIT CDOIQ. It's the MIT Chief Data Officer event, the 13th annual event. The CUBE started covering this show in 2013. I'm Dave Vellante with Paul Gillin, my co-host, and Michael Conlin is here as the chief data officer of the Department of Defense. Michael, welcome, thank you for coming on. >> Thank you, it's a pleasure to be here. >> So the DoD is, I think, the largest organization in the world. What does the chief data officer of the DoD do on a day-to-day basis? >> A range of things, because we have a range of challenges at the Department of Defense. We are the single largest organization on the planet. We have the greatest scope and scale and complexity. We have the most dangerous competitors of anybody on the planet; it's not a trivial issue for us. So I have a range of challenges. Challenges around, how do I lift the overall performance of the department using data effectively? How do I help executives make better decisions faster, using more recent, more common data? More common enterprise data is the expression we use. How do I help them become more sophisticated consumers of data, and especially data analytics? And how do we get to the point where I can compare performance over here with performance over there on a common basis, and compare it to commercial benchmarks? Which is now an expectation for us — and ask, are we doing this as well as we should, right across the patch? Knowing that all that data comes from multiple different places to start with. So we have to overcome all those differences and provide that department-wide view. That's the essence of the role. And now with the recent passage of the Foundations for Evidence-Based Policymaking Act, there are a number of additional expectations that go on top of that, but this is ultimately about improving affordability and performance of the department. >> So overall performance of the organization... >> Overall performance. >> ...as well, and maybe that comes from supporting various initiatives, and making sure you're driving performance on that basis as well.
When this opportunity came along I just couldn't say no to it. There's so much to be done and so much appetite for improvement that I just couldn't walk away from this. Now I have to tell you, when you start, you take an oath of office to protect and defend the constitution. I don't know, it's maybe a paragraph, or maybe it's two paragraphs. It felt like it took an hour to choke it out, because I was suddenly struck with all of this emotion. >> The gravity of what you were doing. >> Yeah, the gravity of what I'm doing. And that was just a reinforcement of the choice I'd already made, obviously right. But the chance to be the first chief data officer of the entire Department of Defense — just an enormous privilege. The chance to bring commercial sector best practices in and really lift the game of the department — again, an enormous privilege. There are so many people who could do this, probably better than me. The fact that I got the opportunity — I just couldn't say no. It's just too important; there are too many places where I could see that we could make things better. I think anybody with a patriotic bone in their body would've jumped at the opportunity. >> That's awesome, I love that. Congratulations on getting that role and seemingly thriving in it. A big part of preserving that capitalist belief, defending the constitution and the American way — it sounds corny, but... >> It's real. >> I'm a patriot as well — is security. And security and data are intertwined. And just the whole future of warfare is dramatically changing. Can you talk about, in a format like this, security — your thinking on that, the department's thinking on that, from a CDO's perspective? >> So as you know we have a number of swimlanes within the department, and security is a very clear swimlane. It's aligned under our chief information officer, but security is everybody's responsibility, of course. Now the longstanding criticism of security people is that they think the best way to secure anything is to permit nobody to touch it. The clear expectation for me as chief data officer is to make sure that information is shared with the right people as rapidly as possible. And that's a different philosophy. Now I'm really lucky. Lieutenant General Dennis Crall, our principal cyber advisor, Dana Deasy, our CIO — these people understand how important it is to get information to the right place at the right time, make it rapidly available and secure it every step along the way. We embrace the zero trust mantra. And because we embrace the zero trust mantra, we're directly concerned with defending the data itself. And as long as we defend the data, and the same mechanisms are the mechanisms we use to let people share it, suddenly the tension goes away. Suddenly we all have the same goal, because the goal is not to prevent use of data, it's to enable use of data in a secure way. So the traditional tension that might be in that place doesn't exist in the department. Very productive, very professional level of collaboration with those folks in this space. Very sophisticated people. >> When we were talking before we went live, you mentioned that the DoD has 10,000 plus operational systems... >> That's correct. >> A portfolio of that magnitude is just overwhelming. I mean, how did you know what to do first when you moved into this job, or did you have a clear mandate when you were hired? >> So I did have a clear mandate when I was hired, and luckily that was spelled out.
We knew what to do first because we sat down with actual leaders of the department and asked them what their goals were for improving the performance of the department. And everything starts from that conversation. You find those executives that want to improve performance, you understand what those goals are and what data they need to manage that improvement, and you capture all the critical business questions they need answers to. From that point on they're bought in to everything that happens, right? Because they want those answers to those critical business questions. They have performance targets of their own that this is now aligned with. And so you have the support you need to go down the rest of the path of finding the data, standardizing it, et cetera, in order to deliver the answers to those questions. But it all starts with either the business mission leaders or the warfighting mission leaders who define the steps they're taking to implement the National Defense Strategy. Everything gets lined up against that, you get instant support, and you know you're going after the right thing. This is not "if you build it, they will come." This is not a driftnet dragged across the organization to try to gather up all the data. This is spear fishing for specific answers to materially important questions, and everything we do is done on that basis. >> We heard Mark Ramsey this morning talk about that. He showed a picture of stovepipes, and then he complicated that picture by showing multiple copies within each of those stovepipes, and said these are organizations that we've all lived in. >> That's my organization too. >> So talk about some of those data challenges at the DoD and how you're addressing those — specifically how you're enabling soldiers in the field to get the right data when they need it. >> So we'll be delicate when we talk about what we do for soldiers in the field. >> Understood, yeah. >> That tends to be sensitive. >> Understand why, sure. >> But all of those dynamics that Mark described in that presentation are present in every large corporation I've ever served, and that includes the Department of Defense: that heterogeneity and sprawl of IT. He showed us a hairball of IT; every large organization has a hairball of IT, and data scattered all over the place. We took many of the same steps that he described, in terms of organizing and presenting meaningful answers to questions, in almost exactly the same sequence. The challenge — you heard me use the statistic from our CIO's published digital modernization strategy, which calls out that we have roughly 10,000 operational systems. Well, every one of them is different. Every one was put in place by a different group of people, at a different time, with a different set of requirements, a different budget, a different focus, a different organizational scope. We're just like he showed. We're trying to blend all that into a common view. So we have to find the real authoritative piece of data, because it's not all of those systems, it's only a subset of those systems. And you have to do all of the mapping and translations to make the result add up; otherwise you double count or you miss something. This is work in progress. This will always be a work in progress for any large organization. So I don't want to give you the impression it's all sorted. Definitely not all sorted.
But the reality is we're trying to get to the point where people can see the data that's available — and that's a requirement, by the way, under the Foundations Act, that we have an authoritative data catalog — so people can see it, and they have the ability to then request access to it through automation. This is what's critical: you need to be able to request access and have it arbitrated, on the basis of whether you should directly have access given your role, your workflow, et cetera, and it should happen in real time. You don't want to wait weeks, or months, or however long for some paperwork to move around. So this all has to become highly automated. So: what's the data, who can access it, under what policy, for what purpose? Roles and responsibilities? Identity management? All of this is a combined set of solutions that we have to put in place. I'm mostly worried about a subset of that; my colleagues in these other swimlanes are working to do the rest. Most people in the department have access to the data they need in their space. That hasn't been a problem. The problem is, you go from space to space, and you have to learn a new set of systems and a new set of techniques for a new set of data formats, which means you have to be retrained. That really limits our freedom of maneuver of human beings. In the ideal world you'd be able to move from any job in any part of the department to the same job in another part of the department with no retraining whatsoever. You'd be instantly able to make a contribution. That's what we're trying to get to. So that's a different kind of a challenge, right? How do we get that level of consistency in the user experience — a modern user experience — so that if I'm a real estate manager, or I'm a medical business manager, or I'm a clinical professional, or I'm whatever, I can go from this location in this part of the department to that location in that part, and my experience is the same? It's completely modern, and it's completely consistent. No retraining. >> How much of that challenge pie is people, process and technology? How would you split that opportunity? >> Well, everything starts from a process perspective, because if you automate a bad process, you just make more mistakes in less time at greater cost. Obviously that's not the ideal. But the biggest single challenge is people. It's talent, it's culture — both on the demand side and on the supply side. In fact, a lot of what I talked about in my remarks was the additional changes we need to put in place to bring people into a more modern approach to data, more modern consumption. And look, we have pockets of excellence, and they can hold their own against any team, any place on the planet. But they are pockets of excellence. And what we're trying to do is raise the entire organization's performance. So it's people, people, and people, and then the other stuff. The products? I don't care about the products. (laughs) >> We often hear about... >> They're going to change in 12 to 18 months. I'm a technologist, I'm hands on. The products are going to change rapidly; I make no emotional commitment to products. But the people — that's a different story. >> Well, we know that in the commercial world we often hear that cultural resistance is what sabotages modernization efforts. The DoD is sort of the ultimate top-down organization. Is it any easier to get buy-in because the culture is sort of command and control oriented?
Ultimately people respond to their performance incentives. That's the dirty secrets performance incentives, they work every time. So unless you restructure performance measures and incentives for people their behavior's never going to change. They need to see their personal future in the future you're prescribing. And if they don't see it, you're going to get resistance every time. They're going to do what they believe they're incented to do. Making those changes, cascading those performance measures down, has been difficult because much of the decision-making processes in the department have been based on slow-moving systems and slow-moving data. I mean think about it, our budget planning process was created by Robert McNamara, as the Secretary of Defense. It requires you to plan everything for five years. And it takes more than a year to plan a single year's worth of activities, it's slow-moving. And we have regulation, we have legislation, we're a law-abiding organization, we do what we have to do. All of those things slow things down. And there's a culture of expecting macro-level consensus building. Which means everybody feels they can say no. If everybody can say no, then change becomes peanut butter spread across an organization. When you peanut butter spread across something our size and scale, the layer's pretty thin. So we have the same problem that other organizations have. There is clearly a perception of top-down change and if the Secretary or the Deputy Secretary issue an instruction people will obey it. It just takes some time to work it's way down into all the detailed combinations and permutations. Cause you have to make sophisticated decisions now. How am I going to change for my performance measures for that group to that group? And that takes time and energy and thought. There's a natural sort of pipeline effect in this. So there's real tension I think in between this perception of top-down and people will obey the orders their given. But when you're trying to integrate those changes into a board set of policy and process and people, that takes time and energy. >> And as a result the leaders have to be circumspect about the orders they give because they want to see success. They want to make sure that what they say is actually implemented or it reflects poorly on the organization. >> I think that out leaders are absolutely concerned about accomplishing the outcomes that they set out. And I think that they are rightfully determined to get the change as rapidly as possible. I would not expect them to be circumspect. I would anticipate that they would be firm and clear in the direction that they set and they would set aggressive targets because you need aggressive targets to get aggressively changed outcomes. Now. >> But they would have to choose wisely, they can't just fire off orders and expect everything to be done. I would think that they got to really think about what they want to get done, and put all the wood behind the arrow as you... >> I think that they constantly balance all those considerations. I must say, I did not appreciate before I joined the department the extraordinary caliber of leadership we enjoy. We have people with real insight and experience, and high intellectual horsepower making the decisions in the department. We've been blessed with the continuing stream of them at all of the senior ranks. These people could go anywhere, or do anything that they wanted in the economy and they've chosen to be in the department. 
And they bring enormous intellectual firepower to bear on challenges. >> Well, you mentioned the motivation at the top of the segment; that's largely pretty powerful. >> Yeah, oh absolutely. >> I want to ask you — we have to break — but the organizational structure: you talked about the CIO, and actually the responsibility for security sits within the CIO. >> Sure. >> To whom do you report? What does the organization look like? >> So I report to the Chief Management Officer of the Department of Defense. If you think about the order of precedence, there's the Secretary of Defense, the Deputy Secretary of Defense, and third in order is the Chief Management Officer. I report to the Chief Management Officer. >> As does the CIO, is that right? >> As does the CIO, as does the CIO. And actually this is quite typical in large organizations, that you don't have the CDO and the CIO in the same space, because the concerns are very different. They have to collaborate, but very different concerns. We used to see CDOs reporting to CIOs; that's fallen dramatically in terms of the frequency you see it, because we now recognize that's just a failure mode. So you don't want to go down that path. The number one most common reporting relationship is actually to the CEO, the chief executive officer, of an organization. It's all about: what executive is driving performance for the organization? That's the person the CDO should report to. And I'm blessed in that I do find myself reporting to the executive driving organizational improvement. For me, that's a critical thing. That makes the difference between whether I can succeed or whether I'm doomed to fail. >> A COO would be common too in a commercial organization. >> Yeah, in certain commercial organizations it's a COO. It just depends on the nature of the business and their maturity with data. If data's the business, the CDO will report to the CEO. There are other organizations where it'll be the COO or CFO; it just depends on the nature of that business. And in our case I'm quite fortunate. >> Well Michael, thank you for not only coming to the CUBE but for the service you're providing to the country. We really appreciate your insights. >> It's a pleasure meeting you. >> It's a pleasure meeting you. All right, keep it right there everybody, we'll be right back with our next guest. You're watching the CUBE live from MIT CDOIQ, be right back. (upbeat music)
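[Editor's note: the real-time access arbitration Conlin describes — what's the data, who can access it, under what policy, for what purpose — is essentially attribute-based access control. The short Python sketch below illustrates that idea under that assumption; the policy table, role names and access_allowed function are hypothetical, invented for this writeup, and not the department's actual implementation.]

# Hypothetical policy table: which roles may access which dataset, and for what purpose.
POLICIES = [
    {"dataset": "facility_usage", "roles": {"real_estate_manager"}, "purposes": {"planning"}},
    {"dataset": "clinical_outcomes", "roles": {"clinical_professional"}, "purposes": {"quality", "reporting"}},
]

def access_allowed(dataset: str, role: str, purpose: str) -> bool:
    """Grant access only if some policy covers this dataset, role and purpose."""
    return any(
        p["dataset"] == dataset and role in p["roles"] and purpose in p["purposes"]
        for p in POLICIES
    )

# Requests are arbitrated in real time instead of waiting on paperwork.
print(access_allowed("facility_usage", "real_estate_manager", "planning"))        # True
print(access_allowed("clinical_outcomes", "real_estate_manager", "reporting"))    # False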
SUMMARY :
Brought to you by SiliconANGLE Media. and Michael Conlin is here as the chief data officer More common enterprise data is the expression we use. and maybe that comes from supporting various initiatives, In terms of the way we support as I see, is primarily in the private sector. I just couldn't say no to it. But the chance to be the first chief data officer defending the constitution and the American way, And just the whole future of warfare Because the goal is not to prevent use of data, you mentioned that the DoD has 10,000 plus This is not, a driftnet the organization and says this is organizations that we've all lived in. enabling soldiers in the field to get the right data for soldiers in the field. in any part of the department to the same job Both on the demand side and on the supply side. But the people that's a different story. The DoD is sort of the ultimate top-down organization. and if the Secretary or the Deputy Secretary And as a result the leaders have to be circumspect about in the direction that they set and they would set behind the arrow as you... the extraordinary caliber of leadership we enjoy. of the segment, that's largely pretty powerful. but the organizational structure, you talked about the CIO, What's the organization look like? of the Department of Defense. dramatically in terms of the frequency you see that. It just depends on the nature of the business to the CUBE but the service you're providing to the country, It's a pleasure meeting you.
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Jim | PERSON | 0.99+ |
Dave | PERSON | 0.99+ |
John | PERSON | 0.99+ |
Jeff | PERSON | 0.99+ |
Paul Gillin | PERSON | 0.99+ |
Microsoft | ORGANIZATION | 0.99+ |
David | PERSON | 0.99+ |
Lisa Martin | PERSON | 0.99+ |
PCCW | ORGANIZATION | 0.99+ |
Dave Volante | PERSON | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
Michelle Dennedy | PERSON | 0.99+ |
Matthew Roszak | PERSON | 0.99+ |
Jeff Frick | PERSON | 0.99+ |
Rebecca Knight | PERSON | 0.99+ |
Mark Ramsey | PERSON | 0.99+ |
George | PERSON | 0.99+ |
Jeff Swain | PERSON | 0.99+ |
Andy Kessler | PERSON | 0.99+ |
Europe | LOCATION | 0.99+ |
Matt Roszak | PERSON | 0.99+ |
Frank Slootman | PERSON | 0.99+ |
John Donahoe | PERSON | 0.99+ |
Dave Vellante | PERSON | 0.99+ |
Dan Cohen | PERSON | 0.99+ |
Michael Biltz | PERSON | 0.99+ |
Dave Nicholson | PERSON | 0.99+ |
Michael Conlin | PERSON | 0.99+ |
IBM | ORGANIZATION | 0.99+ |
Melo | PERSON | 0.99+ |
John Furrier | PERSON | 0.99+ |
NVIDIA | ORGANIZATION | 0.99+ |
Joe Brockmeier | PERSON | 0.99+ |
Sam | PERSON | 0.99+ |
Matt | PERSON | 0.99+ |
Jeff Garzik | PERSON | 0.99+ |
Cisco | ORGANIZATION | 0.99+ |
Dave Vellante | PERSON | 0.99+ |
Joe | PERSON | 0.99+ |
George Canuck | PERSON | 0.99+ |
AWS | ORGANIZATION | 0.99+ |
Apple | ORGANIZATION | 0.99+ |
Rebecca Night | PERSON | 0.99+ |
Brian | PERSON | 0.99+ |
Dave Valante | PERSON | 0.99+ |
NUTANIX | ORGANIZATION | 0.99+ |
Neil | PERSON | 0.99+ |
Michael | PERSON | 0.99+ |
Mike Nickerson | PERSON | 0.99+ |
Jeremy Burton | PERSON | 0.99+ |
Fred | PERSON | 0.99+ |
Robert McNamara | PERSON | 0.99+ |
Doug Balog | PERSON | 0.99+ |
2013 | DATE | 0.99+ |
Alistair Wildman | PERSON | 0.99+ |
Kimberly | PERSON | 0.99+ |
California | LOCATION | 0.99+ |
Sam Groccot | PERSON | 0.99+ |
Alibaba | ORGANIZATION | 0.99+ |
Rebecca | PERSON | 0.99+ |
two | QUANTITY | 0.99+ |
Keynote Analysis | MIT CDOIQ 2019
>> From Cambridge, Massachusetts, it's The Cube! Covering MIT Chief Data Officer and Information Quality Symposium 2019. Brought to you by SiliconANGLE Media. >> Welcome to Cambridge, Massachusetts everybody. You're watching The Cube, the leader in live tech coverage. My name is Dave Vellante and I'm here with my cohost Paul Gillin, and we're covering the 13th annual MIT CDOIQ conference. The Cube first started covering this event in 2013, when this segment of the industry, Paul, was kind of moving out of the ashes of the compliance world and the data quality world — kind of that back office role — and it had this tailwind of the so-called big data movement behind it. And the Chief Data Officer was emerging very strongly, as we've talked about many times on theCube, within highly regulated industries like financial services and government and healthcare. Now we're seeing data professionals from all industries join this symposium at MIT — as I say, its 13th year — and we're seeing a lot of discussion about not only the role of the Chief Data Officer, but also some of what we heard this morning from Mark Ramsey: some of the failures along the way of all these north star data initiatives, and kind of what to do about it. So this conference brings together several hundred practitioners, and we're going to be here for two days just unpacking all the discussions of the major trends that touch on data — the data revolution, whether it's digital transformation, privacy, security, blockchain and the like. Now Paul, you've been involved in this conference for a number of years, and you've seen it evolve. You've seen the chief data officer role both emerge from the back office into a c-level executive role, and now span a very wide scope of responsibilities. Your thoughts? >> It's been like being part of a soap opera for the last eight years that I've been part of this conference, because as you said Dave, we've gone through all of these transitions. In the early days this conference actually started as an information quality symposium. It has evolved to become about the chief data officer, and really about the data as an asset to the organization. And I thought the presentation we saw this morning, Mark Ramsey's talk — we're going to have him on later — was very interesting: what they did at GlaxoSmithKline to get their arms around all of the data within that organization. Now a project like that would've been unthinkable five years ago, but we've seen all of these new technologies come on board; essentially they've created a massive search engine for all of their data. We're seeing organizations beginning to get their arms around this massive problem. And along the way — I say it's a soap opera because along the way we've seen failure after failure. We heard from Mark this morning that data governance is a failure too. That was news to me! All of these promising initiatives that have started and fallen flat or failed to live up to their potential — the chief data officer role has emerged out of that, to finally try to get beyond these failures and really get their arms around that organizational data. And it's a huge project, and it's something that we're beginning to see some organizations succeed at. >> So let's talk a little bit about the role. The chief data officer in many ways has taken a lot of the heat off the chief information officer, right? It used to be CIO stood for career is over.
Well, when you throw all the data problems at an individual c-level executive, that really is a huge challenge. And so the cloud has created opportunities for CIOs to actually unburden themselves of some of the crapplications and focus on some of the mission critical stuff that they've always been really strong at, and focus their budgets there. But the chief data officer has had a somewhat unclear scope. Different organizations have different roles and responsibilities, and there's overlap with the chief digital officer. There's a lot of emphasis on monetization, whether that's increasing revenue or cutting costs. And as we heard today from the keynote speaker Mark Ramsey, a lot of the data initiatives have failed. So what's your take on that role and its viability and its long-term staying power? >> I think it's coming together. I think last year we saw the first evidence of that. I talked to a number of CDOs last year, as well as some of the analysts who were at this conference, and there was pretty good clarity beginning to emerge about what the chief data officer role stood for. I think a lot of what has driven this is digital transformation, the hot buzzword of 2019. The foundation of digital transformation is a data-oriented culture. It's structuring the entire organization around data, and when you get to that point, when an organization is ready to do that, then the role of the CDO I think becomes crystal clear. It's not so much just an extract, transform, load discipline. It's not just technology, it's not just governance. It really is getting that data, pulling that data together, and putting it at the center of the organization. That's the value that the CDO can provide, and I think organizations are coming around to that. >> Yeah, and so we've seen over the last 10 years the rapid decrease in the cost of storage. Microprocessor performance we've talked about endlessly. And now you've got the machine intelligence piece layering in. In the early days Hadoop was the hot tech, and interestingly, now nobody even talks about Hadoop. Rarely. >> Yet it was discussed this morning. >> It was mentioned today. It is a fundamental component of infrastructures. >> Yeah. >> But what it did is it dramatically lowered the cost of storing data, and allowed people to leave data in place — the old adage of ship five megabytes of code to a petabyte of data, versus the reverse. Although we did hear today from Mark Ramsey that they copied all the data into a centralized location, so I've got some questions on that. But the point I want to make is that was really early days. We've now entered an era — and it's underscored by the fact that if you look at the top five companies in terms of market cap in the US stock market (obviously Microsoft is now over a trillion): Microsoft, Apple, Amazon, Google and Facebook. Top five. They're data companies; their assets are all data driven. They've surpassed the banks, the energy companies, and of course the manufacturing and automobile companies, et cetera. So they're data companies, and they're wrestling with big issues around security. You can't help but open the paper and see issues on security. Yesterday was the big Capital One breach. The Equifax issue was resolved in terms of the settlement this week, et cetera, et cetera. And Facebook is struggling mightily with how to deal with fake news, how to deal with deepfakes.
Recently it shut down likes for many Instagram accounts in some countries, because it's trying to protect young people who are addicted to this. Well, it needed to keep likes on for business accounts, so what kids are doing is moving over to the business Instagram accounts. But when that happens, it exposes their emails automatically, so they've got all kinds of privacy landmines, and people don't know how to deal with them. So this data explosion, while there's a lot of energy and excitement around it, brings together a lot of really sticky issues. And that falls right in the lap of the chief data officer, doesn't it? >> We're in uncharted territory, and all of the examples you used are problems that we couldn't have foreseen — those companies couldn't have foreseen. A problem may be created, but then the person who suffers from that problem changes their behavior, and it creates new problems, as you point out with kids shifting where they're going to communicate with each other. So these are all uncharted waters, and I think it's got to be scary if you're a company that does have large amounts of consumer data in particular — consumer packaged goods companies, for example. You're looking at what's happening to these big companies and these data breaches, and you know that you're sitting on a lot of customer data yourself, and that's scary. So we may see some backlash to this from companies that were all bought in to the idea of the 360 degree customer view and having these robust data sources about each one of your customers. It turns out now that that's kind of a dangerous place to be. But to your point, these are data companies — the companies that business people look up to now, that they emulate, are companies that have data at their core. And that's not going to change, and that's certainly got to be good for the role of the CDO. >> I've often said that the enterprise data warehouse failed to live up to its expectations and its promises. And Sarbanes-Oxley basically saved the EDW, because reporting became a critical component post Enron. Mark Ramsey talked today about the EDW failing, master data management failing as kind of a mapping and masking exercise, and the enterprise data model — which was a top-down push for a sort of abstraction layer — that failed. You had all these failures, and so we turned to governance. That failed. And so you've had this series of issues. >> Let me just point out, what do all those have in common? They're all top down. >> Right. >> All top down initiatives. And what Glaxo did is turn that model on its head and leave the data where it was — go and discover it and figure it out without actually messing with the data. That may be the difference that changes the game. >> Yeah, and its prescription was basically taking a tactical approach to that problem: start small, get quick hits. And then I think they selected a workload that was appropriate for solving this problem, which was clinical trials. And I have some questions for him. One of the big things that struck me is the edge. As you see new data emerging at the edge, how are organizations going to deal with that? Because I think a lot of what he was talking about was legacy on-prem systems and data. Think about JEDI — a story we've been following on SiliconANGLE — the joint enterprise defense infrastructure. This is all about the DOD basically becoming cloud enabled. So getting data out into the field during wartime, fast.
We're talking about satellite data, telemetry, analytics, AI data. A lot of distributed data at the edge, bringing new challenges to how organizations are going to deal with data problems. It's a whole new realm of complexity. >> And you talk about security issues. When you have a lot of data at the edge, and you're sending data to the edge and bringing it back in from the edge, every device in the middle — from the smart thermostat at the edge all the way up to the cloud — is a potential failure point, a potential vulnerability point. These are uncharted waters, right? We haven't had to do this on a large scale. Organizations like the DOD are going to be the ones that are going to be the leaders in figuring this out, because they are so aggressive. They have such an aggressive infrastructure in place. >> The other striking question I had, listening to Mark Ramsey this morning — again, Mark Ramsey was the former data god at GSK, GlaxoSmithKline, now a consultant, and we're going to hear from a number of folks like him and chief data officers. But he basically kind of pooh-poohed — he used the example of "build it and they will come," you know, the Kevin Costner movie Field of Dreams. Don't go after the field of dreams. So my question is, and I wonder if you can weigh in on this: everywhere we go we hear about digital transformation. They have these big digital transformation projects, and they generally are top down. Every CEO wants to get digital right. Is that the wrong approach? I want to ask Mark Ramsey that. Are they doing field of dreams type stuff? Is it going to be yet another failure of traditional legacy systems trying to compete with cloud native and born-in-the-data-era companies? >> Well, he mentioned this morning that the research is already showing that most digital transformation initiatives are failing — largely because of cultural reasons, not technical reasons, and I think Ramsey underscored that point this morning. It's interesting that he led off by mentioning business process reengineering, which you remember was a big fad in the 1990s; companies threw billions of dollars at trying to reinvent themselves, and most of them failed. Is digital transformation headed down the same path? I think so. And not because the technology isn't there; it's because creating a culture where you can break down these silos and get everyone oriented around a single view of the organization's data — the bigger the organization, the less likely that is to happen. So what does that mean for the CDO? Well, for the chief information officer, at one point we said CIO stood for career is over. I wonder if there'll be a corresponding analogy for the CDOs at some of these big organizations when it becomes obvious that pulling all that data together is just not feasible. It sounds like they've done something remarkable at GSK — maybe we'll learn from that example. But not all organizations have the executive support, which was critical to what they did, or just the organizational will to organize themselves around that central data store. >> And I also said before, I think the CDO is taking a lot of heat off the CIO, and again my inference was that the GSK use case and workload was actually quite narrow — clinical trials — and was well suited to success. So my takeaway is this: if I were a CDO, what I would be doing is trying to figure out, okay, how does data contribute to the monetization of my organization?
Maybe not directly selling the data, but what data do I have that's valuable, and how can I monetize that in terms of either saving money — supply chain, logistics, et cetera — or making money, some kind of new revenue opportunity? And I would superglue myself to the line of business executive and go after a small hit. You're talking about digital transformations being top down and largely failing. Shadow digital transformation is maybe the answer to that: aligning with a line of business, focusing on a very narrow use case, and building successes up that way, using data as the ingredient to drive value. >> And big ideas. I recently wrote about Experian, which last year launched a service called Boost that enables consumers to actually impact their own credit scores by giving Experian access to their bank accounts, to show that they are a better credit risk than maybe portrayed in their credit score. And something like 600,000 people signed up in the first six months of this service. That's an example, I think, of using inspiration, creating new ideas about how data can be applied. And in the process, by the way, Experian gains data that it can use in other contexts to better understand its consumer customers. >> So digital meets data. Data is not the new oil — data is more valuable than oil, because you can use it multiple times. The same oil can be put in your car or in your house, but only once. >> Wish we could do that with the oil. >> You can't do that with oil. So what does that mean? That means it creates more data, more complexity, more security risks, more privacy risks, more compliance complexity, but at the same time more opportunities. So we'll be breaking that down all day, Paul and myself. Two days of coverage here at MIT, hashtag MITCDOIQ. You're watching The Cube; we'll be right back right after this short break. (upbeat music)
SUMMARY :
and Information Qualities Symposium 2019. and the data quality world and really about the data as an asset to the organization. and actually focus on some of the mission critical stuff and putting it at the center of the organization. In the early days Hadoop was the hot tech, It is a fundamental component of infrastructures. And that falls right in the lap of and all of the examples you used I've often said that the enterprise data warehouse what do all those have in common? and left the data where it was. And of the big things that struck me is the edge. Organizations like the DOD are going to be the ones Is that the wrong approach? the less likely that is to happen. and how can I monetize that in terms of either saving money, that enables the consumers to actually Data is not the new oil, You can't do that with oil.
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Mark Ramsey | PERSON | 0.99+ |
Microsoft | ORGANIZATION | 0.99+ |
Dave Vellante | PERSON | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
Paul | PERSON | 0.99+ |
Apple | ORGANIZATION | 0.99+ |
ORGANIZATION | 0.99+ | |
Paul Gillin | PERSON | 0.99+ |
ORGANIZATION | 0.99+ | |
2013 | DATE | 0.99+ |
Ramsey | PERSON | 0.99+ |
Kevin Costner | PERSON | 0.99+ |
Enron | ORGANIZATION | 0.99+ |
last year | DATE | 0.99+ |
DOD | ORGANIZATION | 0.99+ |
Experian | ORGANIZATION | 0.99+ |
2019 | DATE | 0.99+ |
GlaxoSmithKline | ORGANIZATION | 0.99+ |
Dave | PERSON | 0.99+ |
GSK | ORGANIZATION | 0.99+ |
Glaxo | ORGANIZATION | 0.99+ |
Two days | QUANTITY | 0.99+ |
five megabytes | QUANTITY | 0.99+ |
360 degree | QUANTITY | 0.99+ |
two days | QUANTITY | 0.99+ |
today | DATE | 0.99+ |
Cambridge, Massachusetts | LOCATION | 0.99+ |
Field of Dreams | TITLE | 0.99+ |
billions of dollars | QUANTITY | 0.99+ |
Mark | PERSON | 0.99+ |
Equifax | ORGANIZATION | 0.99+ |
Yesterday | DATE | 0.99+ |
over a trillion | QUANTITY | 0.99+ |
1990s | DATE | 0.98+ |
600,000 people | QUANTITY | 0.98+ |
US | LOCATION | 0.98+ |
this week | DATE | 0.98+ |
SiliconANGLE Media | ORGANIZATION | 0.98+ |
first six months | QUANTITY | 0.98+ |
ORGANIZATION | 0.98+ | |
The Cube | TITLE | 0.98+ |
five years ago | DATE | 0.97+ |
Capital One | ORGANIZATION | 0.96+ |
first evidence | QUANTITY | 0.96+ |
both | QUANTITY | 0.96+ |
first | QUANTITY | 0.95+ |
MIT | ORGANIZATION | 0.93+ |
this morning | DATE | 0.91+ |
Hadoop | TITLE | 0.88+ |
one point | QUANTITY | 0.87+ |
13th year | QUANTITY | 0.86+ |
MIT CDOIQ conference | EVENT | 0.84+ |
MITCDOIQ | TITLE | 0.84+ |
each one | QUANTITY | 0.82+ |
hundred practitioners | QUANTITY | 0.82+ |
EDW | ORGANIZATION | 0.81+ |
last eight years | DATE | 0.81+ |
MIT Chief Data Officer and | EVENT | 0.81+ |
Sarbanes-Oxley | PERSON | 0.8+ |
top five companies | QUANTITY | 0.78+ |
The Cube | ORGANIZATION | 0.75+ |
Top five | QUANTITY | 0.74+ |
single view | QUANTITY | 0.7+ |
last 10 years | DATE | 0.69+ |
Boost | TITLE | 0.68+ |
a petabyte of data | QUANTITY | 0.65+ |
EDW | TITLE | 0.64+ |
SiliconANGLE | ORGANIZATION | 0.64+ |