Manufacturing: Reduce Costs and Improve Quality with IoT Analytics
>>Okay. We're here in the second manufacturing drill-down session with Michael Gerber. He's the managing director for automotive and manufacturing solutions at Cloudera, and we're going to continue the discussion with a look at how to lower costs and drive quality in IoT analytics with better uptime. When you do the math, it's really quite obvious: when the system is down, productivity is lost, and it hits revenue and the bottom line. Improved quality drives better service levels and reduces lost opportunities. Michael, great to see you. >>Dave, all right, guys, thank you so much. So we're going to talk a little bit about connected manufacturing, and how IoT around connected manufacturing can, as Dave said, improve quality outcomes for manufacturers and improve your plant uptime. First, a quick little history lesson; I promise to be quick. We've all heard about Industry 4.0, right? That is the fourth industrial revolution, and that's really what we're here to talk about today. First industrial revolution, real simple: you had steam power, which reduced backbreaking work. Second industrial revolution, the mass assembly line; think Henry Ford and motorized conveyor belts, mass automation. Third industrial revolution, things got interesting: you started to see automation, but that automation essentially meant programming a robot to do something, and it did the same thing over and over and over, irrespective of how your outside conditions changed. The fourth industrial revolution is very different, folks. >>Now we're connecting equipment and processes and getting feedback from them, and through machine learning we can make those processes adaptive. That's really what we're talking about in the fourth industrial revolution.
And it is intrinsically connected to data and a data life cycle. And by the way, this is important not just for technology's sake; it's important because it actually drives very important business outcomes. First of all, quality. If you look at the cost of quality, even despite decades of manufacturers working to improve it, quality problems still account for about 20% of sales. So for every fifth unit you manufacture, from a revenue perspective, you've got quality issues that are costing you a lot. And plant downtime costs companies $50 billion a year. >>So when we're talking about using data in these Industry 4.0 types of use cases, these connected-data use cases, we're not doing it just narrowly to implement technology. We're doing it to move those levers: improving quality, reducing downtime. So let's talk about what a connected manufacturing data life cycle looks like, because this is actually the business that Cloudera is in. We call this manufacturing edge to AI; it's an analytics life cycle, and it starts with your plants. Those plants are increasingly connected. As I said, sensor prices have come down two-thirds over the last decade, and those sensors are connected over the internet. So suddenly we can collect all this data from your manufacturing plants, and what do we want to be able to do with it? >>We want to be able to analyze that data as it's coming across, in-stream, and take intelligent real-time actions. We might do some simple processing and filtering at the edge, but we really want to take real-time actions on that data. And this is the inference part of things, right? Taking real-time action.
But the ability to take these real-time actions is actually the result of a machine learning life cycle, and I want to walk you through it. It starts with ingesting this data for the first time and putting it into our enterprise data lake, and that enterprise data lake can be either within your data center or in the cloud. You're going to ingest that data, and you're going to store it.
You're going to enrich it with enterprise data sources. So now you'll have, say, sensor data alongside maintenance repair orders from your maintenance management systems. Now you're getting really nice data sets; you can start to say, hey, which sensor values correlate to the need for machine maintenance? You can see the data sets becoming very compatible with machine learning. So you bring these data sets together, you process them, and you align the time-series data from your sensors with the timestamped data from your enterprise systems, like the maintenance management system I mentioned. Once you've done that, we can put a query layer on top, so now we can do advanced analytics across all these different types of data sets. And what's really important here is that once you've stored that history, you can build out the machine learning models I mentioned earlier. So, like I said, you can start to say which sensor values correlated to the need for equipment maintenance in my maintenance management systems, and then you can build out those models and say, hey, here are the sensor values, the conditions, that predict the need for maintenance.
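The alignment step described here, joining time-series sensor readings to timestamped maintenance orders to produce a labeled data set, might be sketched roughly as follows. The column names, values, and the pandas-based approach are illustrative assumptions, not Cloudera's actual pipeline:

```python
import pandas as pd

# Hypothetical sensor readings (time series) and maintenance work orders.
sensors = pd.DataFrame({
    "ts": pd.to_datetime(["2024-01-01 00:00", "2024-01-01 01:00", "2024-01-01 02:00"]),
    "machine_id": ["M1", "M1", "M1"],
    "vibration": [0.21, 0.35, 0.62],
})
orders = pd.DataFrame({
    "ts": pd.to_datetime(["2024-01-01 02:15"]),
    "machine_id": ["M1"],
    "work_order": ["WO-1001"],
})

# Align each work order with the most recent sensor reading for the same
# machine, producing a labeled data set: which sensor values preceded a
# maintenance event?
labeled = pd.merge_asof(
    orders.sort_values("ts"), sensors.sort_values("ts"),
    on="ts", by="machine_id", direction="backward",
)
print(labeled[["machine_id", "work_order", "vibration"]])
```

From a data set like this, a model can learn which sensor conditions tend to precede a work order.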
And once you understand that, you deploy those models out to the edge, where they work in inference mode: the model will continuously sniff the data as it comes in and ask, are we experiencing the conditions that predicted the need for maintenance? If so, let's take real-time action. Let's schedule an equipment maintenance work order, and let's order the parts ahead of time, before that piece of equipment fails. That allows us to be very, very proactive. >>So this is one of the most popular use cases we're seeing in connected manufacturing, and we're working with many different manufacturers around the world. I want to highlight one of them, because I think it's really interesting. The company is Faurecia, the automotive supplier out of France. They are huge; this is a multinational automotive parts and systems supplier, and as you can see, they operate 300 sites in 35 countries. So, very global. They connected 2,000 machines and wanted to be able to take data from them. They started off with learning how to ingest the data, and they started off very well with manufacturing control towers, >>to be able to just monitor the data coming in, to monitor the process. That was the first step: 2,000 machines, 300 different variables, things like vibration, pressure and temperature, so first, let's do performance monitoring. Then they said, okay, let's start doing machine learning on some of these things, and build out things like predictive equipment maintenance models. What they really focused on is computer vision quality inspection.
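The edge-side inference loop described a moment ago, continuously scoring incoming readings and raising a work order when the failure-predicting conditions appear, could look something like this minimal sketch. The threshold, the stand-in scoring function and the field names are all hypothetical:

```python
# Hypothetical edge-side inference loop: a deployed model scores each incoming
# sensor reading, and readings matching the failure-predicting conditions
# trigger a proactive work order (here, just appended to a queue).
FAILURE_THRESHOLD = 0.8  # assumed probability cutoff

def score(reading):
    # Stand-in for the trained model's prediction; in practice this would be
    # the model pushed from the data lake out to the edge.
    return 0.95 if reading["vibration"] > 0.6 else 0.1

work_orders = []
for reading in [{"machine_id": "M1", "vibration": 0.3},
                {"machine_id": "M2", "vibration": 0.7}]:
    if score(reading) >= FAILURE_THRESHOLD:
        work_orders.append({"machine_id": reading["machine_id"],
                            "action": "schedule maintenance, order parts"})

print(work_orders)
```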
So let's take pictures of parts as they go through a process, and then classify whether each picture was associated with a good or bad quality outcome. Then you teach the machine to make that decision on its own, so now the camera is doing the inspections for you. And so they built those machine learning models. All this data was on-prem, but they pushed it up to the cloud to develop the machine learning models.
Then they pushed the models back into the plants, where they could take real-time actions through these computer vision quality inspections. So a great use case, and a great example of how you start with monitoring and move to machine learning, but at the end of the day you're improving quality and improving equipment uptime. And that is the goal of most manufacturers. So with that being said, if you want to learn more, we've got a wealth of information on our website; you see the URL in front of you. There's a lot of information there on the use cases we're seeing in manufacturing, in a lot more detail, and a lot more about the customers we work with. If you need that information, please do find it. With that, I'm going to turn it over to Dave. I think you had some questions you wanted to run by me. >>I do, Michael, thank you very much for that. And before I get into the questions, I just wanted to make some observations. I was struck by what you were saying about the phases of industry. We talk about Industry 4.0, and my observation is that traditionally machines have always replaced humans, but it's been around labor, and the difference with 4.0 and what you talked about with connecting equipment is that you're injecting machine intelligence.
Now, in the camera inspection example, the machines are taking action, right? That's different, and it's a really new kind of paradigm. I think the second thing that struck me is the costs: 20% of sales, and plant downtime costing many tens of billions of dollars a year. So that was huge; the business case for this is, I'm going to reduce my expected loss quite dramatically. >>And then I think the third point, which we touched on in the morning sessions and on the main stage, is really that the world is hybrid. Everybody's trying to figure out hybrid and get hybrid right, and it certainly applies here. This is a hybrid world: regardless of where the data is, you've got to be able to get to it, blend it, enrich it, and then act on it. So anyway, those are my big takeaways. So, first question: in thinking about implementing connected manufacturing initiatives, what are people going to run into? What are the big challenges they're going to hit? >>There are a few of them, but I think one of the key ones is bridging what we'll call the IT and OT data divide. What we mean by IT: your IT systems are your ERP systems, your MES systems, your transactional systems that run on relational databases, and your IT departments are brilliant at running them. The difficulty is that in implementing these use cases you also have to deal with operational technology, and that's all the equipment in your manufacturing plant that runs on its own proprietary network with proprietary protocols. That information can be very, very difficult to get to, and it's much more unstructured data coming from your OT.
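To make that IT/OT divide concrete, here is a toy sketch of the normalization problem: readings arrive from different proprietary protocols in different shapes, and must be mapped onto a single schema before they can be joined with IT data. The payload field names and machine identifiers are invented for illustration:

```python
# Toy illustration of bridging OT protocols: map heterogeneous payloads
# onto a single record layout suitable for a common analytics platform.
def normalize(raw):
    """Normalize a raw OT payload into one common schema."""
    if raw.get("proto") == "opc-ua":
        return {"machine_id": raw["nodeId"], "metric": raw["browseName"],
                "value": raw["val"]}
    if raw.get("proto") == "modbus":
        return {"machine_id": f"plc-{raw['unit']}",
                "metric": f"reg{raw['register']}", "value": raw["payload"]}
    raise ValueError(f"unknown protocol: {raw.get('proto')}")

records = [normalize(r) for r in [
    {"proto": "opc-ua", "nodeId": "M7", "browseName": "vibration", "val": 0.4},
    {"proto": "modbus", "unit": 3, "register": 40001, "payload": 72},
]]
print(records)
```

In practice this adapter layer is exactly what partners on the OT side provide, as discussed later in the conversation.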
So the key challenge is being able to bring these data sets together in a single place where you can start to do advanced analytics and leverage that diverse data to do machine learning. If I had to boil it down, the single hardest thing in this type of environment, connected manufacturing, is that operational technology has run on its own, in its own silos, for a long time. But at the end of the day, this is incredibly valuable data that can now be tapped to move those metrics we talked about, around quality and uptime. So, a huge opportunity. >>Well, and again, this is a hybrid world, and you've kind of got two sides moving toward an equilibrium. You've got the OT side, pretty hardcore engineers, and we know IT. A lot of that data historically has been analog data that is only now getting instrumented and captured. So you've got that cultural challenge, and you've got to blend those two worlds; that's critical. Okay, so Michael, let's talk about some of the use cases. You touched on some, but let's peel the onion a bit. When you're thinking about this world of connected manufacturing and analytics, when you talk to customers, what are the most common use cases that you see? >>Yeah, that's a great question, and you're right, I did allude to it a little earlier. I want people to think about a spectrum of use cases, ranging from simple to complex, and you can get value even in the simple phases. The simplest use case is really around monitoring. You monitor your equipment or your processes, and you just make sure that you're staying within the bounds of your control plan. And this is much easier to do now, right?
Because there are more sensors now, and those sensors are moving more and more toward internet technologies. So hey, you've got the opportunity now to do some monitoring. No machine learning; we're just talking about simple monitoring. The next level down is something we would call quality event forensic analysis. On this one, imagine I've got warranty claims coming in from the field. What you simply want to be able to do is the forensic analysis back to the root cause within the manufacturing process. So this is about connecting the dots: I've got warranty issues; what were the manufacturing conditions of the day that caused them? Then you could also say which other products were impacted by those same conditions, and recall those proactively and selectively, rather than recalling, say, an entire year's fleet of a car. Again, that's also not machine learning; it's simply connecting the dots from warranty claims in the field to the manufacturing conditions of the day so that you can take corrective actions. But then you get into a whole slew of machine learning use cases, and that ranges from things like quality or yield optimization, where you start to collect sensor values and manufacturing yield values from your ERP system, and you start to say which sensor values or factors drove good or bad yield outcomes. You can identify the factors that are most important, so you measure those, you monitor those, and you optimize those. That's how you optimize your yield. And then you go down to the more traditional machine learning use cases around predictive maintenance.
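The simplest tier on that spectrum, monitoring that readings stay within control-plan bounds, needs no machine learning at all. A minimal sketch, with invented variables and limits:

```python
# Simple monitoring: flag any reading outside its control-plan limits.
# Variable names and limit values are hypothetical.
control_plan = {
    "vibration": (0.0, 0.5),     # (lower, upper) control limits
    "pressure": (30.0, 80.0),
    "temperature": (10.0, 90.0),
}

reading = {"vibration": 0.62, "pressure": 55.0, "temperature": 85.0}

out_of_bounds = {
    var: value
    for var, value in reading.items()
    if not (control_plan[var][0] <= value <= control_plan[var][1])
}
print(out_of_bounds)  # variables needing attention
```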
So the key point here, Dave, is that depending on a customer's maturity around big data, you could start simply with monitoring and get a lot of value, then bring together more diverse data sets to do things like connect-the-dots analytics, and then go all the way to the more advanced machine learning use cases. There's value to be had throughout. >>I remember when the IT industry really started to think about IoT in the early days. It reminds me of the old days when football fields were grass, and a new player would come in with a perfectly white uniform; we had to get dirty as an industry to learn. So my question relates to other technology partners that you might be working with, maybe new in this space, to accelerate some of these solutions that we've been talking about. >>Yeah, that's a great question, and it goes back to one of the things I alluded to earlier. We've got some great partners. One partner, for example, is Litmus Automation, whose whole world is the OT world. What they've done, for example, is build adapters to get to practically every industrial protocol, and they present a single interface to that data for the Cloudera data platform. So we're really good at ingesting IT data, and we can leverage a company like Litmus to open the floodgates of that OT data, making it much easier to get that data into our platform. And suddenly you've got all the data you need to implement those types of Industry 4.0 analytics use cases. And it really boils down to: can I get to that data?
Can I break down that IT/OT barrier that we've always had, and bring together those data sets that really move the needle in terms of improving manufacturing performance? >>Okay, thank you for that. Last question, speaking of moving the needle: I want to lead this discussion on to technology advances; I'd love to talk tech here. What are the key technology enablers and advances, if you will, that are going to move connected manufacturing and machine learning forward in the manufacturing space? >>Yeah, in the manufacturing space there are a few things. First of all, as we touched upon, sensor prices have come down and sensors have become ubiquitous, so number one, we're finally able to get to the OT data. Number two, we now have the ability to store that data a whole lot more efficiently; we've got great capabilities to do that and to put it over into the cloud for the machine learning types of workloads. If you're doing computer vision or visual analytics, you've got GPUs to make those machine learning models much more effective. Add 5G technology, which, at least from a latency perspective, starts to blur where you do your compute, whether at the edge or in the cloud. For super business-critical stuff you probably don't want to rely on any type of network connection, but from a latency perspective you're starting to see the ability to do compute where it's most effective, and that's really important.
And again, the machine learning capabilities, GPU-level machine learning to build those models, and then deploying them via over-the-air updates to your equipment; all of those things are making the advanced analytics and machine learning data life cycle faster and better. And at the end of the day, to your point, Dave, equipment and processes are getting much smarter, much more quickly. >>Yep. We've got a lot of data, and we have way lower-cost processing platforms. I'll throw in NPUs as well, neural processing units; watch that space. Okay, Michael, we're going to leave it there. Thank you so much, really appreciate your time. >>Dave, I really appreciate it. And thanks to everybody who joined; thanks for joining today. >>Yes, thank you for watching. Keep it right there.
Tiji Mathew, Patrick Zimet and Senthil Karuppaiah | Io-Tahoe Data Quality Active DQ
(upbeat music), (logo pop up) >> Narrator: From around the globe, it's theCUBE, presenting Active DQ, intelligent automation for data quality, brought to you by IO-Tahoe. >> Are you ready to see Active DQ on Snowflake in action? Let's get into the show and do the demo. With me are Tiji Mathew, Data Solutions Engineer at IO-Tahoe; Patrick Zimet, Data Solutions Engineer at IO-Tahoe; and Senthilnathan Karuppaiah, Head of Production Engineering at IO-Tahoe. Patrick, over to you, let's see it. >> Hey Dave, thank you so much. Yeah, we've seen a huge increase in the number of organizations interested in Snowflake implementations, looking for an innovative, precise and timely method to ingest their data into Snowflake. Where we are seeing a lot of success is a ground-up method utilizing both IO-Tahoe and Snowflake. To start, you define your as-is model by leveraging IO-Tahoe to profile your various data sources and push the metadata to Snowflake. Meaning, we create a data catalog within Snowflake as a centralized location to document items such as source system owners, allowing you to have those key conversations and understand the data's lineage, potential blockers, and what data is readily available for ingestion. Once the data catalog is built, you have a much more dynamic strategy surrounding your Snowflake ingestion. And what's great is that while you're working through those key conversations, IO-Tahoe will maintain that metadata push, and, partnered with Snowflake's ability to version the data, you can easily incorporate potential schema changes along the way, making sure that the information you're working on stays as current as the systems you're hoping to integrate with Snowflake. >> Nice. Patrick, I wonder if you could address how the IO-Tahoe platform scales, and maybe in what way it provides a competitive advantage for customers.
Great question. Where IO-Tahoe shines is through its Active DQ, or the ability to monitor your data's quality in real time, marking which rows need remediation according to the customized business rules that you can set, ensuring that data quality standards meet the requirements of your organization. What's great is that through our use of RPA we can scale with an organization: as you ingest more data sources, we can allocate more robotic workers, meaning the results will continue to be delivered in the same timely fashion you've grown used to. What's more, with IO-Tahoe doing the heavy lifting on monitoring data quality, your data experts are freed up to focus on more strategic tasks such as remediation, data augmentation, and analytics development. >> Okay, maybe Tiji, you could address this. I mean, how does all this automation change the operating model that we were talking to Aj and Dunkin about before? I mean, if it involves fewer people and more automation, what else can I do in parallel? >> I'm sure the participants today will also be asking the same question. Let me start with the strategic tasks Patrick mentioned. IO-Tahoe does the heavy lifting, freeing up data experts to act upon the data events generated by IO-Tahoe. Companies that have teams focused on manually building an inventory of the data landscape face longer turnaround times in producing actionable insights from their own data assets, thus diminishing the value realized by traditional methods. Our operating model, however, involves profiling and remediating at the same time, creating a cataloged data estate that can be used by business or IT accordingly. With increased automation and fewer people, our machine learning algorithms augment the data pipeline to tag and capture the data elements into a comprehensive data catalog.
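The profiling that feeds such a catalog can be sketched minimally in plain Python. The table, columns and captured attributes here are assumptions for illustration, not IO-Tahoe's actual metadata model:

```python
# Minimal profiling sketch: derive per-column metadata of the sort a data
# catalog records (inferred type, null rate, distinct count).
rows = [
    {"customer_id": 1, "zip_code": "60601"},
    {"customer_id": 2, "zip_code": None},
    {"customer_id": 3, "zip_code": "94105"},
]

catalog_entry = {}
for col in rows[0]:
    values = [r[col] for r in rows]
    non_null = [v for v in values if v is not None]
    catalog_entry[col] = {
        "inferred_type": type(non_null[0]).__name__,
        "null_rate": round(1 - len(non_null) / len(values), 2),
        "distinct": len(set(non_null)),
    }
print(catalog_entry)
```

Metadata like this, pushed into a central store, is what makes the estate searchable and what the business rules are later evaluated against.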
As IO-Tahoe automatically catalogs the data estate in a centralized view, the data experts can focus on remediating the data events generated from validating against business rules. We envision that these data events, coupled with this drillable and searchable view, provide a comprehensive way to assess the impact of bad quality data. Let's briefly look at the image on screen. For example, the view indicates that bad quality zip code data impacts the contact data, which in turn impacts other related entities and systems. Now contrast that with a manually maintained spreadsheet that drowns out the main focus of your analysis. >> Tiji, how do you tag and capture bad quality data, given these dependencies you've mentioned? How do you stop it from flowing downstream into the processes within the applications or reports? >> As IO-Tahoe builds the data catalog across source systems, we tag the elements that meet the business rule criteria, while segregating the failed data examples associated with the elements that fall below a certain threshold. The elements that meet the business rule criteria are tagged to be searchable, thus providing an easy way to identify data elements that may flow through the system. The segregated data examples, on the other hand, are used by data experts to triage for the root cause. Based on the root cause, potential outcomes could be: one, changes in the source system to prevent that data from entering in the first place; two, adding data pipeline logic to sanitize bad data from being consumed by downstream applications and reports; or three, accepting the risk of storing bad data and addressing it when it meets a certain threshold. However, Dave, as for your question about preventing bad quality data from flowing into the system: IO-Tahoe will not prevent it, because the controls over data flowing between systems are managed outside of IO-Tahoe.
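A toy version of that tagging and segregation, here a zip code conformity rule with an assumed 90% pass threshold (an invented business rule, not an IO-Tahoe default), might look like:

```python
import re

# Validate each value against a US ZIP pattern, compute the pass rate, and
# segregate the failing examples for root-cause triage.
ZIP_RE = re.compile(r"\d{5}(-\d{4})?")
PASS_THRESHOLD = 0.90  # assumed business rule

values = ["60601", "02139-4307", "6060", "ABCDE", "94105"]
failed = [v for v in values if not ZIP_RE.fullmatch(v)]
pass_rate = 1 - len(failed) / len(values)

compliant = pass_rate >= PASS_THRESHOLD
print(f"pass rate {pass_rate:.0%}, compliant={compliant}, failed={failed}")
```

The `failed` list plays the role of the segregated examples handed to data experts; the compliant flag is what drives tagging and alerting.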
That said, IO-Tahoe will alert and notify the data experts of events that indicate bad data has entered the monitored assets. Also, we have redesigned our product to be modular and extensible. This allows data events generated by IO-Tahoe to be consumed by any system that wants to protect the targets from bad data. Thus IO-Tahoe empowers the data experts to control the bad data flowing into their systems. >> Thank you for that. So, one of the things that we've noticed, and written about, is that you've got these hyper-specialized roles within the centralized data organization. I wonder, how do the data folks get involved here, if at all, and how frequently do they get involved? Maybe Senthilnathan, you could take that. >> Thank you, Dave, for having me here. Well, depending on whether the data element in question is in the data cataloging or the monitoring phase, different data folks get involved. When it is in the data cataloging stage, the data governance team, along with enterprise architecture or IT, is involved in setting up the data catalog. That includes identifying the critical data elements; business term identification, definition and documentation; data quality rules and data event setup; data domain and business line mapping; lineage tracking; source of truth; and so on and so forth. It's typically a one-time setup: review, certify, then govern and monitor. But when it is in the monitoring phase, during any data incident or data issue, IO-Tahoe broadcasts data signals to the relevant data folks so they can act and remediate as quickly as possible, and it alerts the consumption teams, whether data science, analytics or business ops, about the potential issue, so that they are aware and take the necessary preventive measures. Let me show you an example of a critical data element, from the data quality dashboard view, to the lineage view, to the data 360-degree view, for a zip code conformity check.
So in this case, the zip code did not meet the pass threshold during the technical data quality check, was identified as a non-compliant item, and a notification was sent to the IT folks. Clicking on the zip code will take us to the lineage view, to visualize the dependent systems: who are the producers and who are the consumers. And drilling down further will take us to the detailed view, where a lot of other information is presented to facilitate root cause analysis and take the issue to final closure. >> Thank you for that. So, Tiji, Patrick was talking about the as-is and to-be, so I'm interested in how it's done now versus before. Do you need a data governance operating model, for example? >> Typically, a company that decides to make an inventory of its data assets would start out by manually building a spreadsheet, managed by the data experts of the company. What started as a draft then gets baked into the operating model of the company. This leads to loss of collaboration, as each department makes a copy of the catalog for its specific needs. This decentralized approach leads to loss of uniformity, with each department having different definitions, which ironically requires a governance model for the data catalog itself. And as the spreadsheet grows in complexity, the skill level needed to maintain it also increases, leading to fewer and fewer people knowing how to maintain it. Above all, the content that took so much time and effort to build is not searchable outside of that spreadsheet document. >> Yeah, I think you really hit the nail on the head, Tiji. Now companies want to move away from the spreadsheet approach, and IO-Tahoe addresses the shortcomings of the traditional approach, enabling companies to achieve more with less. >> Yeah. What has the customer reaction been? We had Webster Bank on one of the early episodes, for example. I mean, could they have achieved
what they did without something like ActiveDQ and automation? Maybe, Senthilnathan, you could address that. >> Sure. It is impossible to achieve full data quality monitoring and remediation without automation, or digital workers, in place. The reality is that data teams don't have the time to do the remediation manually, because they have to analyze, conform, and fix any data quality issue as fast as possible, before it gets bigger, and Webster is no exception. That's why Webster implemented IO-Tahoe's ActiveDQ: to set up business metadata management and data quality monitoring and remediation in the Snowflake cloud data lake. We helped in building the center of excellence in data governance, which manages the data catalog and the scheduled, on-demand, and in-flight data quality checks; Snowflake's Snowpipe and Streams are super beneficial for achieving the in-flight quality checks. Then there is the data consumption monitoring and reporting. Last but not least, the time saver is persisting the non-compliant records for every data quality run within the Snowflake cloud, along with a remediation script, so that during any exception the respective team members are not only alerted, but also supplied with the necessary scripts and tools to perform remediation right from IO-Tahoe's ActiveDQ. >> Very nice. Okay guys, thanks for the demo. Great stuff. Now, if you want to learn more about the IO-Tahoe platform and how you can accelerate your adoption of Snowflake, book some time with a data RPA expert: all you've got to do is click on the demo icon on the right of your screen and set up a meeting. We appreciate you attending this latest episode of the IO-Tahoe data automation series. If you missed any of the content, it's all available on demand. This is Dave Vellante for theCUBE. Thanks for watching. (upbeat music)
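The pattern Senthilnathan describes, persisting the non-compliant records of every run along with a remediation script, can be sketched as follows. The record format and the generated SQL are assumptions for illustration, not ActiveDQ's actual output:

```python
import datetime

def run_quality_check(values, rule, table="customer", column="zip_code"):
    """Persist the failures of a data quality run together with a suggested
    remediation script (a sketch of the pattern, not IO-Tahoe's implementation)."""
    bad = [v for v in values if not rule(v)]
    incident = {
        "table": table,
        "column": column,
        "run_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "non_compliant": bad,
        # Hypothetical remediation: a SQL snippet the alerted team can review and run.
        "remediation_sql": (
            f"UPDATE {table} SET {column} = NULL "
            f"WHERE {column} IN ({', '.join(repr(v) for v in bad)});"
        ),
    }
    return incident

incident = run_quality_check(["60601", "ABCDE"], str.isdigit)
print(incident["remediation_sql"])
# → UPDATE customer SET zip_code = NULL WHERE zip_code IN ('ABCDE');
```

The point of shipping the script with the alert is the one made above: the respective team members are not just notified, they get something actionable to review.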
Ajay Vohora and Duncan Turnbull | Io-Tahoe ActiveDQ Intelligent Automation for Data Quality
>>From around the globe, it's theCUBE, presenting ActiveDQ, intelligent automation for data quality, brought to you by Io-Tahoe. >> Now we're going to look at the role automation plays in mobilizing your data on Snowflake. Let's welcome Duncan Turnbull, who's a partner sales engineer at Snowflake, and Ajay Vohora, CEO of Io-Tahoe, is back to share his insight. Gentlemen, welcome. >> Thank you, David. Good to have you back. >> Yeah, it's great to be back. >> Ajay, it's really good to see Io-Tahoe expanding the ecosystem, it's so important. And now, of course, bringing in Snowflake, it looks like you're really starting to build momentum. I mean, there's progress that we've seen month by month over the past 12, 14 months. Your seed investors, they've got to be happy. >> They are happy, and they can see that we've moved into a nice phase of expansion here, with new customers signing up, and now we're ready to go out and raise that next round of funding. I think, maybe, think of us a bit like Snowflake five years ago. So we're definitely on track with that. There's a lot of interest from investors, and right now we're trying to focus in on those investors that can partner with us and understand AI, data, and automation. >> So personally, I mean, you've managed a number of early-stage VC funds, I think four of them, and you've taken several software companies through many funding rounds and growth, all the way to exit. So you know how it works: you have to get product-market fit, you've got to make sure you get your KPIs right, and you've got to hire the right salespeople. But what's different this time around? >> Well, you know, the fundamentals that you mentioned, those never change. What I can say that's different, that's shifted, this time around is three things. One is that there used to be this kind of choice of, do we go open source or do we go proprietary?
Now that has turned into a nice hybrid model, where we've really keyed into, you know, Red Hat doing something similar with CentOS. The idea here is that there's a core capability of technology that underpins a platform, but it's the ability to then build an ecosystem around that, a community. That community may include customers, technology partners, and other tech vendors, and enabling platform adoption means all of those folks in that community can build and contribute, while still maintaining the core architecture and platform integrity at the heart of it. >> And that's one thing that's changed: we're seeing a lot of that type of software company emerge into that model, which is different from five years ago. The second thing is leveraging the cloud, every cloud, the Snowflake cloud being one of them, in order to make use of what customers and enterprise software are moving towards. Every CIO is now running some configuration of a hybrid IT estate, whether that's cloud, multi-cloud, or on-prem; that's just the reality. The other piece is dealing with the CIO's legacy. Over the past 15, 20 years they've purchased many different platforms and technologies, and some of those are still established. So how do you enable that CIO to make a purchase while still preserving, and in some cases building on and extending, the legacy technology? They've invested their people's time and training, and financial investment, into solving a customer pain point with technology, and that never goes out of fashion. >> That never changes. You have to focus like a laser on that. And of course, speaking of companies who are focused on solving problems: Duncan Turnbull from Snowflake.
You guys have really done a great job, really brilliantly addressing pain points, particularly around data warehousing, simplifying that, and now you're providing this new capability around data sharing, really quite amazing. Duncan, Ajay talks about data quality and customer pain points in enterprise IT. Why has data quality been such a problem historically? >> One of the biggest challenges that's really affected people in the past is that, to address everyone's need for using data, they've evolved all these different places to store it, all these silos or data marts, this whole proliferation of places where data lives. All of those end up with slightly different schedules for bringing data in and out, slightly different rules for transforming that data, formatting it, and getting it ready, and slightly different quality checks for making use of it. This then becomes a big problem, in that these different teams are going to have slightly different, or even radically different, answers to the same kinds of questions, which makes it very hard for teams to work together on the different data problems that exist inside the business, depending on which of these silos they end up looking at. What you can do, if you have a single, scalable system for putting all of your data into, is sidestep a lot of this complexity and address the data quality issues in a single way. >> Now, of course, we're seeing this huge trend in the market towards robotic process automation; RPA adoption is accelerating. You see it in UiPath's IPO, a 35-plus billion dollar valuation, Snowflake-like numbers, nice comps there for sure. Ajay, you've coined the phrase "data RPA." What is that, in simple terms?
>>Yeah, I mean, it was born out of seeing how, in our ecosystem and community, developers, customers, and general business users wanted to adopt and deploy Io-Tahoe's technology. We're not mimicking UI-level RPA, we're not trying to automate that piece, but wherever there was a process tied to some form of manual overhead, with handovers and so on, that process is something we were able to automate with Io-Tahoe's technology, deploying AI and machine learning specifically to those data processes, almost as a precursor to broader process automation. That's really where we've seen the momentum pick up, especially in the last six months. And we've kept it really simple with Snowflake. We stepped back and said, well, you know, the resource that Snowflake can leverage here is the metadata. So how could we turn Snowflake into that repository, the data catalog? And by the way, if you're a CIO looking to purchase a data catalog tool: stop, there's no need to. Working with Snowflake, we've enabled that intelligence to be gathered automatically and put to use within Snowflake, reducing that manual effort and putting that data to work. We've packaged this with AI and machine learning specific to those data tasks, and that's what's resonated with our customers. >> You know, what's interesting here, just a quick aside: I've been watching Snowflake now for a while, and of course the competitors come out and maybe criticize why they don't have this feature or that feature, and Snowflake seems to have an answer. And the answer oftentimes is that the ecosystem is going to bring that, because they have a platform that's so easy to work with.
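Duncan's earlier point about silos, that each one ends up with slightly different quality rules and therefore different answers to the same question, is essentially an argument for one shared rule set applied uniformly. A minimal sketch, with hypothetical rules and field names:

```python
# One shared rule set applied uniformly, instead of each silo keeping its own copy.
RULES = {
    "email": lambda v: "@" in str(v),
    "age": lambda v: isinstance(v, int) and 0 <= v <= 130,
}

def validate(row, rules=RULES):
    """Return the names of the fields in `row` that fail the shared rules."""
    return [field for field, ok in rules.items() if field in row and not ok(row[field])]

# The same checks run against what used to be separate marts:
crm_row = {"email": "a@example.com", "age": 44}
billing_row = {"email": "not-an-email", "age": 200}
print(validate(crm_row), validate(billing_row))  # → [] ['email', 'age']
```

Because every consumer calls the same `validate`, two teams asking the same question of the same record get the same answer, which is the collaboration problem the silos created.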
So I'm interested, Duncan, in what kind of collaborations you're enabling with high-quality data, and of course, you know, your data sharing capability. >> Yeah. So the ability to work on data sets isn't just limited to inside the business itself, or even between different business units, as we were discussing with the silos. When looking at this idea of collaboration, we want to be able to exploit data to the greatest degree possible, but we need to maintain the security, the safety, the privacy, and the governance of that data. It could be quite valuable, it could be quite personal, depending on the application involved. One of the novel applications of data sharing that we see between organizations is this idea of data clean rooms. These data clean rooms are safe, collaborative spaces which allow multiple companies, or even divisions inside a company that have particular privacy requirements, to bring two or more data sets together for analysis, but without having to actually share the whole unprotected data set with each other. When you do this inside of Snowflake, you can collaborate using standard tool sets: you can use all of our SQL ecosystem, all of the data science ecosystem that works with Snowflake, and all of the BI ecosystem that works with Snowflake, but you can do it in a way that keeps the confidentiality that needs to be preserved inside the data intact. And you can only really do these kinds of collaborations, especially across organizations, but even inside large enterprises, when you have good, reliable data to work with; otherwise your analysis just isn't going to work properly. A good example of this is one of our large gaming customers, who's an advertiser. They were able to build targeted ads to acquire customers and measure the campaign impact on revenue, while keeping their data safe and secure as they worked with advertising partners. The business impact was a lift of 20 to 25% in campaign effectiveness through better targeting, and, pulling through from that, a reduction in customer acquisition costs, because they just didn't have to spend as much on the forms of media that weren't working for them. >> So I wonder, with the way public policy is shaping out, obviously GDPR started it, and in the States there's the California Consumer Privacy Act, and people are sort of taking the best of those, and there's a lot of differentiation. What are you seeing in terms of governments really driving this move to privacy? >> In government and the public sector, we're seeing a huge wake-up in activity across the whole piece. Part of it has been data privacy; the other part of it is being more joined-up and more digital, rather than paper- or form-based. We've all got stories of waiting in line holding a form, taking that form to the front of the line, and handing it over a desk. Now government and the public sector are really looking to transform their services into online self-service. And that whole shift is driving the need to emulate a lot of what the commercial sector is doing: to automate their processes, and to unlock the data from silos to feed into those processes. Another thing I can say about this is that the need for data quality, as Duncan mentioned, underpins all of these processes, in government, pharmaceuticals, utilities, banking, insurance. Think of the ability for a chief marketing officer to drive a loyalty campaign.
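The data clean room idea described above, computing a joint result without exchanging raw records, can be illustrated with a toy example. Real clean rooms, including Snowflake's, enforce this with governed access inside the platform rather than client-side hashing; the salt, keys, and overlap metric here are assumptions for illustration only:

```python
import hashlib

def blind(key, salt="shared-secret"):
    """Hash a join key so neither side reveals raw identifiers."""
    return hashlib.sha256((salt + key).encode()).hexdigest()

# Each party blinds its own customer identifiers before contributing them.
advertiser = {blind("alice@x.com"), blind("bob@y.com")}
publisher = {blind("bob@y.com"), blind("carol@z.com")}

# The only thing computed on the combined data is the aggregate overlap,
# never the raw identifiers themselves.
overlap = len(advertiser & publisher)
print(overlap)  # → 1
```

The aggregate (audience overlap, campaign lift, and so on) is what the advertiser in the example above could act on, without either party handing over its unprotected data set.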
So do a, a, uh, a quick, accurate financial close. Um, also the, the ability of a customer operations to make sure that the customer has the right details about themselves in the right, uh, application that they can sell. So from all of that is underpinned by data and is effective or not based on the quality of that data. So whilst we're mobilizing data to snowflake cloud, the ability to then drive analytics, prediction, business processes off that cloud, um, succeeds or fails on the quality of that data. >>I mean it, and, you know, I would say, I mean, it really is table stakes. If you don't trust the data, you're not gonna use the data. The problem is it always takes so long to get to the data quality. There's all these endless debates about it. So we've been doing a fair amount of work and thinking around this idea of decentralized data, data by its very nature is decentralized, but the fault domains of traditional big data is that everything is just monolithic and the organizations monolithic technology's monolithic, the roles are very, you know, hyper specialized. And so you're hearing a lot more these days about this notion of a data fabric or what calls a data mesh. Uh, and we've kind of been leaning in to that and the ability to, to connect various data capabilities, whether it's a data warehouse or a data hub or a data Lake that those assets are discoverable, they're shareable through API APIs and they're governed on a federated basis. And you're using now bringing in a machine intelligence to improve data quality. You know, I wonder Duncan, if you could talk a little bit about Snowflake's approach to this topic. >>Sure. So I'd say that, you know, making use of all of your data, is there a key kind of driver behind these ideas that they can mesh into the data fabrics? And the idea is that you want to bring together not just your kind of strategic data, but also your legacy data and everything that you have inside the enterprise. 
I think I'd also like to expand upon what a lot of people view as "all of the data." A lot of people miss that there's this whole other world of data they could have access to: things like data from their business partners, their customers, their suppliers, and even data that's more in the public domain, whether that's demographic data, geographic data, or all kinds of other data sources. And what I'd say, to some extent, is that the data cloud really facilitates the ability to share and gain access to this, both between organizations and inside organizations.
Uh, however, the Achilles heel there was, you know, the complexity that it shifted towards consuming that data from a data Lake where there were many, many sets of data, um, to, to be able to cure rate and to, um, to consume, uh, whereas actually, you know, the simplicity of being able to go to the data that you need to do your role, whether you're in tax compliance or in customer services is, is key. And, you know, listen for snowflake by auto. One thing we know for sure is that our customers are super small and they're very capable. They're they're data savvy and know, want to use whichever tool and embrace whichever, um, cloud platform that is gonna reduce the barriers to solving. What's complex about that data, simplifying that and using, um, good old fashioned SQL, um, to access data and to build products from it to exploit that data. So, um, simplicity is, is key to it to allow people to, to, to make use of that data. And CIO is recognize that >>So Duncan, the cloud obviously brought in this notion of dev ops, um, and new methodologies and things like agile that brought that's brought in the notion of data ops, which is a very hot topic right now. Um, basically dev ops applies to data about how D how does snowflake think about this? How do you facilitate that methodology? >>Yeah, sorry. I agree with you absolutely. That they drops takes these ideas of agile development of >>Agile delivery and of the kind of dev ops world that we've seen just rise and rise, and it applies them to the data pipeline, which is somewhere where it kind of traditionally hasn't happened. And it's the same kinds of messages as we see in the development world, it's about delivering faster development, having better repeatability and really getting towards that dream of the data-driven enterprise, you know, where you can answer people's data questions, they can make better business decisions. 
And we have some really great architectural advantages that allow us to do things like allow cloning of data sets without having to copy them, allows us to do things like time travel so we can see what data looked like at some point in the past. And this lets you kind of set up both your own kind of little data playpen as a clone without really having to copy all of that data. >>So it's quick and easy, and you can also, again, with our separation of storage and compute, you can provision your own virtual warehouse for dev usage. So you're not interfering with anything to do with people's production usage of this data. So the, these ideas, the scalability, it just makes it easy to make changes, test them, see what the effect of those changes are. And we've actually seen this. You were talking a lot about partner ecosystems earlier. Uh, the partner ecosystem has taken these ideas that are inside snowflake and they've extended them. They've integrated them with, uh, dev ops and data ops tooling. So things like version control and get an infrastructure automation and things like Terraform. And they've kind of built that out into more of a data ops products that, that you can, you can make yourself so we can see there's a huge impact of, of these ideas coming into the data world. >>We think we're really well-placed to take advantage to them. The partner ecosystem is doing a great job with doing that. And it really allows us to kind of change that operating model for data so that we don't have as much emphasis on like hierarchy and change windows and all these kinds of things that are maybe use as a lot of fashioned. And we kind of taking the shift from this batch data integration into, you know, streaming continuous data pipelines in the cloud. And this kind of gets you away from like a once a week or once a month change window, if you're really unlucky to, you know, pushing changes, uh, in a much more rapid fashion as the needs of the business change. 
>>I mean, those hierarchical organizational structures, uh, w when we apply those to begin to that, what it actually creates the silos. So if you're going to be a silo Buster, which aji look at you guys in silo busters, you've got to put data in the hands of the domain experts, the business people, they know what data they want, if they have to go through and beg and borrow for a new data sets, et cetera. And so that's where automation becomes so key. And frankly, the technology should be an implementation detail, not the dictating factor. I wonder if you could comment on this. >>Yeah, absolutely. I think, um, making the, the technologies more accessible to the general business users >>Or those specialists business teams that, um, that's the key to unlocking is it is interesting to see is as people move from organization to organization where they've had those experiences operating in a hierarchical sense, I want to break free from that and, um, or have been exposed to, um, automation, continuous workflows, um, change is continuous in it. It's continuous in business, the market's continuously changing. So having that flow across the organization of work, using key components, such as get hub, similar to what you drive process Terraform to build in, um, code into the process, um, and automation and with a high Tahoe leveraging all the metadata from across those fragmented sources is, is, is good to say how those things are coming together. And watching people move from organization to organization say, Hey, okay, I've got a new start. I've got my first hundred days to impress my, my new manager. >>Uh, what kind of an impact can I, um, bring to this? And quite often we're seeing that as, you know, let me take away the good learnings from how to do it, or how not to do it from my previous role. And this is an opportunity for me to, to bring in automation. And I'll give you an example, David, you know, recently started working with a, a client in financial services. 
Who's an asset manager, uh, managing financial assets. They've grown over the course of the last 10 years through M and a, and each of those acquisitions have bought with it tactical data. It's saying instead of data of multiple CRM systems now multiple databases, multiple bespoke in-house created applications. And when the new CIO came in and had a look at those well, you know, yes, I want to mobilize my data. Yes, I need to modernize my data state because my CEO is now looking at these crypto assets that are on the horizon and the new funds that are emerging that around digital assets and crypto assets. >>But in order to get to that where absolutely data underpins and is the core asset, um, cleaning up that, that legacy situation mobilizing the relevant data into the Safelite cloud platform, um, is where we're giving time back, you know, that is now taking a few weeks, whereas that transitioned to mobilize that data, start with that, that new clean slate to build upon a new business as a, a digital crypto asset manager, as well as the legacy, traditional financial assets, bonds stocks, and fixed income assets, you name it, uh, is where we're starting to see a lot of innovation. >>Yeah. Tons of innovation. I love the crypto examples and FTS are exploding and, you know, let's face it, traditional banks are getting disrupted. Uh, and so I also love this notion of data RPA. I, especially because I've done a lot of work in the RPA space. And, and I want to, what I would observe is that the, the early days of RPA, I call it paving the cow path, taking existing processes and applying scripts, get letting software robots, you know, do its thing. And that was good because it reduced, you know, mundane tasks, but really where it's evolved is a much broader automation agenda. People are discovering new, new ways to completely transform their processes. And I see a similar, uh, analogy for data, the data operating model. 
So I'm wonder whenever you think about that, how a customer really gets started bringing this to their ecosystem, their data life cycles. >>Sure. Yeah. So step one is always the same is figuring out for the CIO, the chief data officer, what data do I have, um, and that's increasingly something that they want towards a mate, so we can help them there and, and do that automated data discovery, whether that is documents in the file, share backup archive in a relational data store, in a mainframe really quickly hydrating that and bringing that intelligence, the forefront of, of what do I have, and then it's the next step of, well, okay. Now I want to continually monitor and curate that intelligence with the platform that I've chosen. Let's say snowflake, um, in order such that I can then build applications on top of that platform to serve my, my internal, external customer needs and the automation around classifying data reconciliation across different fragmented data silos, building that in those insights into snowflake. >>Um, as you say, a little later on where we're talking about data quality, active DQ, allowing us to reconcile data from different sources, as well as look at the integrity of that data. Um, so they can go on to remediation, you know, I, I wanna, um, harness and leverage, um, techniques around traditional RPA. Um, but to get to that stage, I need to fix the data. So remediating publishing the data in snowflake, uh, allowing analysis to be formed performance snowflake. Th those are the key steps that we see and just shrinking that timeline into weeks, giving the organization that time back means they're spending more time on their customer and solving their customer's problem, which is where we want them to be. >>This is the brilliance of snowflake actually, you know, Duncan is, I've talked to him, then what does your view about this and your other co-founders and it's really that focus on simplicity. 
So, I mean, that's, you, you picked a good company to join my opinion. So, um, I wonder if you could, you know, talk about some of the industry sectors that are, again, going to gain the most from, from data RPA, I mean, traditional RPA, if I can use that term, you know, a lot of it was back office, a lot of, you know, financial w what are the practical applications where data RPA is going to impact, you know, businesses and, and the outcomes that we can expect. >>Yes, sir. So our drive is, is really to, to make that, um, business general user's experience of RPA simpler and, and using no code to do that, uh, where they've also chosen snowflake to build that their cloud platform. They've got the combination then of using a relatively simple script scripting techniques, such as SQL, uh, without no code approach. And the, the answer to your question is whichever sector is looking to mobilize their data. Uh, it seems like a cop-out, but to give you some specific examples, David, um, in banking where, uh, customers are looking to modernize their banking systems and enable better customer experience through, through applications and digital apps. That's where we're, we're seeing a lot of traction, uh, and this approach to, to pay RPA to data, um, health care, where there's a huge amount of work to do to standardize data sets across providers, payers, patients, uh, and it's an ongoing, um, process there for, for retail, um, helping to, to build that immersive customer experience. >>So recommending next best actions, um, providing an experience that is going to drive loyalty and retention, that's, that's dependent on understanding what that customer's needs intent, uh, being out to provide them with the content or the outfit at that point in time, or all data dependent utilities is another one great overlap there with, with snowflake where, you know, helping utilities, telecoms energy, water providers to build services on that data. 
And this is where the ecosystem just continues to, to expand. If we, if we're helping our customers turn their data into services for, for their ecosystem, that's, that's exciting. And they were more so exciting than insurance, which we always used to, um, think back to, uh, when insurance used to be very dull and mundane, actually, that's where we're seeing a huge amounts of innovation to create new flexible products that are priced to the day to the situation and, and risk models being adaptive when the data changes, uh, on, on events or circumstances. So across all those sectors that they're all mobilizing that data, they're all moving in some way, shape or form to a, a multi-cloud, um, set up with their it. And I think with, with snowflake and without Tahoe, being able to accelerate that and make that journey simple and as complex is, uh, is why we found such a good partner here. >>All right. Thanks for that. And then thank you guys. Both. We gotta leave it there. Uh, really appreciate Duncan you coming on and Aja best of luck with the fundraising. >>We'll keep you posted. Thanks, David. All right. Great. >>Okay. Now let's take a look at a short video. That's going to help you understand how to reduce the steps around your data ops. Let's watch.
Tiji Mathew, Patrick Zimet and Senthil Karuppaiah | Io-Tahoe Data Quality: Active DQ
(upbeat music), (logo pop up) >> Narrator: From around the globe, it's theCUBE, presenting active DQ, intelligent automation for data quality, brought to you by Io-Tahoe. >> Are you ready to see active DQ on Snowflake in action? Let's get into the show and do the demo. With me is Tiji Mathew, Data Solutions Engineer at Io-Tahoe. Also joining us is Patrick Zimet, Data Solutions Engineer at Io-Tahoe, and Senthilnathan Karuppaiah, who's the Head of Production Engineering at Io-Tahoe. Patrick, over to you, let's see it. >> Hey Dave, thank you so much. Yeah, we've seen a huge increase in the number of organizations interested in a Snowflake implementation, looking for an innovative, precise and timely method to ingest their data into Snowflake. And where we are seeing a lot of success is a ground-up method utilizing both Io-Tahoe and Snowflake. To start, you define your as-is model by leveraging Io-Tahoe to profile your various data sources and push the metadata to Snowflake. Meaning we create a data catalog within Snowflake as a centralized location to document items such as source system owners, allowing you to have those key conversations and understand the data's lineage, potential blockers and what data is readily available for ingestion. Once the data catalog is built, you have a much more dynamic strategy surrounding your Snowflake ingestion. And what's great is that while you're working through those key conversations, Io-Tahoe will maintain that metadata push, and paired with Snowflake's ability to version the data, you can easily incorporate potential schema changes along the way, making sure that the information that you're working on stays as current as the systems that you're hoping to integrate with Snowflake. >> Nice. Patrick, I wonder if you could address how the Io-Tahoe platform scales, and maybe in what way it provides a competitive advantage for customers.
>> Great question. Where Io-Tahoe shines is through its active DQ, or the ability to monitor your data's quality in real time, marking which rows need remediation according to the customized business rules that you can set, ensuring that the data quality standards meet the requirements of your organization. What's great is, through our use of RPA, we can scale with an organization. So as you ingest more data sources, we can allocate more robotic workers, meaning the results will continue to be delivered in the same timely fashion you've grown used to. What's more, as Io-Tahoe is doing the heavy lifting on monitoring data quality, that frees up your data experts to focus on the more strategic tasks such as remediation, augmentation and analytics development. >> Okay, maybe Tiji, you could address this. I mean, how does all this automation change the operating model that we were talking to Ajay and Duncan about before? I mean, if it involves less people and more automation, what else can I do in parallel? >> I'm sure the participants today will also be asking the same question. Let me start with the strategic tasks. Patrick mentioned Io-Tahoe does the heavy lifting, freeing up data experts to act upon the data events generated by Io-Tahoe. Companies that have teams focused on manually building their inventory of the data landscape see longer turnaround times in producing actionable insights from their own data assets, thus diminishing the value realized by traditional methods. However, our operating model involves profiling and remediating at the same time, creating a cataloged data estate that can be used by business or IT accordingly. With increased automation and fewer people, our machine learning algorithms augment the data pipeline to tag and capture the data elements into a comprehensive data catalog.
As Io-Tahoe automatically catalogs the data estate in a centralized view, the data experts can fully focus on remediating the data events generated from validating against business rules. We envision that data events coupled with this drillable and searchable view will be a comprehensive way to assess the impact of bad-quality data. Let's briefly look at the image on screen. For example, the view indicates that bad-quality zip code data impacts the contact data, which in turn impacts other related entities and systems. Now contrast that with a manually maintained spreadsheet that drowns out the main focus of your analysis. >> Tiji, how do you tag and capture bad-quality data, and, you've mentioned these downstream dependencies, how do you stop it from flowing downstream into the processes, the applications or reports? >> As Io-Tahoe builds the data catalog across source systems, we tag the elements that meet the business rule criteria while segregating the failed data examples associated with the elements that fall below a certain threshold. The elements that meet the business rule criteria are tagged to be searchable, thus providing an easy way to identify data elements that may flow through the system. The segregated data examples, on the other hand, are used by data experts to triage for the root cause. Based on the root cause, potential outcomes could be: one, changes in the source system to prevent that data from entering the system in the first place; two, add data pipeline logic to sanitize bad data from being consumed by downstream applications and reports; or just accept the risk of storing bad data and address it when it meets a certain threshold. However, Dave, as for your question about preventing bad-quality data from flowing into the system: Io-Tahoe will not prevent it, because the controls on data flowing between systems are managed outside of Io-Tahoe.
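The tag-and-segregate flow Tiji describes, tagging elements that satisfy a business rule as searchable while quarantining the failing examples of any element that falls below a pass threshold, can be sketched in a few lines. All names here are illustrative stand-ins, not Io-Tahoe's actual API.

```python
# Hypothetical sketch of rule-based tagging and segregation of bad records.
# Function and field names are invented for illustration only.

def apply_rule(rows, field, rule):
    """Split rows into those that pass a business rule and those that fail it."""
    passed, failed = [], []
    for row in rows:
        (passed if rule(row.get(field)) else failed).append(row)
    return passed, failed

def catalog_entry(field, passed, failed, threshold=0.95):
    """Tag the element as compliant/searchable only if its pass rate meets the threshold;
    keep the failing examples aside for root-cause triage."""
    total = len(passed) + len(failed)
    pass_rate = len(passed) / total if total else 1.0
    return {
        "element": field,
        "pass_rate": round(pass_rate, 3),
        "compliant": pass_rate >= threshold,
        "quarantined_examples": failed,  # segregated, not deleted
    }

# A completeness rule: email must be present and look like an address.
rows = [
    {"email": "a@x.com"},
    {"email": None},
    {"email": "b@y.com"},
    {"email": "c@z.com"},
]
ok, bad = apply_rule(rows, "email", lambda v: isinstance(v, str) and "@" in v)
entry = catalog_entry("email", ok, bad)
```

Here a 75% pass rate falls below the 95% threshold, so the element is flagged non-compliant and its one failing row is quarantined for triage rather than silently dropped, mirroring the "segregate, then decide at source, in the pipeline, or accept the risk" choices described above.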
Although, Io-Tahoe will alert and notify the data experts of events that indicate bad data has entered the monitored assets. Also, we have redesigned our product to be modular and extensible. This allows data events generated by Io-Tahoe to be consumed by any system that wants to protect the targets from bad data. Thus Io-Tahoe empowers the data experts to control the bad data flowing into their systems. >> Thank you for that. So one of the things that we've noticed, and we've written about, is that you've got these hyper-specialized roles within the centralized data organization. I wonder, how do the data folks get involved here, if at all, and how frequently do they get involved? Maybe Senthilnathan, you could take that. >> Thank you, Dave, for having me here. Well, based on whether the data element in question is in the data cataloging or the monitoring phase, different data folks get involved. When it is in the data cataloging stage, the data governance team, along with enterprise architecture or IT, is involved in setting up the data catalog. That includes identifying the critical data elements, business term identification, definition, documentation, data quality rules, data event setup, data domain and business line mapping, lineage tracking, source of truth, so on and so forth. It's typically a one-time setup: review, certify, then govern and monitor. But while it is in the monitoring phase, during any data incident or data issue, Io-Tahoe broadcasts data signals to the relevant data folks to act and remedy it as quickly as possible, and alerts the consumption teams, whether data science, analytics or business ops, about a potential issue so that they are aware and take the necessary preventative measures. Let me show you an example: a critical data element, from the data quality dashboard view, to the lineage view, to the data 360-degree view, for a zip code conformity check.
So in this case the zip code did not meet the pass threshold during the technical data quality check and was identified as a non-compliant item, and a notification was sent to the IT folks. So clicking on the zip code will take us to the lineage view to visualize the dependent systems, that is, who are the producers and who are the consumers. And further drilling down will take us to the detailed view, where a lot of other information is presented to facilitate a root cause analysis and take it to a final closure. >> Thank you for that. So, Tiji, Patrick was talking about the as-is and the to-be. So I'm interested in how it's done now versus before. Do you need a data governance operating model, for example? >> Typically a company that decides to make an inventory of its data assets would start out by manually building a spreadsheet, managed by data experts of the company. What started as a draft now gets baked into the operating model of the company. This leads to loss of collaboration, as each department makes a copy of the catalog for its specific needs. This decentralized approach leads to loss of uniformity, with each department having different definitions, which ironically needs a governance model for the data catalog itself. And as the spreadsheet grows in complexity, the skill level needed to maintain it also increases, thus leading to fewer and fewer people knowing how to maintain it. Above all, the content that took so much time and effort to build is not searchable outside of that spreadsheet document. >> Yeah, I think you really hit the nail on the head, Tiji. Now companies want to move away from the spreadsheet approach. Io-Tahoe addresses the shortcomings of the traditional approach, enabling companies to achieve more with less. >> Yeah, and what has the customer reaction been? We had Webster Bank on one of the early episodes, for example. I mean, could they have achieved
what they did without something like active data quality and automation? Maybe Senthilnathan, you could address that. >> Sure. It is impossible to achieve full data quality monitoring and remediation without automation, or digital workers, in place. The reality is that data experts don't have the time to do the remediation manually, because they have to analyze, conform and fix any data quality issues as fast as possible before they get bigger, and Webster is no exception. That's why Webster implemented Io-Tahoe's active DQ to set up the business metadata management and data quality monitoring and remediation in the Snowflake cloud data lake. We helped in building the center of excellence in data governance, which is managing the data catalog, with scheduled, on-demand and in-flight data quality checks, and Snowflake's Snowpipe and streams are super beneficial for achieving in-flight quality checks. Then there's the data consumption monitoring and reporting. Last but not least, the time saver is persisting the non-compliant records for every data quality run within the Snowflake cloud, along with a remediation script, so that during any exceptions the respective team members are not only alerted but also supplied with the necessary scripts and tools to perform remediation right from Io-Tahoe's active DQ. >> Very nice. Okay guys, thanks for the demo. Great stuff. Now, if you want to learn more about the Io-Tahoe platform and how you can accelerate your adoption of Snowflake, book some time with a data RPA expert; all you've got to do is click on the demo icon on the right of your screen and set a meeting. We appreciate you attending this latest episode of the Io-Tahoe data automation series. Look, if you missed any of the content, that's all available on demand. This is Dave Vellante for theCUBE. Thanks for watching. (upbeat music)
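The zip-code conformity check walked through in the demo above can be approximated in a few lines: validate each value against a pattern, compare the pass rate to a threshold, and persist the non-compliant records alongside the verdict. This is a rough sketch with simplified, made-up names, not the platform's actual check logic.

```python
import re

# US ZIP or ZIP+4; the pattern and threshold are illustrative choices.
ZIP_RE = re.compile(r"^\d{5}(-\d{4})?$")

def conformity_check(values, pattern=ZIP_RE, pass_threshold=0.98):
    """Return a data-quality event: PASS/FAIL plus the non-compliant
    values, which are kept for remediation as in the demo."""
    non_compliant = [v for v in values if not pattern.match(str(v))]
    rate = 1 - len(non_compliant) / len(values)
    return {
        "check": "zip_conformity",
        "pass_rate": round(rate, 3),
        "status": "PASS" if rate >= pass_threshold else "FAIL",
        "non_compliant": non_compliant,
    }

event = conformity_check(["60601", "94105-1420", "ABCDE", "30309"])
```

One malformed value in four drives the pass rate to 0.75, well under the 0.98 threshold, so the check fails and the offending record is carried in the event for the drill-down and root-cause steps described above.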
Ajay Vohora and Duncan Turnbull | Io-Tahoe Data Quality: Active DQ
>> Announcer: From around the globe, it's theCUBE, presenting active DQ, intelligent automation for data quality, brought to you by Io-Tahoe. >> Now we're going to look at the role automation plays in mobilizing your data on Snowflake. Let's welcome Duncan Turnbull, who's a partner sales engineer at Snowflake; and Ajay Vohora is back, CEO of Io-Tahoe, he's going to share his insight. Gentlemen, welcome. >> Thank you, David, good to be back. >> Yes, it's great to have you back, Ajay, and it's really good to see Io-Tahoe expanding the ecosystem, so important, now of course bringing Snowflake in. It looks like you're really starting to build momentum. I mean, there's progress that we've seen every month, month by month, over the past 12, 14 months. Your seed investors, they've got to be happy. >> They are, they're happy, and they can see that we're running into a nice phase of expansion here, new customers signing up, and now we're ready to go out and raise that next round of funding. Maybe think of us like Snowflake five years ago. So we're definitely on track with that. A lot of interest from investors, and right now we're trying to focus in on those investors that can partner with us and understand AI, data and automation. >> Well, so personally, I mean, you've managed a number of early stage VC funds, I think four of them. You've taken several software companies through many funding rounds and growth and all the way to exit. So you know how it works. You have to get product-market fit, you've got to make sure you get your KPIs right, and you've got to hire the right salespeople. But what's different this time around? >> Well, you know, the fundamentals that you mentioned, those never change. What I can see that's different, that's shifted, this time around is three things. One is that there used to be this kind of choice of: do we go open source or do we go proprietary?
Now that has turned into a nice hybrid model, where we've really keyed into Red Hat doing something similar with CentOS. And the idea here is that there is a core capability of technology that underpins a platform, but it's the ability to then build an ecosystem around that, made up of a community. And that community may include customers, technology partners, other tech vendors, enabling the platform adoption so that all of those folks in that community can build and contribute, whilst still maintaining the core architecture and platform integrity at the core of it. And that's one thing that's changed; we're seeing a lot of that type of software company emerge into that model, which is different from five years ago. And then there's leveraging the cloud, every cloud, the Snowflake cloud being one of them here, in order to make use of what end customers in enterprise software are moving towards. Every CIO is now in some configuration of a hybrid IT estate, whether that is cloud, multi-cloud, on-prem; that's just the reality. The other piece is in dealing with the CIO's legacy. So over the past 15, 20 years they've purchased many different platforms and technologies, and some of those are still established and still (indistinct). How do you enable that CIO to make a purchase whilst still preserving, and in some cases building on and extending, the legacy technology they've invested their people's time and training and financial investment into? Yeah, of course, solving a problem, a customer pain point, with technology, that never goes out of fashion. >> That never changes. You have to focus like a laser on that. And of course, speaking of companies who are focused on solving problems, Duncan Turnbull from Snowflake: you guys have really done a great job, really brilliantly addressing pain points, particularly around data warehousing, simplifying that, and you're providing this new capability around data sharing, really quite amazing.
Duncan, Ajay talks about data quality and customer pain points in enterprise IT. Why has data quality been such a problem historically? >> So one of the biggest challenges that's really affected that in the past is that, to address everyone's needs for using data, they've evolved all these kinds of different places to store it, all these different silos or data marts, all this kind of proliferation of places where data lives. And all of those end up with slightly different schedules for bringing data in and out; they end up with slightly different rules for transforming that data and formatting it and getting it ready, and slightly different quality checks for making use of it. And this then becomes a big problem, in that these different teams are then going to have slightly different, or even radically different, answers to the same kinds of questions, which makes it very hard for teams to work together on the different data problems that exist inside the business, depending on which of these silos they end up looking at. And what you can do is, if you have a single kind of scalable system for putting all of your data into, you can kind of sidestep a lot of this complexity and you can address the data quality issues in a single way. >> Now, of course, we're seeing this huge trend in the market towards robotic process automation; RPA adoption is accelerating. You see UiPath's IPO, a 35-plus billion dollar valuation, Snowflake-like numbers, nice comps there for sure. Ajay, you've coined the phrase "data RPA." What is that, in simple terms?
And so on, that process is something that we were able to automate with Io Tahoe's technology and the employment of AI and machine learning technologies specifically to those data processes, almost as a precursor to getting into marketing automation or financial information automation. That's really where we're seeing the momentum pick up especially in the last six months. And we've kept it really simple with snowflake. We've kind of stepped back and said, well, the resource that a Snowflake can leverage here is the metadata. So how could we turn Snowflake into that repository of being the data catalog? And by the way, if you're a CIO looking to purchase the data catalog tool, stop there's no need to. Working with Snowflake we've enabled that intelligence to be gathered automatically and to be put to use within snowflake. So reducing that manual effort and I'm putting that data to work. And that's where we've packaged this with our AI machine learning specific to those data tasks. And it made sense that's what's resonated with our customers. >> You know, what's interesting here just a quick aside, as you know I've been watching snowflake now for awhile and of course the competitors come out and maybe criticize, "Why they don't have this feature. They don't have that feature." And snowflake seems to have an answer. And the answer oftentimes is, well ecosystem, ecosystem is going to bring that because we have a platform that's so easy to work with. So I'm interested Duncan in what kind of collaborations you are enabling with high quality data. And of course, your data sharing capability. >> Yeah so I think the ability to work on datasets isn't just limited to inside the business itself or even between different business units you're kind of discussing maybe with those silos before. When looking at this idea of collaboration. 
We have these challenges where we want to be able to exploit data to the greatest degree possible, but we need to maintain the security, the safety, the privacy and the governance of that data. It could be quite valuable, it could be quite personal, depending on the application involved. One of these novel applications of data sharing that we see between organizations is this idea of data clean rooms. These data clean rooms are safe, collaborative spaces which allow multiple companies, or even divisions inside a company that have particular privacy requirements, to bring two or more data sets together for analysis, but without having to actually share the whole unprotected data set with each other. And when you do this inside of Snowflake, you can collaborate using standard tool sets: you can use all of our SQL ecosystem, you can use all of the data science ecosystem that works with Snowflake, you can use all of the BI ecosystem that works with Snowflake. But you can do that in a way that keeps the confidentiality that needs to be preserved inside the data intact. And you can only really do these kinds of collaborations, especially across organizations, but even inside large enterprises, when you have good, reliable data to work with; otherwise your analysis just isn't going to work properly. A good example of this is one of our large gaming customers, who's an advertiser. They were able to build targeted ads to acquire customers and measure the campaign impact on revenue, but they were able to keep their data safe and secure while doing that, while working with advertising partners. The business impact was that they were able to get a lift of 20 to 25% in campaign effectiveness through better targeting, and that actually pulled through into a reduction in customer acquisition costs, because they just didn't have to spend as much on the forms of media that weren't working for them.
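The clean-room idea Duncan describes, joint analysis without either party handing over its raw records, can be illustrated in miniature: each side contributes only salted hashes of its identifiers, and only the aggregate overlap leaves the shared space. This is a toy sketch of the principle, not Snowflake's actual clean-room mechanics, and all names are invented.

```python
import hashlib

def hashed_ids(customers, salt):
    """Each party shares only salted hashes of its IDs, never the raw records."""
    return {hashlib.sha256((salt + c).encode()).hexdigest() for c in customers}

# The advertiser and the publisher never see each other's raw customer lists.
SALT = "shared-secret-agreed-out-of-band"
advertiser = hashed_ids({"alice", "bob", "carol"}, SALT)
publisher = hashed_ids({"bob", "carol", "dan", "erin"}, SALT)

# Only the aggregate audience overlap is revealed, not which customers match.
overlap = len(advertiser & publisher)
```

In a real deployment the matching would run inside the governed environment on full, quality-checked data, which is exactly why the "good, reliable data" point above matters: an overlap computed on inconsistent identifiers would mis-measure the campaign.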
>> So, Ajay, I wonder, with the way public policy is shaping out, you know, obviously GDPR started it, in the States the California Consumer Privacy Act, and people are sort of taking the best of those, and there's a lot of differentiation, but what are you seeing just in terms of governments really driving this move to privacy? >> Government, public sector, we're seeing a huge wake-up in activity across (indistinct). Part of it has been data privacy. The other part of it is being more joined up and more digital, rather than paper or form based. We've all been there: waiting in a line, holding a form, taking that form to the front of the line and handing it over a desk. Now government and public sector are really looking to transform their services into being online, self-service. And that whole shift is then driving the need to emulate a lot of what the commercial sector is doing, to automate their processes and to unlock the data from silos to put it through into those processes. And another thing that I can say about this is that the need for data quality, as Duncan mentions, underpins all of these processes: government, pharmaceuticals, utilities, banking, insurance. The ability for a chief marketing officer to drive a loyalty campaign; the ability for a CFO to reconcile accounts at the end of the month to do a quick, accurate financial close; also the ability of customer operations to make sure that the customer has the right details about themselves in the right application. So all of that is underpinned by data, and is effective or not based on the quality of that data. So whilst we're mobilizing data to the Snowflake cloud, the ability to then drive analytics, prediction and business processes off that cloud succeeds or fails on the quality of that data.
There's all these endless debates about it. So we've been doing a fair amount of work and thinking around this idea of decentralized data. Data by its very nature is decentralized but the fault domains of traditional big data is that everything is just monolithic. And the organizations monolithic that technology's monolithic, the roles are very, you know, hyper specialized. And so you're hearing a lot more these days about this notion of a data fabric or what Jimit Devani calls a data mesh and we've kind of been leaning into that and the ability to connect various data capabilities whether it's a data, warehouse or a data hub or a data lake, that those assets are discoverable, they're shareable through API APIs and they're governed on a federated basis. And you're using now bringing in a machine intelligence to improve data quality. You know, I wonder Duncan, if you could talk a little bit about Snowflake's approach to this topic >> Sure so I'd say that making use of all of your data is the key kind of driver behind these ideas of beta meshes or beta fabrics? And the idea is that you want to bring together not just your kind of strategic data but also your legacy data and everything that you have inside the enterprise. I think I'd also like to kind of expand upon what a lot of people view as all of the data. And I think that a lot of people kind of miss that there's this whole other world of data they could be having access to, which is things like data from their business partners, their customers, their suppliers, and even stuff that's, more in the public domain, whether that's, you know demographic data or geographic or all these kinds of other types of data sources. And what I'd say to some extent is that the data Cloud really facilitates the ability to share and gain access to this both kind of, between organizations, inside organizations. 
And you don't have to make lots of copies of the data and worry about the storage, and this federated idea of governance, and all these things that are quite complex to manage. The Snowflake approach really enables you to share data with your ecosystem, or the world, without any latency, with full control over what's shared, without having to introduce new complexities or have complex interactions with APIs or software integration. The simple approach that we provide allows a relentless focus on creating the right data product to meet the challenges facing your business today.
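The no-copy sharing Duncan describes is, on the Snowflake side, a handful of DDL statements: create a share, grant it access to the objects, and attach the consumer account. The sketch below composes abbreviated versions of those statements as strings so the shape is visible; it is a simplified illustration with made-up object names, and the authoritative syntax is Snowflake's own documentation.

```python
def share_ddl(share, database, schema, table, consumer_account):
    """Build abbreviated Snowflake statements for sharing a table
    with another account without copying any data."""
    fq = f"{database}.{schema}.{table}"
    return [
        f"CREATE SHARE {share};",
        f"GRANT USAGE ON DATABASE {database} TO SHARE {share};",
        f"GRANT USAGE ON SCHEMA {database}.{schema} TO SHARE {share};",
        f"GRANT SELECT ON TABLE {fq} TO SHARE {share};",
        # The consumer account then creates its own database from this share.
        f"ALTER SHARE {share} ADD ACCOUNTS = {consumer_account};",
    ]

stmts = share_ddl("partner_share", "sales_db", "public", "orders",
                  "partner_org.partner_acct")
```

Because the share is a set of grants rather than an export, the consumer always queries the provider's single live copy, which is what removes the latency and the copy-management complexity mentioned above.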
>> So Duncan, the Cloud obviously brought in this notion of DevOps and new methodologies and things like agile that's brought in the notion of DataOps which is a very hot topic right now basically DevOps applies to data about how does Snowflake think about this? How do you facilitate that methodology? >> So I agree with you absolutely that DataOps takes these ideas of agile development or agile delivery and have the kind of DevOps world that we've seen just rise and rise. And it applies them to the data pipeline, which is somewhere where it kind of traditionally hasn't happened. And it's the same kinds of messages. As we see in the development world it's about delivering faster development having better repeatability and really getting towards that dream of the data-driven enterprise, where you can answer people's data questions they can make better business decisions. And we have some really great architectural advantages that allow us to do things like allow cloning of data sets without having to copy them, allows us to do things like time travel so we can see what the data looked like at some point in the past. And this lets you kind of set up both your own kind of little data playpen as a clone without really having to copy all of that data so it's quick and easy. And you can also, again with our separation of storage and compute, you can provision your own virtual warehouse for dev usage. So you're not interfering with anything to do with people's production usage of this data. So these ideas, the scalability, it just makes it easy to make changes, test them, see what the effect of those changes are. And we've actually seen this, that you were talking a lot about partner ecosystems earlier. The partner ecosystem has taken these ideas that are inside Snowflake and they've extended them. They've integrated them with DevOps and DataOps tooling. So things like version control and get an infrastructure automation and things like Terraform. 
And they've built that out into more of a DataOps product that you can make use of. So we can see there's a huge impact of these ideas coming into the data world. We think we're really well placed to take advantage of them, and the partner ecosystem is doing a great job with that. And it really allows us to change that operating model for data, so that we don't have as much emphasis on hierarchy and change windows and all these kinds of things that are maybe viewed as old-fashioned. And we've made the shift from that batch style of integration to streaming, continuous data pipelines in the Cloud. And this gets you away from a once-a-week change window, or once a month if you're really unlucky, to pushing changes in a much more rapid fashion as the needs of the business change. >> I mean, those hierarchical organizational structures, when we apply them to data, actually create the silos. So if you're going to be a silo buster, and Ajay, I look at you guys as silo busters, you've got to put data in the hands of the domain experts, the business people. They know what data they want, but they have to go through and beg and borrow for new data sets, et cetera. And so that's where automation becomes so key. And frankly, the technology should be an implementation detail, not the dictating factor. I wonder if you could comment on this. >> Yeah, absolutely. I think making the technologies more accessible to the general business users, or those specialist business teams, is the key to unlocking this. And it's interesting to see, as people move from organization to organization, where they've had those experiences operating in a hierarchical sense, they want to break free from that, and we've all been exposed to automation. Continuous workflows: change is continuous in IT, it's continuous in business, the market's continuously changing.
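The DataOps practice being described, testing data changes before promoting them the way you would test code, can be sketched as a simple pre-deployment validation step. This is an illustrative sketch only (column names and thresholds are assumptions); a real pipeline would run checks like these in CI against a cloned data set:

```python
# Minimal DataOps-style batch validation: run checks on a new batch of
# rows and return a list of failures; an empty list means it can be
# promoted to production.

def validate_batch(rows, required_columns, min_rows=1):
    failures = []
    if len(rows) < min_rows:
        failures.append(f"expected at least {min_rows} rows, got {len(rows)}")
    for i, row in enumerate(rows):
        # Treat None and empty string as missing values.
        missing = [c for c in required_columns if row.get(c) in (None, "")]
        if missing:
            failures.append(f"row {i} missing {missing}")
    return failures
```

Gating promotion on checks like this is what replaces the monthly change window: every change is validated automatically, so it can ship as soon as it passes.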
So having that flow of work across the organization, using key components such as GitHub and similar tools to drive process, Terraform to build code and automation into the process, and with Io-Tahoe leveraging all the metadata from across those fragmented sources, it's good to see how those things are coming together. And we're watching people move from organization to organization and say, "Hey, okay, I've got a new start. I've got my first hundred days to impress my new manager. What kind of an impact can I bring to this?" Quite often we're seeing that as: let me take the good learnings, how to do it or how not to do it, from my previous role, and this is an opportunity for me to bring in automation. And I'll give you an example, David. We recently started working with a client in financial services, an asset manager managing financial assets. They've grown over the course of the last 10 years through M&A, and each of those acquisitions has brought with it technical debt and its own set of data: multiple CRM systems, multiple databases, multiple bespoke in-house applications. And when the new CIO came in and had a look at those, he thought, well, yes, I want to mobilize my data. Yes, I need to modernize my data estate, because my CEO is now looking at these crypto assets that are on the horizon, and the new funds that are emerging around digital assets and crypto assets. But data absolutely underpins that and is the core asset, so cleaning up that legacy situation and mobilizing the relevant data onto the Snowflake Cloud platform is where we're giving time back. That is now taking a few weeks: that transition to mobilize the data and start with a new clean slate, to build a new business as a digital crypto asset manager alongside the legacy, traditional financial assets, bonds, stocks, fixed income assets, you name it. That's where we're starting to see a lot of innovation.
>> Tons of innovation. I love the crypto examples. NFTs are exploding, and let's face it, traditional banks are getting disrupted. And so I also love this notion of data RPA, especially because, Ajay, I've done a lot of work in the RPA space. And what I would observe is that in the early days of RPA, I call it paving the cow path: taking existing processes and applying scripts, letting software robots do their thing. And that was good because it reduced mundane tasks, but where it's really evolved is a much broader automation agenda. People are discovering new ways to completely transform their processes. And I see a similar analogy for the data operating model. So I wonder what you think about that, and how a customer really gets started bringing this to their ecosystem and their data life cycles. >> Sure. Yeah. Step one is always the same. It's figuring out, for the CIO or the chief data officer, what data do I have? And that's increasingly something they want to automate, so we can help them there and do that automated data discovery, whether that is documents in the file share, a backup archive, a relational data store, or a mainframe, really quickly hydrating that and bringing that intelligence to the forefront of what do I have. And then it's the next step of, well, okay, now I want to continually monitor and curate that intelligence with the platform that I've chosen, let's say Snowflake, such that I can then build applications on top of that platform to serve my internal and external customer needs. And then there's the automation around classifying data, reconciliation across different fragmented data silos, and building those insights into Snowflake. As you say, a little later on we'll talk about data quality: ActiveDQ, allowing us to reconcile data from different sources as well as look at the integrity of that data. And then we go on to remediation. I want to harness and leverage techniques around traditional RPA, but to get to that stage, I need to fix the data.
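The reconciliation step Ajay mentions, comparing records for the same entity across fragmented silos and scoring how well they agree, can be sketched simply. This is a hedged illustration of the general idea, not Io-Tahoe's ActiveDQ implementation; the key and field names are assumptions:

```python
# Reconcile records from two data silos that share a common key, and
# report, per record, whether a counterpart exists and what fraction
# of its fields agree (a crude data-quality score).

def reconcile(source_a, source_b, key="customer_id"):
    b_index = {rec[key]: rec for rec in source_b}
    report = []
    for rec in source_a:
        other = b_index.get(rec[key])
        if other is None:
            report.append((rec[key], "missing_in_b", 0.0))
            continue
        fields = (set(rec) | set(other)) - {key}
        agree = sum(1 for f in fields if rec.get(f) == other.get(f))
        report.append((rec[key], "matched", agree / len(fields)))
    return report
```

A report like this is the raw material for remediation: low-scoring records get routed to cleanup rules or human review before the data is published.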
So remediating and publishing the data in Snowflake, allowing analysis to be performed in Snowflake: those are the key steps that we see. And just shrinking that timeline into weeks, giving the organization that time back, means they're spending more time on their customer and solving their customer's problems, which is where we want them to be. >> Well, I think this is the brilliance of Snowflake, actually. You know, Duncan, I've talked to Benoit Dageville about this and your other co-founders, and it's really that focus on simplicity. So I mean, you picked a good company to join, in my opinion. So I wonder, Ajay, if you could talk about some of the industry sectors that are going to gain the most from data RPA. I mean, traditional RPA, if I can use that term, a lot of it was back office, a lot of it financial. What are the practical applications where data RPA is going to impact businesses, and what outcomes can we expect? >> Yes, so our drive is really to make the general business user's experience of RPA simpler, using no code to do that, where they've also chosen Snowflake to build their Cloud platform. They've then got the combination of relatively simple scripting techniques such as SQL alongside a no-code approach. And the answer to your question is whichever sector is looking to mobilize their data. It seems like a cop-out, but to give you some specific examples, David: in banking, where our customers are looking to modernize their banking systems and enable better customer experience through applications and digital apps, that's where we're seeing a lot of traction in this approach of applying RPA to data. And in health care, where there's a huge amount of work to do to standardize data sets across providers, payers, and patients, and it's an ongoing process there. In retail, helping to build that immersive customer experience, so recommending next best actions.
Providing an experience that is going to drive loyalty and retention depends on understanding what that customer's needs and intent are, and being able to provide them with the content or the offer at that point in time; it's all data dependent. Utilities are another one, with great overlap with Snowflake, where we're helping utilities, telecoms, energy, and water providers build services on that data. And this is where the ecosystem just continues to expand. If we're helping our customers turn their data into services for their ecosystem, that's exciting. And maybe even more exciting is insurance, which I always used to think of as very dull and mundane; actually, that's where we're seeing a huge amount of innovation to create new flexible products that are priced to the day, to the situation, with risk models being adaptive when the data changes on events or circumstances. So across all those sectors, they're all mobilizing their data, they're all moving, in some way, shape, or form, to a multi-Cloud setup with their IT. And I think with Snowflake, and with Io-Tahoe being able to accelerate that and make that journey simple and less complex, that's why we've found such a good partner here. >> All right, thanks for that. And thank you guys both. We've got to leave it there. Really appreciate you coming on, Duncan, and Ajay, best of luck with the fundraising. >> We'll keep you posted. Thanks, David. >> All right. Great. >> Okay, now let's take a look at a short video that's going to help you understand how to reduce the steps around your DataOps. Let's watch. (upbeat music)
Io-Tahoe Episode 6: ActiveDQ™ Intelligent Automation for Data Quality Management promo 1
>> The data lake concept was intriguing when first introduced in 2010, but people quickly realized that shoving data into a data lake made data lakes stagnant repositories: essentially storage bins that were less expensive than traditional data warehouses. This is Dave Vellante. Join me for Io-Tahoe's latest installment of the data automation series: ActiveDQ, intelligent automation for data quality management. We'll talk to experts from Snowflake about the data assessment utility within the Snowflake platform, and how it scales to the demands of business while also controlling costs. Io-Tahoe CEO Ajay Vohora will explain how Io-Tahoe and Snowflake together are bringing ActiveDQ to market, and what customers are saying about it. Save the date, Thursday, April 29th, for Io-Tahoe's data automation series: ActiveDQ, intelligent automation for data quality. The show streams promptly at 11:00 AM Eastern on theCUBE, the leader in high tech coverage.
Scott Howser, Hadapt - MIT Information Quality 2013 - #MIT #CDOIQ #theCUBE
>> Okay, we're back. We are in Cambridge, Massachusetts. This is Dave Vellante. I'm here with Jeff Kelly. We're with Wikibon; this is theCUBE, a SiliconANGLE production. We're here at the MIT Information Quality Symposium, in the heart of database design and development. We've had some great guests on. Scott Howser is here. He's the head of marketing at Hadapt, a company that we introduced to our community quite some time ago, really bringing multiple channels into the Hadoop ecosystem and helping make sense out of all this data, bringing insights to this data. Scott, welcome back to theCUBE. >> Thanks for having me. It's good to be here.
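The multi-channel identity problem that frames this conversation, deciding whether records arriving through different channels describe the same person, is at its core an entity-resolution exercise. A minimal sketch follows; the field names and normalization rules are illustrative assumptions, not Hadapt's method:

```python
# Link records from different channels into one identity whenever they
# share any normalized identifier (email or phone), using union-find so
# that a record carrying both identifiers bridges two partial matches.

class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

def normalize_email(email):
    return email.strip().lower() if email else None

def normalize_phone(phone):
    if not phone:
        return None
    digits = "".join(ch for ch in phone if ch.isdigit())
    return digits[-10:] if len(digits) >= 10 else None

def link_records(records):
    """Return one group label per record; equal labels mean same person."""
    uf = UnionFind()
    for i, rec in enumerate(records):
        node = ("rec", i)
        uf.find(node)
        for key in (normalize_email(rec.get("email")),
                    normalize_phone(rec.get("phone"))):
            if key:
                uf.union(node, ("id", key))
    return [uf.find(("rec", i)) for i in range(len(records))]
```

The union-find step matters: a record seen on a third channel that carries both an email and a phone number will merge two previously separate identities, which is exactly the triangulation problem each new channel creates.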
I don't think that's a reality in the in the sense that when you look at, um, a product provider or solution provider and a customer that's external, write those those two worlds Avery spirit and there was a lot of channels and pitch it potentially even third party means that I might want to engage this individual by. And every time I want to bring another one of those channels online, it further complicates. Validating who? That person eighty. >> Okay, so So when you were doing your data warehouse thing again as an I t practitioner, Um, you have you You try to expand the channels, but every time he did that and complex if I hide the data source So how did you deal with that problem? So just create another database and stole five Everything well, >> unfortunately, absolutely creates us this notion of islands of information throughout the enterprise. Because, as you mentioned, you know, we define a schema effectively a new place, Um, data elements into that schema of how you identified how you engage in and how you rate that person's behaviors or engagement, etcetera. And I think what you'd see is, as you'd bring on new sources that timeto actually emerge those things together wasn't in the order of days or weeks. It's on months and years. And so, with every new channel that became interesting, you further complicate the problem and effectively, What you do is, you know, creating these pools of information on you. Take extracts and you try and do something to munch the data and put in a place where you give access to an analyst to say, Okay, here's it. Another, um, Sample said a day to try and figure out of these things. Align and you try and create effectively a new schema that includes all the additional day that we just added. >> So it's interesting because again, one of the themes that we've been hearing a lot of this conference and hear it a lot in many conferences, not the technology. It's the people in process around the technology. 
That's certainly any person person would agree with that. But at the same time, the technology historically has been problematic, particularly data. Warehouse technology has been challenging you. So you've had toe keep databases relatively small and despair, and you had to build business processes around those that's right a basis. So you've not only got, you know, deficient technology, if you will, no offense, toe data, warehousing friends, but you've got ah, process creep that's actually fair. That's occurred, and >> I think you know what is happening is it's one of the things that's led to sort of the the revolution it's occurring in the market right now about you know, whether it's the new ecosystem or all the tangential technologies around that. Because what what's bound not some technology issues in the past has been the schema right. As important as that is because it gives people a very easy way to interact with the data. It also creates significant challenges when you want to bring on these unique sources of information. Because, you know, as you look at things that have happened over the last decade, the engagement process for either a consumer, a prospect or customer have changed pretty dramatically, and they don't all have the same stringent requirements about providing information to become engaged that way. So I think where the schema has, you know, has value you obviously, in the enterprise, it also has a lot of, um, historical challenges that brings along with >> us. So this jump movement is very disruptive to the traditional market spaces. Many folks say it isn't traditional guy, say, say it isn't but clearly is, particularly as you go Omni Channel. I threw that word out earlier on the channels of discussion that we had a dupe summit myself. John Ferrier, Hobby lobby meta and as your and this is something that you guys are doing that bringing in data to allow your customers to go Omni Channel. As you do that, you start again. 
you increase the complexity of the corpus of data at the same time. A lot of times you hear about schema-light, or schema-less, but less so about how you reconcile the omni-channel, the schema-less or schema-light approach, and the data quality
So I think the repository in Hindu breach defense himself gives you that one common ground toa workin because you've got, you know, no implications of schema or any other preconceived notions about how you're going toe to toe massage weight if you will, And it's about applying logic and looking for those universal ides. There are a bunch of tools around that are focused on this, but applying those tools and it means that doesn't, um, handy captain from the start by predisposing them to some structure. And you want them to decipher or call out that through whether it's began homegrown type scripts, tools that might be upstairs here and then effectively normalizing the data and moving it into some structure where you can interact with it on in a meaningful way. So that really the kind the old way of trying to bring, you know, snippets of the data from different sources into ah, yet another database where you've got a play structure that takes time, months and years in some cases. And so Duke really allows you to speed up that process significantly by basically eliminating that that part of the equation. Yeah, I think there's and there's a bunch of dimensions we could talk about things like even like pricing exercises, right quality of triangulating on what that pricing should be per product for geography, for engagement, etcetera. I think you see that a lot of those types of work. Let's have transitioned from, you know, mainframe type environments, environments of legacy to the Duke ecosystem. And we've seen cases where people talk about they're going from eight month, you know, exercises to a week. And I think that's where the value of this ecosystem in you know, the commodity scalability really provides you with flexibility. That was just previously you unachievable. >> So could you provide some examples either >> you know, your own from your own career or from some customers you're seeing in terms of the data quality implications of the type of work they're doing. 
So one of our kind of *** is that you know the data quality measures required for any given, uh, use case various, in some cases, depending on the type of case. You know, in depending on the speed that you need, the analysis done, uh, the type of data quality or the level data qualities going is going to marry. Are you seeing that? And if >> so, can you give some examples of the different >> types of way data quality Gonna manifest itself in a big data were close. Sure. So I think that's absolutely fair. And you know. Obviously there's there's gonna be some trade off between accuracy and performance, right? And so you have to create some sort of confidence coefficient part, if you will, that you know, within some degree of probability this is good enough, right? And there's got to be some sort of balance between that actor Jerseyan time Um, some of the things that you know I've seen a lot of customers being interested in is it is a sort of market emerging around providing tools for authenticity of engagement. So it's an example. You know, I may be a large brand, and I have very, um, open channels that I engage somebody with my B e mail might be some Web portal, etcetera, and there's a lot of fishing that goes on out there, right? And so people fishing for whether it's brands and misrepresenting themselves etcetera. And there's a lot of, you know, desire to try and triangulate on data quality of who is effectively positioned themselves as me, who's really not me and being able to sort of, you know, take a cybersecurity spin and started to block those things down and alleviate those sort of nefarious activities. So We've seen a lot of people using our tool to effectively understand and be able to pinpoint those activities based upon behavior's based upon, um, out liars and looking at examples of where the engagement's coming from that aren't authentic if that >> makes you feel any somewhat nebulous but right. 
So using >> analytics essentially to determine the authenticity of a person of intensity, of an engagement rather than taking more rather than kind of looking at the data itself using pattern detection to determine. But it also taking, you know, there's a bunch of, um, there's a bunch of raw data that exists out there that needs you when you put it together again. Back to this notion of this sort of, you know, landing zone, if you will, or Data Lake or whatever you wanna call it. You know, putting all of this this data into one repository where now I can start to do you know, analytics against it without any sort of pre determined schema. And start to understand, you know, are these people who are purporting to be, you know, firm X y Z are there really from X y Z? And if they're not, where these things originating and how, when we start to put filters or things in place to alleviate those sort of and that could apply, it sounds like to certainly private industry. But, I mean, >> it sounds like >> something you know, government would be very interested in terms ofthe, you know, in the news about different foreign countries potentially being the source of attacks on U. S. Corporations are part of the, uh, part of our infrastructure and trying to determine where that's coming from and who these people are. And >> of course, people were trying to get >> complicated because they're trying to cover up their tracks, right? Certainly. But I think that the most important thing in this context is it's not necessarily about being able to look at it after the fact, but it's being able to look at a set of conditions that occur before these things happen and identify those conditions and put controls in place to alleviate the action from taking place. 
I think that's where when you look at what is happening from now an acceleration of these models and from an acceleration of the quality of the data gathering being able to put those things into place and put effective controls in place beforehand is changing. You know the loss prevention side of the business and in this one example. But you're absolutely right. From from what I see and from what our customers were doing, it is, you know, it's multi dimensional in that you know this cyber security. That's one example. There's pricing that could be another example. There's engagements from, ah, final analysis or conversion ratio that could be yet another example. So I think you're right in it and that it is ubiquitous. >> So when you think about the historical role of the well historical we had Stewart on earlier, he was saying, the first known chief data officer we could find was two thousand three. So I guess that gives us a decade of history. But if you look back at the hole, I mean data quality. We've been talking about that for many, many decades. So if you think about the traditional or role of an organization, trying tio achieved data quality, single version of the truth, information, quality, information value and you inject it with this destruction of a dupe that to me anyway, that whole notion of data quality is changing because in certain use, cases inference just fine. Um, in false positives are great. Who cares? That's right. Now analyzing Twitter data from some cases and others like healthcare and financial services. It's it's critical. But so how do you see the notion of data quality evolving and adapting to this >> new world? Well, I think one of these you mentioned about this, you know, this single version of the truth was something that was, you know, when I was on the other side of the table, >> they were beating you over the head waken Do this, We >> can do this, and it's It's something that it sounds great on paper. 
But when you look at the practical implications of trying to do it in a very finite or stringent controlled way, it's not practical for the business >> because you're saying that the portions of your data that you can give a single version of the truth on our so small because of the elapsed time That's right. I think there's that >> dimension. But there's also this element of time, right and the time that it takes to define something that could be that rigid and the structure months. It's months, and by that time a lot of the innovations that business is trying to >> accomplish. The eyes have changed. The initiatives has changed. Yeah, you lost the sale. Hey, but we got the data. It would look here. Yeah, I think that's your >> right. And I think that's what's evolving. I think there's this idea that you know what Let's fail fast and let's do a lot of it. Orations and the flexibility it's being provided out in that ecosystem today gives people an opportunity. Teo iterated failed fast, and you write that you set some sort of, you know confidence in that for this particular application. We're happy with you in a percent confidence. Go fish. You are something a little >> bit, but it's good enough. So having said that now, what can we learn from the traditional date? A quality, you know, chief data officer, practitioners, those who've been very dogmatic, particularly in certain it is what can we learn from them and take into this >> new war? I think from my point of view on what my experience has always been is that those individuals have an unparalleled command of the business and have an appreciation for the end goal that the business is trying to accomplish. And it's taking that instinct that knowledge and applying that to the emergence of what's happening in the technology world and bringing those two things together. 
I think it's It's not so much as you know, there's a practical application in that sense of Okay, here's the technology options that we have to do these, you know, these desired you engaged father again. It's the pricing engagement, the cyber security or whatever. It's more. How could we accelerate what the business is trying to accomplish and applying this? You know, this technology that's out there to the business problem. I think in a lot of ways, you know, in the past it's always been here. But this really need technology. How can I make it that somewhere? And now I think those folks bring a lot of relevance to the technology to say Hey, here's a problem. Trying to solve legacy methodologies haven't been effective. Haven't been timely. Haven't been, uh, scaleable. Whatever hock me. Apply what's happening. The market today to these problems. >> Um, you guys adapt in particular to me any way a good signal of the maturity model and with the maturity of a dupe, it's It's starting to grow up pretty rapidly, you know, See, due to two auto. And so where are we had? What do you see is the progression, Um, and where we're going. >> So, you know, I mentioned it it on the cue for the last time it So it and I said, I believe that you know who do busy operating system of big data. And I believe that, you know, there's a huge transition taking place that was there were some interesting response to that on Twitter and all the other channels, but I stand behind that. I think that's really what's happening. Lookit. You know what people are engaging us to do is really start to transition away from the legacy methodologies and they're looking at. He's not just lower cost alternatives, but also more flexibility. And we talked about, you know, its summit. The notion of that revenue curve right and cost takeouts great on one side of the coin, and I are one side of the defense here. 
But I think equally, and even more importantly, is the change in the revenue curve, and the insights that people are finding because of these unique channels — the omni-channel you described. Being able to look at all these dimensions of data in one unified place is really changing the way that they can go to market, engage consumers, and provide access to the analysts. Yeah. I mean, ultimately, that's the most >> important part. We had Stuart Madnick on, who's, you know, written textbooks on operating systems. We probably used them. I know I did. Maybe they were gone by the time you got there. But the point being, you know, Hadoop as an operating system — the notion of a platform — is really changing dramatically. So I think you're right on that. Okay, so what's next for you guys? We talked about, you know, customer traction and proof points you're working on. I know you guys have got great tech and an amazing team. What's next for >> you? So I think it's continuing to look at the market, and being flexible with the market, as the use cases develop. So, you know, obviously as a startup we're focused in a couple of key areas where we see a lot of early adoption and a lot of pain around the problem that we can solve. But I think it's really about continuing to develop those use cases and expanding the market, to become more of a, you know, holistic provider of analytic solutions on top of Hadoop. >> How's Cambridge working out for you? I mean, the founders moved the company up from New Haven and chose the East Coast, chose Cambridge, which we're obviously really happy about as East Coast people. You don't live there full time, but might as well. So how's that working out — the talent pool, you know, the vibrancy of the community, the young people that you're able to tap? >> So I'd say there are a bunch of dimensions around that one.
It's hot. It's really, really hot >> and humid, yes. >> But it's been actually fantastic. And if you look at not just the talent inside the team, but around the team — so if you look at our board, right: Jit Saxena, Chris Lynch, who've been very successful in the database community, with decades of experience. Getting folks like that onto the board, as well — Hardiman has been in this space for a long time too. Having folks like that as, you know, advisors, providing guidance to the team, has been absolutely incredible. Hack Reduce is a great facility where we do things like hackathons and meetups and get the community together. So I think there's been a lot of positive inertia around the company just being here in Cambridge. But, you know, from a development-resource or recruiting point of view it's also been great, because you've got some really exceptional database companies in this area, and history will show you there's been a lot of success here, not only in incubating technology but in building real database companies. And, you know, we're a startup on the block that people are very interested in, and I think we show a lot of, you know, dynamics that are changing in the market and the way the market's moving. So the ability for us to recruit talent is exceptional, right? We've got a lot of great people to pick from. We've had a lot of people join from other previously very successful database companies. The team's growing, you know, significantly in the engineering space right now. But I just, you know, can't say enough good things about the community, Hack Reduce, and all the resources that we get access to because we're here in Cambridge. >> Hack Reduce is cool. So you guys are obviously leveraging that — you do how-tos to bring people in. Hack Reduce is essentially — it's not an incubator, it's really more of an idea cloud, a resource cloud, really started by Fred Lalonde and Chris Lynch.
Essentially, people come in and they share ideas. You guys, I know, have hosted a number of how-tos, and it's basically open. You know, we've done some stuff there. It's very cool. >> Yeah, you know, even for us it's also a great place to recruit, right? We've met a lot of talented people there, and with the university participation as well, we get a lot of talent coming in to participate in these activities. And we do things that aren't just Hadapt-related — we've had people teach Hadoop sessions and just sort of evangelize what's happening in the ecosystem around us. And like I said, it's just been a great resource pool to engage with, and I think it's been as beneficial to the community as it has been to us. So we're very grateful for that. >> All right, Scott — awesome, as always. See, I knew you were going to have some good practitioner perspectives on data quality. Really appreciate you stopping by. My pleasure. Thanks for having me. Take care. All right, keep it right there, everybody — we'll be right back with our next guest. This is Dave Vellante with Jeff Kelly. This is theCUBE. We're live here at the MIT Information Quality Symposium. We'll be right back.
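Howser's "eighty percent confidence, good enough" point lends itself to a small sketch. Nothing below comes from Hadapt's product — the feed, the field name, the sample size, and the threshold are all invented for illustration — it just shows the shape of a sampling-based quality gate that trades exactness for speed:

```python
import random

def sample_quality(records, is_valid, sample_size=1000, seed=7):
    """Estimate the fraction of valid records from a random sample,
    instead of scanning (and rigidly governing) the entire dataset."""
    rng = random.Random(seed)
    sample = rng.sample(records, min(sample_size, len(records)))
    return sum(1 for r in sample if is_valid(r)) / len(sample)

def good_enough(records, is_valid, threshold=0.80):
    """Accept the feed when estimated quality clears the agreed bar."""
    return sample_quality(records, is_valid) >= threshold

# Toy feed: roughly 10% of rows are missing an email address.
feed = [{"email": "u%d@example.com" % i if i % 10 else None}
        for i in range(10_000)]
ok = good_enough(feed, lambda r: r["email"] is not None, threshold=0.80)
```

With about 90% of rows valid, the sampled estimate lands near 0.9 and the feed passes the 0.80 bar — the "iterate, fail fast" posture rather than a months-long single-version-of-truth exercise.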
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Jeff Kelly | PERSON | 0.99+ |
Scott | PERSON | 0.99+ |
Omni Channel | ORGANIZATION | 0.99+ |
Chris Lynch | PERSON | 0.99+ |
Scott Howser | PERSON | 0.99+ |
Dave Volante | PERSON | 0.99+ |
Cambridge | LOCATION | 0.99+ |
five | QUANTITY | 0.99+ |
eight month | QUANTITY | 0.99+ |
today | DATE | 0.99+ |
Angelique Solutions | ORGANIZATION | 0.99+ |
Dave | PERSON | 0.99+ |
John Ferrier | PERSON | 0.99+ |
first | QUANTITY | 0.99+ |
Fred Lan | PERSON | 0.99+ |
Scott Hauser | PERSON | 0.99+ |
Sohag | ORGANIZATION | 0.99+ |
New Haven | LOCATION | 0.99+ |
ORGANIZATION | 0.99+ | |
Cambridge, Massachusetts | LOCATION | 0.99+ |
two thousand | QUANTITY | 0.99+ |
two things | QUANTITY | 0.99+ |
Stewart | PERSON | 0.99+ |
eighty | QUANTITY | 0.99+ |
one | QUANTITY | 0.99+ |
one example | QUANTITY | 0.98+ |
each channel | QUANTITY | 0.98+ |
one side | QUANTITY | 0.98+ |
single | QUANTITY | 0.98+ |
One | QUANTITY | 0.98+ |
2013 | DATE | 0.97+ |
Hughes | PERSON | 0.97+ |
a week | QUANTITY | 0.96+ |
two | QUANTITY | 0.96+ |
one repository | QUANTITY | 0.96+ |
#CDOIQ | ORGANIZATION | 0.96+ |
East Coast | LOCATION | 0.96+ |
two worlds | QUANTITY | 0.95+ |
a decade | QUANTITY | 0.94+ |
one common repository | QUANTITY | 0.93+ |
Hack Reduce | ORGANIZATION | 0.92+ |
#MIT | ORGANIZATION | 0.91+ |
one common repository | QUANTITY | 0.91+ |
Wicked Bond | ORGANIZATION | 0.91+ |
Cube | ORGANIZATION | 0.91+ |
one common | QUANTITY | 0.89+ |
MIT Information Quality | EVENT | 0.89+ |
Mighty Information Quality Symposium | EVENT | 0.88+ |
Khun | PERSON | 0.87+ |
MIT Information Quality | ORGANIZATION | 0.86+ |
single version | QUANTITY | 0.86+ |
a day | QUANTITY | 0.85+ |
twos | QUANTITY | 0.85+ |
Teo | PERSON | 0.85+ |
Sample | PERSON | 0.82+ |
Duke Duke | ORGANIZATION | 0.81+ |
one side of | QUANTITY | 0.8+ |
single sign | QUANTITY | 0.8+ |
Duke | ORGANIZATION | 0.76+ |
Jet Saxena | PERSON | 0.75+ |
Hobby | ORGANIZATION | 0.75+ |
last decade | DATE | 0.74+ |
Data Lake | LOCATION | 0.72+ |
themes | QUANTITY | 0.7+ |
Adapt Company | ORGANIZATION | 0.65+ |
Cube Silicon Angles | ORGANIZATION | 0.62+ |
Hindu | OTHER | 0.61+ |
Duke | LOCATION | 0.6+ |
Hadapt | ORGANIZATION | 0.58+ |
Hardiman | PERSON | 0.57+ |
three | QUANTITY | 0.52+ |
Symposium | ORGANIZATION | 0.51+ |
points | QUANTITY | 0.5+ |
#theCUBE | ORGANIZATION | 0.49+ |
Stewart Madness | PERSON | 0.49+ |
U. S. | ORGANIZATION | 0.48+ |
couple | QUANTITY | 0.47+ |
Felix Van de Maele, Collibra, Data Citizens 22
(upbeat techno music) >> Collibra is a company that was founded in 2008 right before the so-called modern big data era kicked into high gear. The company was one of the first to focus its business on data governance. Now, historically, data governance and data quality initiatives, they were back office functions, and they were largely confined to regulated industries that had to comply with public policy mandates. But as the cloud went mainstream the tech giants showed us how valuable data could become, and the value proposition for data quality and trust, it evolved from primarily a compliance driven issue, to becoming a linchpin of competitive advantage. But, data in the decade of the 2010s was largely about getting the technology to work. You had these highly centralized technical teams that were formed and they had hyper-specialized skills, to develop data architectures and processes, to serve the myriad data needs of organizations. And it resulted in a lot of frustration, with data initiatives for most organizations, that didn't have the resources of the cloud guys and the social media giants, to really attack their data problems and turn data into gold. This is why today, for example, there's quite a bit of momentum to re-thinking monolithic data architectures. You see, you hear about initiatives like Data Mesh and the idea of data as a product. They're gaining traction as a way to better serve the the data needs of decentralized business users. You hear a lot about data democratization. So these decentralization efforts around data, they're great, but they create a new set of problems. Specifically, how do you deliver, like a self-service infrastructure to business users and domain experts? Now the cloud is definitely helping with that but also, how do you automate governance? This becomes especially tricky as protecting data privacy has become more and more important. 
In other words, while it's enticing to experiment and run fast and loose with data initiatives — kind of like the Wild West — to find new veins of gold, it has to be done responsibly. As such, the idea of data governance has had to evolve to become more automated and intelligent. Governance and data lineage are still fundamental to ensuring trust as data moves like water through an organization. No one is going to use data that isn't trusted. Metadata has become increasingly important for data discovery and data classification. As data flows through an organization, the ability to continuously check for data flaws and automate data quality has become a functional requirement of any modern data management platform. And finally, data privacy has become a critical adjacency to cyber security. So you can see how data governance has evolved into a much richer set of capabilities than it was 10 or 15 years ago. Hello and welcome to theCUBE's coverage of Data Citizens, made possible by Collibra, a leader in so-called data intelligence and the host of Data Citizens 2022, which is taking place in San Diego. My name is Dave Vellante and I'm one of the hosts of our program, which is running in parallel to Data Citizens. Now at theCUBE we like to say we extract the signal from the noise, and over the next couple of days we're going to feature some of the themes from the keynote speakers at Data Citizens, and we'll hear from several of the executives. Felix Van de Maele, who is the co-founder and CEO of Collibra, will join us, along with one of the other founders of Collibra, Stan Christiaens, who's going to join my colleague Lisa Martin. I'm going to also sit down with Laura Sellers; she's the Chief Product Officer at Collibra. We'll talk about some of the announcements and innovations they're making at the event, and then we'll dig in further to data quality with Kirk Haslbeck. He's the Vice President of Data Quality at Collibra.
He's an amazingly smart dude who founded OwlDQ, a company that he sold to Collibra last year. Now, many companies didn't make it through the Hadoop era — they missed the industry waves and became driftwood. Collibra, on the other hand, has evolved its business: it leveraged the cloud, expanded its product portfolio, and leaned in heavily to some major partnerships with cloud providers, as well as receiving a strategic investment from Snowflake earlier this year. So it's a really interesting story that we're thrilled to be sharing with you. Thanks for watching, and I hope you enjoy the program. (upbeat rock music) Last year theCUBE covered Data Citizens, Collibra's customer event, and the premise that we put forth prior to that event was that despite all the innovation that's gone on over the last decade or more with data — starting with the Hadoop movement, we had data lakes, we had Spark, the ascendancy of programming languages like Python, the introduction of frameworks like TensorFlow, the rise of AI, low code, no code, et cetera — businesses still find it too difficult to get more value from their data initiatives. And we said at the time, you know, maybe it's time to rethink data innovation. While a lot of the effort has been focused on, you know, more efficiently storing and processing data, perhaps more energy needs to go into thinking about the people and process side of the equation — meaning, making it easier for domain experts to both gain insights from data, trust the data, and begin to use that data in new ways, fueling data products, monetization, and insights. Data Citizens 2022 is back, and we're pleased to have Felix Van de Maele, who is the founder and CEO of Collibra, on theCUBE. We're excited to have you, Felix. Good to see you again. >> Likewise, Dave. Thanks for having me again. >> You bet.
All right, we're going to get the update from Felix on the current data landscape — how he sees it, and why data intelligence is more important now than ever — get current on what Collibra has been up to over the past year and what's changed since Data Citizens 2021, and we may even touch on some of the product news. So Felix, we're living in a very different world today for businesses and consumers. They're struggling with things like supply chains and uncertain economic trends, and we're not just snapping back to the 2010s — that's clear, and that's really true as well in the world of data. So what's different, in your mind, in the data landscape of the 2020s from the previous decade, and what challenges does that bring for your customers? >> Yeah, absolutely, and I think you said it well, Dave, in the intro: that rising complexity and fragmentation in the broader data landscape hasn't gotten any better over the last couple of years. When we talk to our customers, that level of fragmentation and complexity — how do we find data that we can trust, that we know we can use — has only gotten more difficult. So that trend is continuing; I think what is changing is that the trend has become much more acute. The other thing we've seen over the last couple of years is the level of scrutiny that organizations are under with respect to data. As data becomes more mission critical, more impactful and important, the level of scrutiny with respect to privacy, security, and regulatory compliance is only increasing as well — which, again, is really difficult in this environment of continuous innovation, continuous change, and continuously growing complexity and fragmentation. So it's become much more acute. And to your earlier point, we do live in a different world: in the past couple of years we could probably just kind of brute-force it, right? We could focus on the top line; there was enough investment to be had.
I think nowadays organizations are in a very different environment, where there's much more focus on cost control, productivity, efficiency — how do we truly get the value from that data? So again, I think it's just another incentive for organizations to now truly look at data and to scale with data, not just from a technology and infrastructure perspective, but: how do we actually scale data from an organizational perspective, right? You said it — the people and process. How do we do that at scale? That's only becoming more important, and we do believe that the economic environment we find ourselves in today is going to be a catalyst for organizations to really take that more seriously, if you will, than they maybe have in the past. >> You know, I don't know when you guys founded Collibra, if you had a sense as to how complicated it was going to get, but you've been on a mission to really address these problems from the beginning. How would you describe your mission, and what are you doing to address these challenges? >> Yeah, absolutely. We started Collibra in 2008, so in some sense around the last financial crisis, and that was really the start of Collibra, where we found product-market fit working with large financial institutions, helping them cope with the increasing compliance requirements they were faced with because of the financial crisis. And here we are again, in a very different environment of course, almost 15 years later, but with data only becoming more important. Our mission — to deliver trusted data for every user, every use case, and across every source — frankly, has only become more important.
So while it has been an incredible journey over the last 14, 15 years, I think we're still relatively early in our mission: to again be able to provide everyone — and that's why we call it Data Citizens — we truly believe that everyone in the organization should be able to use trusted data in an easy manner. That mission is only becoming more important, more relevant, and we definitely have a lot more work ahead of us, because we're still relatively early in that journey. >> Well, that's interesting, because, you know, in my observation it takes 7 to 10 years to actually build a company, so the fact that you're still in the early days is kind of interesting. I mean, Collibra's had a good 12 months or so since we last spoke at Data Citizens. Give us the latest update on your business. What do people need to know about your current momentum? >> Yeah, absolutely. Again, there's a lot of tailwind: organizations are maturing their data practices, and we've seen that influence a lot of the business growth we've seen — broader adoption of the platform. We work with some of the largest organizations in the world, like Adobe, Heineken, Bank of America, and many more. We now have over 600 enterprise customers, all industry leaders, in every single vertical. So it's really exciting to see that, and to continue to partner with those organizations. On the partnership side, again, there's a lot of momentum in the market with some of the cloud partners like Google, Amazon, Snowflake, Databricks, and others, right? Those new modern data infrastructures, modern data architectures, are definitely all moving to the cloud — a great opportunity for us, our partners, and of course our customers, to help them transition to the cloud even faster. And so we see a lot of excitement and momentum there.
We did an acquisition about 18 months ago around data quality and data observability, which we believe is an enormous opportunity. Of course data quality isn't new, but I think there are a lot of reasons why we're so excited about quality and observability now. One is around leveraging AI and machine learning, again to drive more automation. And a second is that those data pipelines that are now being created in the cloud, in these modern data architectures, have become mission critical. They've become real time. And so monitoring and observing those data pipelines continuously has become absolutely critical, so we're really excited about that as well. And on the organizational side, I'm sure you've heard the term data mesh, something that's gaining a lot of momentum, rightfully so. It's really the type of governance that we always believed in: federated, focused on domains, giving a lot of ownership to different teams. I think that's the way to scale data organizations, and so that aligns really well with our vision, and from a product perspective we've seen a lot of momentum with our customers there as well. >> Yeah, you know, a couple things there. I mean, the acquisition of OwlDQ — you know, Kirk Haslbeck and their team. It's interesting: data quality used to be this back-office function, really confined to highly regulated industries. It's come to the front office; it's top of mind for Chief Data Officers. Data mesh — you mentioned you guys are the connective tissue for all these different nodes on the data mesh. That's key. And of course we see you at all the shows. You're a critical part of many ecosystems, and you're developing your own ecosystem. So let's chat a little bit about the products.
We're going to go deeper into products later on at Data Citizens 22, but we know you're debuting some new innovations — whether it's under the covers in security, making data more accessible for people, or dealing with workflows and processes, as you talked about earlier. Tell us a little bit about what you're introducing. >> Yeah, absolutely. We're super excited — a ton of innovation. And if we think about the big theme: like I said, we're still relatively early in this journey towards that mission of data intelligence — that really bold and compelling mission. Many customers are just starting on that journey, and we want to make it as easy as possible for organizations to actually get started, because we know it's important that they do. And for the organizations and customers that have been with us for some time, there's still a tremendous amount of opportunity to expand the platform further, and again to make it easier to really accomplish that mission and vision around the Data Citizen: that everyone has access to trustworthy data in a very easy way. So that's really the theme of a lot of the innovation we're driving: a lot of ease of adoption, ease of use. But also, how do we make sure that, as Collibra becomes this mission-critical enterprise platform — from a security, performance, architecture, scale, and supportability perspective — we're truly able to deliver that kind of enterprise mission-critical platform? And so that's the big theme. From an innovation perspective, from a product perspective, there's a lot of new innovation that we're really excited about. A couple of highlights. One is around the data marketplace. Again, a lot of our customers have plans in that direction. How do we make it easy? How do we make available a true kind of shopping experience?
So that anybody in the organization can, in a very easy, search-first way, find the right data product, find the right dataset, that they can then consume. Usage analytics: how do we help organizations drive adoption, tell them where things are working really well and where they have opportunities? Homepages, again to make things easy for people — for anyone in your organization — to get started with Collibra. You mentioned Workflow Designer: again, we have a very powerful enterprise platform, and one of our key differentiators is the ability to really drive a lot of automation through workflows. And now we've provided a new low-code, no-code workflow designer experience, so customers can really take it to the next level. There's a lot more new product around Collibra Protect, which, in partnership with Snowflake — which has been a strategic investor in Collibra — is focused on how we make access governance easier. How are we able to make sure that, as you move to the cloud, things like access management and masking around sensitive data, PII data, are managed much more effectively? Really excited about that product. There's more around data quality. Again, how do we get that deployed as easily, quickly, and widely as we can? Moving that to the cloud has been a big part of our strategy, so we launched our Data Quality Cloud product, as well as making use of the native compute capabilities of platforms like Snowflake, Databricks, Google, Amazon, and others. And so we're delivering a capability that we call pushdown: we're actually pushing down the compute for data quality and monitoring into the underlying platform, which, again, from a scale, performance, and ease-of-use perspective, is going to make a massive difference. And then, more broadly, we talked a little bit about the ecosystem. Again, integrations — we talk about being able to connect to every source.
Integrations are absolutely critical, and we're really excited to deliver new integrations with Snowflake, Azure, and Google Cloud Storage as well. So there's a lot coming out. The team has been hard at work, and we're really excited about what we're bringing to market. >> Yeah, a lot going on there. I wonder if you could give us your closing thoughts. I mean, you talked about the marketplace; you think about data mesh, you think of data as product — one of the key principles — you think about monetization. This is really different from what we've been used to in data, where just getting the technology to work has been so hard. So how do you see the future? Give us your closing thoughts, please. >> Yeah, absolutely. I think we're really at a pivotal moment, and I think you said it well. We all know the constraints and the challenges with data — how to actually do data at scale. And while we've seen a ton of innovation on the infrastructure side, we fundamentally believe that just getting a faster database is important, but it's not going to fully solve the challenges and truly deliver on the opportunity. And that's why now is really the time to deliver this data intelligence vision, this data intelligence platform. We are still early; making it as easy as we can is kind of our mission. And so I'm really excited to see how the markets are going to evolve over the next few quarters and years. I think the trend is clearly there. We talked about data mesh — this federated approach focused on data products is just another signal that, we believe, a lot of organizations are now at the point where they understand the need to go beyond just the technology.
They need to really think about how to actually scale data as a business function, just like we've done with IT, with HR, with sales and marketing, with finance. That's how we need to think about data. I think now is the time, given the economic environment that we're in, with much more focus on cost control, on productivity and efficiency. Now is the time we need to look beyond just the technology and infrastructure, to think about how to scale data, how to manage data at scale. >> Yeah, it's a new era. The next 10 years of data won't be like the last, as I always say. Felix, thanks so much. Good luck in San Diego. I know you're going to crush it out there. >> Thank you, Dave. >> Yeah, it's a great spot for an in-person event, and of course the content post-event is going to be available at collibra.com. You can catch theCUBE coverage at theCUBE.net, and all the news at siliconangle.com. This is Dave Vellante for theCUBE, your leader in enterprise and emerging tech coverage. (upbeat techno music)
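The "pushdown" capability Felix describes — executing the data-quality computation inside the underlying platform rather than extracting rows — can be sketched in miniature. This is not Collibra's implementation; SQLite stands in for a cloud warehouse, and the table and metric are invented for illustration. The point is that only a one-row summary ever leaves the engine:

```python
import sqlite3

# Stand-in for a warehouse table with some incomplete customer rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, email TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "a@x.com"), (2, None), (3, "c@x.com"), (4, "d@x.com")])

def pushdown_null_rate(conn, table, column):
    """Compute a completeness metric as a single aggregate query, so the
    engine does the scan and only one summary value crosses the wire.
    (Identifiers are interpolated for brevity; a real tool would validate
    them against the catalog first.)"""
    row = conn.execute(
        f"SELECT 1.0 * SUM(CASE WHEN {column} IS NULL THEN 1 ELSE 0 END)"
        f" / COUNT(*) FROM {table}"
    ).fetchone()
    return row[0]

rate = pushdown_null_rate(conn, "customers", "email")  # 0.25
```

The same aggregate could be issued against Snowflake or Databricks unchanged in shape, which is why pushing the check down scales with the warehouse rather than with the monitoring tool.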
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Dave Vellante | PERSON | 0.99+ |
Lisa Martin | PERSON | 0.99+ |
Heineken | ORGANIZATION | 0.99+ |
Adobe | ORGANIZATION | 0.99+ |
Felix Van de Maele | PERSON | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
ORGANIZATION | 0.99+ | |
Laura Sellers | PERSON | 0.99+ |
Collibra | ORGANIZATION | 0.99+ |
2008 | DATE | 0.99+ |
Felix | PERSON | 0.99+ |
San Diego | LOCATION | 0.99+ |
Stan Christiaens | PERSON | 0.99+ |
Dave | PERSON | 0.99+ |
Bank of America | ORGANIZATION | 0.99+ |
7 | QUANTITY | 0.99+ |
Snowflake | ORGANIZATION | 0.99+ |
2020s | DATE | 0.99+ |
last year | DATE | 0.99+ |
2010s | DATE | 0.99+ |
Data Breaks | ORGANIZATION | 0.99+ |
Python | TITLE | 0.99+ |
Last year | DATE | 0.99+ |
12 months | QUANTITY | 0.99+ |
siliconangle.com | OTHER | 0.99+ |
one | QUANTITY | 0.99+ |
Data Citizens | ORGANIZATION | 0.99+ |
Databricks | ORGANIZATION | 0.99+ |
Owl DQ | ORGANIZATION | 0.98+ |
10 | DATE | 0.98+ |
OwlDQ | ORGANIZATION | 0.98+ |
Kirk Haslbeck | PERSON | 0.98+ |
10 years | QUANTITY | 0.98+ |
One | QUANTITY | 0.98+ |
Spark | TITLE | 0.98+ |
today | DATE | 0.98+ |
first | QUANTITY | 0.97+ |
Data Citizens | EVENT | 0.97+ |
earlier this year | DATE | 0.96+ |
Tensorflow | TITLE | 0.96+ |
Data Citizens 22 | ORGANIZATION | 0.95+ |
both | QUANTITY | 0.94+ |
theCUBE | ORGANIZATION | 0.94+ |
15 years ago | DATE | 0.93+ |
over 600 enterprise customers | QUANTITY | 0.91+ |
past couple of years | DATE | 0.91+ |
about 18 months ago | DATE | 0.9+ |
collibra.com | OTHER | 0.89+ |
Data citizens 2021 | ORGANIZATION | 0.88+ |
Data Citizens 2022 | EVENT | 0.86+ |
almost 15 years later | DATE | 0.85+ |
West | LOCATION | 0.85+ |
Azure | TITLE | 0.84+ |
first way | QUANTITY | 0.83+ |
Vice President | PERSON | 0.83+ |
last couple of years | DATE | 0.8+ |
Kirk Haslbeck, Collibra, Data Citizens 22
(atmospheric music) >> Welcome to theCUBE Coverage of Data Citizens 2022 Collibra's Customer event. My name is Dave Vellante. With us is Kirk Haslbeck, who's the Vice President of Data Quality of Collibra. Kirk, good to see you, welcome. >> Thanks for having me, Dave. Excited to be here. >> You bet. Okay, we're going to discuss data quality, observability. It's a hot trend right now. You founded a data quality company, OwlDQ, and it was acquired by Collibra last year. Congratulations. And now you lead data quality at Collibra. So we're hearing a lot about data quality right now. Why is it such a priority? Take us through your thoughts on that. >> Yeah, absolutely. It's definitely exciting times for data quality which you're right, has been around for a long time. So why now? And why is it so much more exciting than it used to be? I think it's a bit stale, but we all know that companies use more data than ever before, and the variety has changed and the volume has grown. And while I think that remains true there are a couple other hidden factors at play that everyone's so interested in as to why this is becoming so important now. And I guess you could kind of break this down simply and think about if Dave you and I were going to build a new healthcare application and monitor the heartbeat of individuals, imagine if we get that wrong, what the ramifications could be, what those incidents would look like. Or maybe better yet, we try to build a new trading algorithm with a crossover strategy where the 50 day crosses the 10 day average. And imagine if the data underlying the inputs to that is incorrect. We will probably have major financial ramifications in that sense. So, kind of starts there, where everybody's realizing that we're all data companies, and if we are using bad data we're likely making incorrect business decisions. But I think there's kind of two other things at play. 
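The trading example above can be made concrete with a short sketch. This is purely illustrative (the 10- and 50-day windows and the buy/sell logic are assumptions for demonstration, not anything Collibra ships), but it shows how a single bad input value can silently flip a business decision.

```python
# Illustrative moving-average crossover, and how one corrupted input
# flips the signal. Hypothetical example only; not Collibra code.

def moving_average(prices, window):
    """Trailing average of the last `window` prices."""
    return sum(prices[-window:]) / window

def crossover_signal(prices, fast=10, slow=50):
    """'buy' when the fast average sits above the slow one, else 'sell'."""
    if len(prices) < slow:
        raise ValueError("not enough history")
    fast_ma = moving_average(prices, fast)
    slow_ma = moving_average(prices, slow)
    return "buy" if fast_ma > slow_ma else "sell"

# A gently rising price series: the 10-day average is above the 50-day.
clean = [100 + 0.5 * i for i in range(60)]
print(crossover_signal(clean))  # buy

# One fat-finger zero in the latest tick drags the fast average down:
# bad data in, wrong decision out.
dirty = clean[:-1] + [0.0]
print(crossover_signal(dirty))  # sell
```

Note that no individual rule is broken by the stray zero; only a comparison against the field's own history would flag it, which is the observability argument the conversation builds toward.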
I bought a car not too long ago and my dad called and said, "How many cylinders does it have?" And I realized in that moment, I might have failed him 'cause I didn't know. And I used to ask those types of questions about anti-lock brakes and cylinders, and if it's manual or automatic. And I realized, I now just buy a car that I hope works. And it's so complicated with all the computer chips, I really don't know that much about it. And that's what's happening with data. We're just loading so much of it. And it's so complex that the way companies consume it in the IT function is that they bring in a lot of data and then they syndicate it out to the business. And it turns out that the individuals loading and consuming all of this data for the company actually may not know that much about the data itself, and that's not even their job anymore. So, we'll talk more about that in a minute, but that's really what's setting the foreground for this observability play and why everybody's so interested. It's because we're becoming less close to the intricacies of the data, and we just expect it to always be there and be correct. >> You know, the other thing too about data quality, and for years we did the MIT CDOIQ event. We didn't do it last year because of COVID, which messed everything up. But the observation I would make there, and I'd love your thoughts, is that data quality used to be information quality, used to be this back office function, and then it became sort of front office with financial services, and government and healthcare, these highly regulated industries. And then the whole chief data officer thing happened and people were realizing, well, they sort of flipped the bit from data as a risk to data as an asset. And now, as we say, we're going to talk about observability. And so it's really become front and center, just the whole quality issue, because data's so fundamental, hasn't it? >> Yeah, absolutely.
I mean, let's imagine we pull up our phones right now and I go to my favorite stock ticker app, and I check out the Nasdaq market cap. I really have no idea if that's the correct number. I know it's a number, it looks large, it's in a numeric field. And that's kind of what's going on. There are so many numbers, and they're coming from all of these different sources and data providers, and they're getting consumed and passed along. But there isn't really a way to tactically put controls on every number and metric across every field we plan to monitor. But with the scale that we've achieved, even in the early days before Collibra, what's been so exciting is we have these types of observation techniques, these data monitors that can actually track past performance of every field at scale. And why that's so interesting, and why I think the CDO is listening intently nowadays to this topic, is that maybe we could surface all of these problems with the right data observability solution at the right scale, and then just be alerted on breaking trends. So we're sort of shifting away from this world of "you must write a condition, and when that condition breaks, that was always known as a break record." But what about breaking trends and root cause analysis? And is it possible to do that with less human intervention? And so I think most people are seeing now that it's going to have to be a software tool and a computer system. It's not ever going to be based on one or two domain experts anymore. >> So how does data observability relate to data quality? Are they sort of two sides of the same coin? Are they cousins? What's your perspective on that? >> Yeah, it's super interesting. It's an emerging market, so the language is changing, and a lot of the topics and areas are changing. The way that I like to say it or break it down, because the lingo is constantly moving as a target in this space, is really breaking records versus breaking trends.
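The "breaking trends" idea Kirk describes, learning each field's past behavior and alerting when a new batch falls outside it, can be sketched generically. The z-score check below is a hypothetical stand-in for illustration, not Collibra's actual monitoring algorithm:

```python
# Minimal sketch of a per-field trend monitor: learn each field's
# historical profile, then flag new batches that break the trend.
# Hypothetical logic, not Collibra's implementation.
import statistics

def build_profile(history):
    """history: per-batch metric values for one field (e.g. its null rate)."""
    return {"mean": statistics.mean(history), "stdev": statistics.stdev(history)}

def breaks_trend(profile, value, z_threshold=3.0):
    """Flag a value more than z_threshold standard deviations from history."""
    if profile["stdev"] == 0:
        return value != profile["mean"]
    z = abs(value - profile["mean"]) / profile["stdev"]
    return z > z_threshold

null_rates = [0.01, 0.012, 0.011, 0.013, 0.009, 0.010]  # past batches
profile = build_profile(null_rates)
print(breaks_trend(profile, 0.011))  # within trend -> False
print(breaks_trend(profile, 0.35))   # sudden spike in nulls -> True
```

Nobody wrote a rule saying "the null rate must be below 0.35"; the field's own history supplies the threshold, which is what lets this run across every field at scale.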
And I could write a condition: when this thing happens it's wrong, and when it doesn't, it's correct. Or I could look for a trend, and I'll give you a good example. Everybody's talking about fresh data and stale data, and why would that matter? Well, if your data never arrived, or only part of it arrived, or it didn't arrive on time, it's likely stale, and there will not be a condition that you could write that would show you all the good and the bad. That was kind of your traditional approach of data quality break records. But your modern-day approach is: you lost a significant portion of your data, or it did not arrive in time to make that decision accurately. And that's a hidden concern. Some people call this freshness, we call it stale data, but it all points to the same idea: the thing that you're observing may not be a data quality condition anymore. It may be a breakdown in the data pipeline. And with thousands of data pipelines in play for every company out there, there's more than a couple of these happening every day. >> So what's the Collibra angle on all this stuff? You made the acquisition, you've got data quality and observability coming together. You guys have a lot of expertise in this area, but you hear about provenance of data, you just talked about stale data, the whole trend toward real time. How is Collibra approaching the problem, and what's unique about your approach? >> Well, I think where we're fortunate is with our background. Myself and the team, we sort of lived this problem for a long time in the Wall Street days about a decade ago, and we saw it from many different angles. And what we came up with, before it was called data observability or reliability, was basically the underpinnings of that. So we're a little bit ahead of the curve there when most people evaluate our solution. It's more advanced than some of the observation techniques that currently exist.
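The stale-data scenario above, where no single row is wrong yet most of the feed never arrived, is exactly the case a classic break record misses. Here is a minimal sketch of the contrast, with illustrative thresholds that are assumptions for demonstration, not Collibra's:

```python
# Sketch contrasting a classic "break record" rule with a "breaking trend"
# freshness check on arrival volume. Thresholds are illustrative only.

def break_record(row):
    """Traditional row-level condition: a row is wrong if age is negative."""
    return row["age"] < 0

def stale_arrival(past_counts, todays_count, min_fraction=0.8):
    """Trend check: today's load is suspect if well below typical volume."""
    typical = sorted(past_counts)[len(past_counts) // 2]  # median of history
    return todays_count < min_fraction * typical

# No single row is "wrong", yet most of the feed never arrived.
print(break_record({"age": 42}))                         # False: rule can't see it
print(stale_arrival([980, 1010, 1005, 995, 1000], 400))  # True: trend check can
```

The row-level rule passes every record it sees; only comparing today's arrival volume against the pipeline's history reveals that the decision downstream would be made on partial data.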
But we've also always covered data quality, and we believe that people want to know more, they need more insights. And they want to see break records and breaking trends together, so they can correlate the root cause. And we hear that all the time: "I have so many things going wrong, just show me the big picture. Help me find the thing that, if I were to fix it today, would make the most impact." So we're really focused on root cause analysis, business impact, connecting it with lineage and catalog metadata. And as that grows, you can actually achieve total data governance. At this point, with the acquisition of what was a lineage company years ago, and then my company OwlDQ, now Collibra Data Quality, Collibra may be the best positioned for total data governance and intelligence in the space. >> Well, you mentioned financial services a couple of times and some examples. Remember the flash crash in 2010? Nobody had any idea what that was. They would just say, "Oh, it's a glitch." So they didn't understand the root cause of it. So this is a really interesting topic to me. So we know at Data Citizens '22 that you're announcing, you've got to announce new products, right? It is your yearly event. What's new? Give us a sense as to what products are coming out, but specifically around data quality and observability. >> Absolutely. There's always a next thing on the forefront, and the one right now is these hyperscalers in the cloud. So you have databases like Snowflake and BigQuery, and Databricks, Delta Lake and SQL Pushdown. And ultimately what that means is a lot of people are storing and loading data even faster in a SaaS-like model. And we've started to hook into these databases, and while we've always worked with the same databases in the past, they're supported today. We're doing something called native database pushdown, where the entire compute and data activity happens in the database. And why is that so interesting and powerful now?
Everyone's concerned with something called egress. Did my data, that I've spent all this time and money with my security team securing, ever leave my hands? Did it ever leave my secure VPC, as they call it? And with these native integrations that we're building, and about to unveil here as kind of a sneak peek for next week at Data Citizens, we're now doing all compute and data operations in databases like Snowflake. And what that means is, with no install and no configuration, you could log into the Collibra Data Quality app and have all of your data quality running inside the database that you've probably already picked as your go-forward, secured database of choice. So we're really excited about that. And I think if you look at the whole landscape of network cost, egress cost, data storage and compute, what people are realizing is it's extremely efficient to do it in the way that we're about to release here next week. >> So this is interesting, because what you just described, you mentioned Snowflake, you mentioned Google, oh, actually you mentioned, yeah, Databricks. You know, Snowflake has the data cloud. If you put everything in the data cloud, okay, you're cool. But then Google's got the open data cloud, if you heard Google Next. And now Databricks doesn't call it the data cloud, but they have like the open source data cloud. So you have all these different approaches, and there's really no way, up until now I'm hearing, to really understand the relationships between all those and have confidence across them. It's like, you should just be a node on the mesh. And I don't care if it's a data warehouse or a data lake, or where it comes from, but it's a point on that mesh, and I need tooling to be able to have confidence that my data is governed and has the proper lineage, provenance. And that's what you're bringing to the table. Is that right? Did I get that right? >> Yeah, that's right.
And for us, it's not that we haven't been working with those great cloud databases; it's the fact that we can send them the instructions now. We can send them the operating ability to crunch all of the calculations, the governance, the quality, and get the answers. And what that's doing is basically zero network cost, zero egress cost, zero latency. And so when you log into BigQuery tomorrow using our tool, or say Snowflake, for example, you have instant data quality metrics, instant profiling, instant lineage and access privacy controls, things of that nature that just become less onerous. What we're seeing is there's so much technology out there, just like all of the major brands that you mentioned, but how do we make it easier? The future is about fewer clicks, faster time to value, faster scale, and eventually lower cost. And we think that this positions us to be the leader there. >> I love this example, because everybody talks about how the cloud guys are going to own the world. And of course now we're seeing that the ecosystem is finding so much white space to add value and connect across clouds. Sometimes we call it supercloud, or interclouding. All right, Kirk, give us your final thoughts on the trends that we've talked about and Data Citizens '22. >> Absolutely. Well, I think one big trend is discovery and classification. We're seeing that across the board. People used to just want to know if a field was a zip code, and nowadays, with the amount of data that's out there, they want to know where everything is, where their sensitive data is, if it's redundant, tell me everything inside of three to five seconds. And with that comes, they want to know, in all of these hyperscale databases, how fast they can get controls and insights out of their tools. So I think we're going to see more one-click solutions, more SaaS-based solutions, and solutions that hopefully prove faster time to value on all of these modern cloud platforms. >> Excellent.
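The pushdown pattern described in this segment, computing the quality metrics inside the database so only a tiny result ever leaves it, can be illustrated with a local stand-in. Here sqlite3 plays the role of the cloud warehouse, and the profile query is a generic example, not Collibra's actual SQL:

```python
# Toy illustration of "pushdown": compute the data-quality metric inside
# the database and fetch only the small result, instead of pulling every
# row out. sqlite3 stands in for a cloud warehouse; the principle holds.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, 10.0), (2, None), (3, 25.5), (4, None)])

# Pushed-down profile query: row count and null count computed in-database.
row_count, null_count = conn.execute(
    "SELECT COUNT(*), SUM(CASE WHEN amount IS NULL THEN 1 ELSE 0 END)"
    " FROM orders"
).fetchone()
print(row_count, null_count / row_count)  # 4 0.5
```

The full table never crosses the network; only two numbers do, which is the zero-egress point made above.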
All right, Kirk Haslbeck, thanks so much for coming on theCUBE and previewing Data Citizens 22. Appreciate it. >> Thanks for having me, Dave. >> You're welcome. All right. And thank you for watching. Keep it right there for more coverage from theCUBE. (atmospheric music)
Mitesh Shah, Alation & Ash Naseer, Warner Bros Discovery | Snowflake Summit 2022
(upbeat music) >> Welcome back to theCUBE's continuing coverage of Snowflake Summit '22, live from Caesar's Forum in Las Vegas. I'm Lisa Martin, with my cohost Dave Vellante. We've been here the last day and a half unpacking a lot of news, a lot of announcements, talking with customers and partners, and we have another great session coming for you next. We've got a customer and a partner talking tech and data mesh. Please welcome Mitesh Shah, VP of market strategy at Alation. >> Great to be here. >> And Ash Naseer, great to have you, senior director of data engineering at Warner Brothers Discovery. Welcome guys. >> Thank you for having me. >> It's great to be back in person and to be able to really get to see and feel and touch this technology, isn't it? >> Yeah, it is. I mean, two years or so. Yeah. Great to feel the energy in the conference center. >> Yeah. >> Snowflake was virtual, I think, for two years, and now it's great to kind of see the excitement firsthand. So it's wonderful. >> The excitement, but also the boom in the number of customers and partners and people attending. They were saying the first summit, in 2019, had about 1900 attendees, and this is around 10,000. So a huge jump in a short time period. Talk a little bit about the Alation-Snowflake partnership and some of the acceleration that you guys have been experiencing as a Snowflake partner. >> Yeah. As a Snowflake partner, I mean, Snowflake became an investor in Alation early last year, and we've been a partner for longer than that. And good news: we have been awarded Snowflake partner of the year for data governance just earlier this week. And that's, in fact, our second year in a row winning that award. So, great news on that front as well. >> Repeat, congratulations. >> Repeat. Absolutely. And we hope to make it a three-peat as well.
And we've also been awarded industry competency badges in five different industries, those being financial services, healthcare, retail, technology, and Media and Telecom. >> Excellent. Okay. Going to get right into it. Data mesh. You guys actually have a data mesh, and you've presented at the conference. So, take us back to the beginning. Why did you decide that you needed to implement something like data mesh? What was the impetus? >> Yeah. So when people think of Warner Brothers, you always think of, like, the movie studio, but we're more than that, right? I mean, you think of HBO, you think of TNT, you think of CNN; we have 30 plus brands in our portfolio, and each have their own needs. So the idea of a data mesh really helps us, because what we can do is we can federate access across the company so that, you know, CNN can work at their own pace. You know, when there's election season, they can ingest their own data, and they don't have to, you know, bump up against, as an example, HBO, if Game of Thrones is going on. >> So, okay. So the impetus was to serve those lines of business better. Actually, given that you've got these different brands, it was probably easier than for most companies. 'Cause let's say you're a big financial services company, and now you have to decide who owns what. CNN owns its own data products, HBO. Now, do they decide within those different brands how to distribute even further? Or, really, how deep have you gone in that decentralization? >> That's a great question. It's a very close partnership, because there are a number of data sets which are used by all the brands, right? You think about people browsing websites, right? You know, CNN has a website, Warner Brothers has a website. So for each of the brands to ingest that data separately, that means five different ways of doing things and, you know, a big environment, right? So that is where our team comes into play.
We ingest a lot of the common data sets, but like I said, any unique data sets, data sets regarding theatrical as an example, you know, Warner Brothers does it themselves; you know, for streaming, HBO Max does it themselves. So we kind of operate in partnership. >> So do you have a centralized data team and also decentralized data teams, right? >> That's right. >> So I love this conversation, because that was heresy 10 years ago, five years ago even, 'cause that's inefficient. But you've, I presume you've found that it's actually more productive in terms of the business output. Explain that dynamic. >> You know, you bring up such a good point. So I, you know, I consider myself as one of the dinosaurs who started, like, 20 plus years ago in this industry. And back then, we were all taught to think of the data warehouse as, like, a monolithic thing. And the reason for that is the technology wasn't there. The technology didn't catch up. Now, 20 years later, the technology is way ahead, right? But, like, our mindset's still the same, because we think of data warehouses and data platforms still as a monolithic thing. But if you really sort of remove that mental barrier, if you will, and if you start thinking about, well, how do I sort of, you know, federate everything and make sure that you let the folks who are closest to the customer, or are building the products, own that data and have a partnership, the results have been amazing. And if we were only sort of doing it as a centralized team, we would not be able to do a 10th of what we do today. So it's that massive scale in our company as well. >> And I should have clarified, when we talk about data mesh, are we talking about implementing, in practice, the Zhamak Dehghani sort of framework, or is this sort of your own terminology? >> Well, so the interesting part is, four years ago, we didn't have- >> It didn't exist. >> Yeah. It didn't exist.
And so our principle was very simple, right? When we started out, we said we want to make sure that our brands are able to operate independently, with some oversight and guidance from our technology teams, right? That's what we set out to do. We did that with Snowflake by design, because Snowflake allows us to, you know, separate those brands into different accounts. So that was done by design. And then the magic, I think, is the Snowflake data sharing, which allows us to sort of bring data in here once and then share it with whoever needs it. So think about HBO Max. On HBO Max, you not only have HBO Max content, but content from CNN, from Cartoon Network, from Warner Brothers, right? All the movies, right? So to see how The Batman did in theaters and then on streaming, Warner Brothers doesn't need to ingest the same streaming data. HBO Max does it, HBO Max shares it with Warner Brothers; you know, store once, share many times, and everyone works at their own pace. >> So they're building data products. Those data products are discoverable APIs, I presume, or maybe just the Snowflake cloud, but very importantly, they're governed. And that's where Alation comes in, correct? >> That's precisely where Alation comes in, as sort of this central, flexible foundation for data governance. You know, you mentioned data mesh. I think what's interesting is that it's really an answer to the bottlenecks created by centralized IT, right? There's this notion of decentralizing to the data engineers and making the data domain owners, the people that know the data the best, be in control of publishing the data to the data consumers. There are other popular concepts actually happening right now, as we speak, around the modern data stack, around data fabric, that are also in many ways underpinned by this notion of decentralization, right?
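The "store once, share many times" pattern Ash describes above can be modeled in a few lines: the producing brand keeps the single physical copy, and consumers are granted read-only references rather than duplicates. This is an illustrative sketch of the idea, not Snowflake's actual sharing mechanism; the class names, brands, and grants are invented.

```python
class Dataset:
    """One physical copy of the data, owned by the producing brand."""
    def __init__(self, owner, name, rows):
        self.owner, self.name, self.rows = owner, name, rows

class ShareRegistry:
    """Store once, share many times: a grant hands the consumer a
    read-only reference to the producer's copy, never a duplicate."""
    def __init__(self):
        self._grants = {}  # (dataset name, consumer) -> Dataset

    def grant(self, dataset, consumer):
        self._grants[(dataset.name, consumer)] = dataset

    def read(self, name, consumer):
        ds = self._grants.get((name, consumer))
        if ds is None:
            raise PermissionError(f"{consumer} has no grant on {name}")
        return ds.rows  # the same object the producer ingested

# The streaming brand ingests its viewing data once...
streaming = Dataset("HBO Max", "streaming_views",
                    [{"title": "The Batman", "views": 1_000_000}])
registry = ShareRegistry()
# ...and shares it with the studio, so nobody re-ingests it.
registry.grant(streaming, "Warner Bros")
print(registry.read("streaming_views", "Warner Bros")[0]["title"])
```

A consumer without a grant raises `PermissionError`, which is the governance half of the story: access is explicit and revocable, while the data itself is never copied.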
These are concepts that are underpinned by decentralization, and as the pendulum swings between decentralization and centralization, as we go back and forth in the world of IT and data, there are certain constants that need to be centralized over time. And one of those, I believe, is very much a centralized platform for data governance. And that's certainly, I think, where we come in. Would love to hear more about how you use Alation. >> Yeah. So, I mean, Alation helps us sort of, as you guys say, map the treasure map of the data, right? So for consumers to find where their data is, that's where Alation helps us. It helps us with the data cataloging, you know, storing all the metadata, and, you know, users can go in, they can sort of find, you know, the data that they need, and they can also find how others are using data. So there's a little bit of a crowdsourcing aspect that Alation helps us with, whereby, you know, you can see, okay, my peer in the other group, well, that's how they use this piece of data. So I'm not going to spend hours trying to figure this out. You're going to use the query that they use. So yeah. >> So you have a master catalog, I presume. And then each of the brands has their own sub catalogs, is that correct? >> Well, for the most part, we have that master catalog, and then the brands sort of use it, you know, separately themselves. The key here is that the catalog isn't maintained by a centralized group, right? It's, again, maintained by the individual teams, and not only the individual teams, but the folks that are responsible for the data, right? So I talked about the concept of crowdsourcing: whoever sort of puts the data in has to make sure that they update the catalog and make sure that the definitions are there and everything is sort of in line. >> So HBO, CNN, they each have their own sort of access to their catalog, but they feed into the master catalog.
Is that the right way to think about it? >> Yeah. >> Okay. And they have their own virtual data warehouses, right? They have ownership over that? They can spin 'em up, spin 'em down as they see fit? Right? And they're governed. >> They're governed. And what's interesting is it's not just governed, right? Governance is a big word. It's a bit nebulous, but what's really being enabled here is this notion of self-service as well, right? There are two big things that need to happen at the same time in any given organization. There's this notion that you want to put trustworthy data in the hands of data consumers, while at the same time mitigating risk. And that's precisely what Alation does. >> So I want to clarify this for the audience. There's four principles of data mesh. This came after you guys did it, and I wonder how it aligns. Domain ownership, giving data, as you were saying, to the domain owners who have context; data as product, you guys are building data products; and that creates two problems: how do you give people self-service infrastructure, and how do you automate governance? So the first two, great. But then it creates these other problems. Does that align with your philosophy? Where's the alignment? What's different? >> Yeah. Data products is exactly where we're going. And that domain-based design, that's really key as well. In our business, think about who the customer is, as an example, right? Depending on who you ask, the answer might be different, you know. To the movie business, it's probably going to be the person who watches a movie in a theater. To the streaming business, to HBO Max, it's the streamer, right? To others, someone watching live CNN on their TV, right? There's yet another group. Think about all the franchising we do. So you see Batman action figures and T-shirts and Warner Brothers branded stuff in stores; that's yet another business unit.
But at the end of the day, it's not a different person, it's you and me, right? We do all these things. So the domain concept makes sure that you ingest data and bring in data relevant to the context, while not making it so stringent that it cannot integrate, and then you integrate it at a higher level to create that 360. >> And it's discoverable. So the point is, I don't have to go tap Ash on the shoulder and say, how do I get this data? Is it governed? Do I have access to it? Give me the rules of it. I just go grab it, right? And the system computationally automates whether or not I have access to it. And it's, as you say, self-service. >> In this case, exactly right. It enables people to just search for data and know, when they find the data, whether it's trustworthy or not, through trust flags and the like. It's doing both of those things at the same time. >> How is it an enabler of solving some of the big challenges that the media and entertainment industry is going through? We've seen so much change the last couple of years. The rising consumer expectations aren't going to go back down. They're only going to come up. We want you to serve us up content that's relevant, that's personalized, that makes sense. I'd love to understand from your perspective, Mitesh, from an industry challenges perspective, how does this technology help customers like Warner Brothers Discovery meet business customers where they are and reduce the volume on those challenges? >> It's a great question. And as I mentioned earlier, we had five industry competency badges that were awarded to us by Snowflake, and one of those is for Media and Telecom. And the reason for that is we're helping media companies understand their audiences better, and ultimately serve up better experiences for their audiences. But we've got Ash right here who can tell us how that's happening in practice. >> Yeah, tell us. >> So I'll share a story. I always like to tell stories, right?
Once upon a time, before we had Alation in place, it was like, who you knew was how you got access to the data. So if I knew you, and I knew you had access to a certain kind of data, your access to the right kind of data was based on the network you had at the company- >> I had to trust you. >> Yeah. >> I might not want to give up my data. >> That's it. And so that's where Alation sort of helps us democratize it, but, you know, puts the governance and controls in, right? There are certain sensitive things as well, such as viewership, such as subscriber accounts, which are very important. So making sure that the right people have access to it, that's the other problem that Alation helps us solve. >> That's precisely part of our integration with Snowflake in particular: being able to define and manage policies within Alation, saying, you know, certain people should have access to certain rows, doing column-level masking, and having those policies actually enforced at the Snowflake data layer is precisely part of our value proposition. >> And that's automated. >> And all that's automated. Exactly. >> Right. So I don't have to think about it. I don't have to go through the tap on the shoulder. What has been the impact, Ash, on data quality as you've pushed it down into the domains? >> That's a great question. So it has definitely improved, but data quality is a very interesting subject, because, back to my example, when we started doing things, the centralized IT team always said, well, it has to be like this, right? And if it doesn't fit in this, then it's bad quality. Well, sometimes context changes. Businesses change, right? You have to be able to react to it quickly. So making sure that a lot of that quality is managed at the decentralized level, at the place where you have that business context, that ensures you have the most up to date quality. We're talking about the media industry changing so quickly.
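The policy enforcement Mitesh describes above (certain people see certain rows, and sensitive columns get masked, applied automatically at the data layer) can be sketched as a small read-time filter. This is an illustration of the concept, not Alation's or Snowflake's actual policy engine; the roles, regions, and column names are invented.

```python
ROW_POLICY = {"analyst": {"US"}, "admin": {"US", "EU"}}   # role -> visible regions
MASKED_COLUMNS = {"analyst": {"subscriber_email"}}        # role -> columns to redact

def apply_policies(rows, role):
    """Enforce row-level access and column masking at read time,
    the way pushed-down policies behave at the data layer."""
    visible = []
    for row in rows:
        if row["region"] not in ROW_POLICY.get(role, set()):
            continue                      # row access policy: drop the row
        out = dict(row)
        for col in MASKED_COLUMNS.get(role, set()):
            if col in out:
                out[col] = "***"          # masking policy: redact the column
        visible.append(out)
    return visible

rows = [
    {"region": "US", "subscriber_email": "a@example.com", "views": 12},
    {"region": "EU", "subscriber_email": "b@example.com", "views": 7},
]
print(apply_policies(rows, "analyst"))
```

The consumer never has to "think about it": the same query returns different, already-governed results depending on who is asking, which is what makes the self-service safe.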
I mean, would we have thought three years ago that people would watch a lot of these major movies on streaming services? But here's the reality, right? You have to react, and, you know, having it at that level just helps you react faster. >> So, if I play that back: data quality is not a static framework. It's flexible based on the business context, and the business owners can make those adjustments, 'cause they own the data. >> That's it. That's exactly it. >> That's awesome. Wow. That's amazing progress that you guys have made. >> On quality, if I could just add, it also changes depending on where you are in your data pipeline stage, right? Data quality, data observability, this is a very fast-evolving space at the moment, and if I look to my left right now, I bet you I can probably see a half-dozen quality and observability vendors. And so given that, and given the fact that Alation is sort of a central hub to find trustworthy data, we've actually announced an open data quality initiative, allowing best-of-breed data quality vendors to integrate with the platform. So whoever they are, whatever tool folks want to use, they can use that particular tool of choice. >> And this all runs in the cloud, or is it a hybrid sort of thing? >> Everything is in the cloud. We're all in the cloud. And, you know, again, that helps us go faster. >> Let me ask you a question. I could go on forever on this topic. One of the concepts that was put forth is that whether it's a Snowflake data warehouse or a Databricks data lake or an Oracle data warehouse, they should all be inclusive. They should just be a node on the mesh. Like, wow, that sounds good. But I haven't seen it yet, right? I'm guessing that Snowflake and Alation enable all the self-serve, all this automated governance, and that including those other items has got to be a one-off at this point in time.
Do you ever see yourselves expanding that scope, or is it better off to just kind of leave it in the Snowflake data cloud? >> It's a good question. You know, I feel like, where we're at today, especially in terms of technology giving us so many options, I don't think there's a one size fits all, right? Even though we are very heavily invested in Snowflake, and we use Snowflake consistently across the organization, you could theoretically have an architecture that blends those two, right? Have different types of data platforms, like a Teradata or an Oracle, and sort of bring it all together. Today we have the technology, you know, all sorts of things that can make sure that you query across different databases. So I don't think the technology is the problem; I think it's the organizational mindset. I think that that's what gets in the way. >> Oh, interesting. So I was going to ask you, will hybrid tables help you solve that problem? And, maybe not, what you're saying is it's the organization that owns the Oracle database saying, hey, we have our system, it processes, it works, you know, go away. >> Yeah. Well, you know, hybrid tables, I think, are a great sort of next step in Snowflake's evolution. In my opinion, I think it's a game changer, but yeah, I mean, they can still exist. You could do hybrid tables right on Snowflake, or you could, you know, kind of coexist as well. >> Yeah. But, do you have a thought on this? >> Yeah, I do. I mean, we're always going to live in a time where you've got data distributed throughout the organization and around the globe. And that could be even if you're all in on Snowflake: you could have data in Snowflake here, you could have data in Snowflake in EMEA, in Europe somewhere. It could be anywhere. By the same token, every organization is using on-premises systems. They have data, they naturally have data everywhere.
And so, you know, the one solution to this is really centralizing, as I mentioned, not just governance, but also metadata about all of the data in your organization, so that you can enable people to search and find and discover trustworthy data no matter where it is in your organization. >> Yeah. That's a great point. I mean, if you have the data about the data, then you can treat these as independent nodes, right? And maybe there are some advantages to putting it all in the Snowflake cloud, but to your point, organizationally, that's just not feasible. Unfortunately, sorry, Snowflake, all the world's data is not going to go into Snowflake, but they play a key role in accelerating, from what I'm hearing, your vision of data mesh. >> Yeah, absolutely. I think, going forward, we have to stop thinking about data platforms as just one place where you dump all the data. That's where the mesh concept comes in. It is going to be a mesh. It's going to be distributed, and organizations have to be okay with that. And they have to embrace the tools. I mean, you know, Facebook developed a tool called Presto many years ago that helps them solve exactly the same problem. So I think the technology is there. I think the organizational mindset needs to evolve. >> Yeah. Definitely. >> Culture. Culture is one of the hardest things to change. >> Exactly. >> Guys, this was a masterclass in data mesh, I think. Thank you so much for coming on and talking. >> We appreciate it. Thank you so much. >> Of course. What Alation is doing with Snowflake and with Warner Brothers Discovery, keep that content coming. I've got a lot of stuff I've got to catch up on watching. >> Sounds good. Thank you for having us. >> Thanks guys. >> Thanks, you guys. >> For Dave Vellante, I'm Lisa Martin. You're watching theCUBE live from Snowflake Summit '22. We'll be back after a short break. (upbeat music)
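The centralized-metadata idea Mitesh closes with (domain teams own and maintain their own catalog entries, while a central catalog federates search across all of them) can be sketched as a toy model. This is an illustration of the concept, not Alation's API; the class names, brands, and dataset entries are invented.

```python
class DomainCatalog:
    """Each brand maintains its own metadata entries: the people
    responsible for the data keep the definitions up to date."""
    def __init__(self, domain):
        self.domain = domain
        self.entries = {}

    def publish(self, name, definition):
        self.entries[name] = {"domain": self.domain, "definition": definition}

class MasterCatalog:
    """The master catalog federates search across domain catalogs;
    it never takes ownership of the metadata itself."""
    def __init__(self, domains):
        self.domains = domains

    def search(self, term):
        term = term.lower()
        return [
            (name, meta)
            for cat in self.domains
            for name, meta in cat.entries.items()
            if term in name.lower() or term in meta["definition"].lower()
        ]

hbo, cnn = DomainCatalog("HBO Max"), DomainCatalog("CNN")
hbo.publish("streaming_views", "Per-title viewing events for HBO Max")
cnn.publish("election_traffic", "Page views during election coverage")
master = MasterCatalog([hbo, cnn])
print([name for name, _ in master.search("election")])
```

A consumer searches once, centrally, and discovers data no matter which brand ingested it; the metadata stays where the context lives, with the domain team that published it.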
Mitesh Shah, Alation & Ash Naseer, Warner Bros Discovery | Snowflake Summit 2022
(upbeat music) >> Welcome back to theCUBE's continuing coverage of Snowflake Summit '22 live from Caesar's Forum in Las Vegas. I'm Lisa Martin, my cohost Dave Vellante, we've been here the last day and a half unpacking a lot of news, a lot of announcements, talking with customers and partners, and we have another great session coming for you next. We've got a customer and a partner talking tech and data mash. Please welcome Mitesh Shah, VP in market strategy at Elation. >> Great to be here. >> and Ash Naseer great, to have you, senior director of data engineering at Warner Brothers Discovery. Welcome guys. >> Thank you for having me. >> It's great to be back in person and to be able to really get to see and feel and touch this technology, isn't it? >> Yeah, it is. I mean two years or so. Yeah. Great to feel the energy in the conference center. >> Yeah. >> Snowflake was virtual, I think for two years and now it's great to kind of see the excitement firsthand. So it's wonderful. >> Th excitement, but also the boom and the number of customers and partners and people attending. They were saying the first, or the summit in 2019 had about 1900 attendees. And this is around 10,000. So a huge jump in a short time period. Talk a little bit about the Elation-Snowflake partnership and probably some of the acceleration that you guys have been experiencing as a Snowflake partner. >> Yeah. As a snowflake partner. I mean, Snowflake is an investor of us in Elation early last year, and we've been a partner for, for longer than that. And good news. We have been awarded Snowflake partner of the year for data governance, just earlier this week. And that's in fact, our second year in a row for winning that award. So, great news on that front as well. >> Repeat, congratulations. >> Repeat. Absolutely. And we're going to hope to make it a three-peat as well. 
And we've also been awarded industry competency badges in five different industries, those being financial services, healthcare, retail technology, and Median Telcom. >> Excellent. Okay. Going to right get into it. Data mesh. You guys actually have a data mesh and you've presented at the conference. So, take us back to the beginning. Why did you decide that you needed to implement something like data mesh? What was the impetus? >> Yeah. So when people think of Warner brothers, you always think of like the movie studio, but we're more than that, right? I mean, you think of HBO, you think of TNT, you think of CNN, we have 30 plus brands in our portfolio and each have their own needs. So the idea of a data mesh really helps us because what we can do is we can federate access across the company so that, you know, CNN can work at their own pace. You know, when there's election season, they can ingest their own data and they don't have to, you know, bump up against as an example, HBO, if Game of Thrones is going on. >> So, okay. So the, the impetus was to serve those lines of business better. Actually, given that you've got these different brands, it was probably easier than most companies. Cause if you're, let's say you're a big financial services company, and now you have to decide who owns what. CNN owns its own data products, HBO. Now, do they decide within those different brands, how to distribute even further? Or is it really, how deep have you gone in that decentralization? >> That's a great question. It's a very close partnership, because there are a number of data sets, which are used by all the brands, right? You think about people browsing websites, right? You know, CNN has a website, Warner brothers has a website. So for us to ingest that data for each of the brands to ingest that data separately, that means five different ways of doing things and you know, a big environment, right? So that is where our team comes into play. 
We ingest a lot of the common data sets, but like I said, any unique data sets, data sets regarding theatrical as an example, you know, Warner brothers does it themselves, you know, for streaming, HBO Max, does it themselves. So we kind of operate in partnership. >> So do you have a centralized data team and also decentralized data teams, right? >> That's right. >> So I love this conversation because that was heresy 10 years ago, five years ago, even, cause that's inefficient. But you've, I presume you've found that it's actually more productive in terms of the business output, explain that dynamic. >> You know, you bring up such a good point. So I, you know, I consider myself as one of the dinosaurs who started like 20 plus years ago in this industry. And back then, we were all taught to think of the data warehouse as like a monolithic thing. And the reason for that is the technology wasn't there. The technology didn't catch up. Now, 20 years later, the technology is way ahead, right? But like, our mindset's still the same because we think of data warehouses and data platforms still as a monolithic thing. But if you really sort of remove that sort of mental barrier, if you will, and if you start thinking about, well, how do I sort of, you know, federate everything and make sure that you let folks who are building, or are closest to the customer or are building their products, let them own that data and have a partnership. The results have been amazing. And if we were only sort of doing it as a centralized team, we would not be able to do a 10th of what we do today. So it's that massive scale in, in our company as well. >> And I should have clarified, when we talk about data mesh are we talking about the implementing in practice, the octagon sort of framework, or is this sort of your own sort of terminology? >> Well, so the interesting part is four years ago, we didn't have- >> It didn't exist. >> Yeah. It didn't exist. 
And, and so we, our principle was very simple, right? When we started out, we said, we want to make sure that our brands are able to operate independently with some oversight and guidance from our technology teams, right? That's what we set out to do. We did that with Snowflake by design, because Snowflake allows us to, you know, separate those, those brands into different accounts. So that was done by design. And then the, the magic, I think, is the Snowflake data sharing, which allows us to sort of bring data in just once, and then share it with whoever needs it. So think about HBO Max. On HBO Max, you not only have HBO Max content, but content from CNN, from Cartoon Network, from Warner Brothers, right? All the movies, right? So to see how The Batman movie did in theaters and then on streaming, you don't need, you know, Warner Brothers doesn't need to ingest the same streaming data. HBO Max does it. HBO Max shares it with Warner Brothers. You know, store once, share many times, and everyone works at their own pace. >> So they're building data products. Those data products are discoverable APIs, I presume, or I guess maybe just, I guess the Snowflake cloud, but very importantly, they're governed. And that's, correct me if I'm wrong, where Alation comes in? >> That's precisely where Alation comes in, as sort of this central, flexible foundation for data governance. You know, you mentioned data mesh. I think what's interesting is that it's really an answer to the bottlenecks created by centralized IT, right? There's this notion of decentralizing the data engineers and making the data domain owners, the people that know the data best, be in control of publishing the data to the data consumers. There are other popular concepts actually happening right now, as we speak, around the modern data stack, around data fabric, that are also in many ways underpinned by this notion of decentralization, right?
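The store-once, share-many pattern Ash describes can be sketched in miniature. This is a toy model in plain Python, not Snowflake's actual sharing API; the account and dataset names are invented for illustration:

```python
# Toy model of "store once, share many times": one brand ingests a dataset,
# other accounts are granted read access, and no copy is ever made.
# This is NOT Snowflake's API; names here are illustrative only.

class SharedDataset:
    """A dataset stored once by its owning account."""

    def __init__(self, owner: str, name: str, rows: list):
        self.owner = owner
        self.name = name
        self._rows = rows          # the single physical copy
        self._grants = {owner}     # accounts allowed to read it

    def grant(self, account: str) -> None:
        """Share the dataset with another account; no data is copied."""
        self._grants.add(account)

    def read(self, account: str) -> list:
        if account not in self._grants:
            raise PermissionError(f"{account} has no grant on {self.name}")
        return self._rows          # every reader sees the same copy

# HBO Max ingests streaming viewership once...
streaming = SharedDataset("hbo_max", "streaming_views",
                          [{"title": "The Batman", "views": 3}])
# ...and shares it with Warner Bros. instead of having them re-ingest it.
streaming.grant("warner_bros")

assert streaming.read("warner_bros") is streaming.read("hbo_max")
```

In Snowflake itself the same idea is expressed with secure data sharing: the provider account publishes a share, consumer accounts are added to it, and every consumer queries the provider's single stored copy.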
These are concepts that are underpinned by decentralization, and as the pendulum swings, sort of between decentralization and centralization, as we go back and forth in the world of IT and data, there are certain constants that need to be centralized over time. And one of those, I believe, is very much a centralized platform for data governance. And that's certainly, I think, where we come in. Would love to hear more about how you use Alation. >> Yeah. So, I mean, Alation helps us sort of, as you guys say, map the treasure map of the data, right? So for consumers to find where their data is, that's where Alation helps us. It helps us with the data cataloging, you know, storing all the metadata, and, you know, users can go in, they can sort of find, you know, the data that they need, and they can also find how others are using data. So there's a little bit of a crowdsourcing aspect that Alation helps us with, whereby, you know, you can see, okay, my peer in the other group, well, that's how they use this piece of data. So I'm not going to spend hours trying to figure this out. You're going to use the query that they use. So yeah. >> So you have a master catalog, I presume. And then each of the brands has their own sub catalogs, is that correct? >> Well, for the most part, we have that master catalog, and then the brands sort of use it, you know, separately themselves. The key here is that the catalog isn't maintained by a centralized group either, right? It's, again, maintained by the individual teams, and not only the individual teams, but the folks that are responsible for the data, right? So I talked about the concept of crowdsourcing: whoever sort of puts the data in has to make sure that they update the catalog and make sure that the definitions are there and everything is sort of in line. >> So HBO, CNN, each have their own sort of access to their catalog, but they feed into the master catalog.
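The catalog workflow Ash describes, where the domain teams rather than a central group publish and maintain the metadata, and consumers discover both the data and the queries their peers actually run, might look roughly like this. A minimal sketch; the dataset and owner names are invented, and Alation's real product covers far more (lineage, trust flags, policy integration):

```python
# Minimal sketch of a crowdsourced metadata catalog: domain teams publish
# entries for the data they own; consumers search the catalog and can see
# (and contribute) the queries others use against each dataset.

class Catalog:
    def __init__(self):
        self._entries = {}   # dataset name -> metadata dict

    def publish(self, dataset, owner, description, sample_query=None):
        """Called by the owning domain team when it lands new data."""
        self._entries[dataset] = {
            "owner": owner,
            "description": description,
            "queries": [sample_query] if sample_query else [],
        }

    def add_query(self, dataset, query):
        """Crowdsourcing: consumers share how they actually use the data."""
        self._entries[dataset]["queries"].append(query)

    def search(self, term):
        """Find datasets whose name or description mentions the term."""
        term = term.lower()
        return [name for name, meta in self._entries.items()
                if term in name.lower() or term in meta["description"].lower()]

catalog = Catalog()
catalog.publish("web_browsing", owner="central-team",
                description="Common clickstream across all brand websites")
catalog.publish("hbomax_streaming", owner="hbo-max",
                description="Streaming viewership for HBO Max titles",
                sample_query="SELECT title, COUNT(*) FROM views GROUP BY title")

assert catalog.search("streaming") == ["hbomax_streaming"]
```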
Is that the right way to think about it? >> Yeah. >> Okay. And they have their own virtual data warehouses, right? They have ownership over that? They can spin 'em up, spin 'em down as they see fit? Right? And they're governed. >> They're governed. And what's interesting is it's not just governed, right? Governance is a, is a big word. It's a bit nebulous, but what's really being enabled here is this notion of self-service as well, right? There's two big sort of rockets that need to happen at the same time in any given organization. There's this notion that you want to put trustworthy data in the hands of data consumers, while at the same time mitigating risk. And that's precisely what Alation does. >> So I want to clarify this for the audience. So there's four principles of data mesh. This came after you guys did it, and I wonder how it aligns. Domain ownership: give data, as you were saying, to the, to the domain owners who have context. Data as a product: you guys are building data products. And that creates two problems: how do you give people self-service infrastructure, and how do you automate governance? So the first two, great. But then it creates these other problems. Does that align with your philosophy? Where's alignment? What's different? >> Yeah. Data products is exactly where we're going. And that sort of, that domain based design, that's really key as well. In our business, you think about who the customer is, as an example, right? Depending on who you ask, the answer might be different. You know, to the movie business, it's probably going to be the person who watches a movie in a theater. To the streaming business, to HBO Max, it's the streamer, right? To others, someone watching live CNN on their TV, right? There's yet another group. Think about all the franchising we do. So you see Batman action figures and T-shirts, and Warner Brothers branded stuff in stores; that's yet another business unit.
But at the end of the day, it's not a different person, it's you and me, right? We do all these things. So the domain concept makes sure that you ingest data and you bring data relevant to the context, however, not sort of making it so stringent where it cannot integrate, and then you integrate it at a higher level to create that 360. >> And it's discoverable. So the point is, I don't have to go tap Ash on the shoulder, say, how do I get this data? Is it governed? Do I have access to it? Give me the rules of it. Just, I go grab it, right? And the system computationally automates whether or not I have access to it. And it's, as you say, self-service. >> In this case, exactly right. It enables people to just search for data and know, when they find the data, whether it's trustworthy or not, through trust flags and the like. It's doing both of those things at the same time. >> How is it an enabler of solving some of the big challenges that the media and entertainment industry is going through? We've seen so much change the last couple of years. The rising consumer expectations aren't going to go back down. They're only going to come up. We want you to serve us up content that's relevant, that's personalized, that makes sense. I'd love to understand from your perspective, Mitesh, from an industry challenges perspective, how does this technology help customers like Warner Brothers Discovery meet customers where they are and reduce the volume on those challenges? >> It's a great question. And as I mentioned earlier, we had five industry competency badges that were awarded to us by Snowflake. And one of those is for media and telecom. And the reason for that is we're helping media companies understand their audiences better, and ultimately serve up better experiences for their audiences. But we've got Ash right here that can tell us how that's happening in practice. >> Yeah, tell us. >> So I'll share a story. I always like to tell stories, right?
Once upon a time, before we had Alation in place, it was like, who you knew was how you got access to the data. So if I knew you and I knew you had access to a certain kind of data, your access to the right kind of data was based on the network you had at the company- >> I had to trust you. >> Yeah. >> I might not want to give up my data. >> That's it. And so that's where Alation sort of helps us democratize it, but, you know, puts the governance and controls in, right? There are certain sensitive things as well, such as viewership, such as subscriber counts, which are very important. So making sure that the right people have access to it, that's the other problem that Alation helps us solve. >> That's precisely part of our integration with Snowflake in particular, being able to define and manage policies within Alation. Saying, you know, certain people should have access to certain rows, doing column level masking. And having those policies actually enforced at the Snowflake data layer is precisely part of our value prop. >> And that's automated. >> And all that's automated. Exactly. >> Right. So I don't have to think about it. I don't have to go through the tap on the shoulder. What has been the impact, Ash, on data quality as you've pushed it down into the domains? >> That's a great question. So it has definitely improved, but data quality is a very interesting subject, because, back to my example of, you know, when we started doing things, the centralized IT team always said, well, it has to be like this, right? And if it doesn't fit in this, then it's bad quality. Well, sometimes context changes. Businesses change, right? You have to be able to react to it quickly. So making sure that a lot of that quality is managed at the decentralized level, at the place where you have that business context, that ensures you have the most up to date quality. We're talking about the media industry changing so quickly.
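The row- and column-level controls Mitesh describes can be sketched as follows. In the real integration the policies are defined centrally and enforced in the Snowflake data layer (row access policies and dynamic data masking); here the enforcement is plain Python so the idea is runnable, and the role and column names are invented:

```python
# Sketch of policy enforcement: filter rows by brand entitlement (row-level
# access) and mask sensitive columns unless the user holds the right role
# (column-level masking). Roles, brands, and columns are illustrative.

SENSITIVE_COLUMNS = {"subscriber_count"}   # masked unless the role allows it

def apply_policies(rows, user_roles, allowed_brands):
    """Return only rows for brands the user may see, masking sensitive columns."""
    visible = []
    for row in rows:
        if row["brand"] not in allowed_brands:
            continue                       # row-level filtering
        out = {}
        for col, val in row.items():
            if col in SENSITIVE_COLUMNS and "analyst_full" not in user_roles:
                out[col] = "***"           # column-level masking
            else:
                out[col] = val
        visible.append(out)
    return visible

rows = [
    {"brand": "HBO Max", "title": "The Batman", "subscriber_count": 1000},
    {"brand": "CNN", "title": "Election Night", "subscriber_count": 2000},
]

# A CNN analyst without the full-access role sees only CNN rows, masked:
print(apply_policies(rows, user_roles={"viewer"}, allowed_brands={"CNN"}))
```

The point of pushing this to the data layer, as in the Snowflake integration described above, is that every consumer and tool gets the same enforcement automatically, rather than each application re-implementing it.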
I mean, would we have thought three years ago that people would watch a lot of these major movies on streaming services? But here's the reality, right? You have to react, and, you know, having it at that level just helps you react faster. >> So, data, if I play that back: data quality is not a static framework. It's flexible based on the business context, and the business owners can make those adjustments, 'cause they own the data. >> That's it. That's exactly it. >> That's awesome. Wow. That's amazing progress that you guys have made. >> On quality, if I could just add, it also just changes depending on where you are in your data pipeline stage, right? Data quality, data observability, this is a very fast evolving space at the moment, and if I look to my left right now, I bet you I can probably see a half-dozen quality and observability vendors right now. And so given that, and given the fact that Alation still is sort of a central hub to find trustworthy data, we've actually announced an open data quality initiative, allowing for best-of-breed data quality vendors to integrate with the platform. So whoever they are, whatever tool folks want to use, they can use that particular tool of choice. >> And this all runs in the cloud, or is it a hybrid sort of thing? >> Everything is in the cloud. We're all in the cloud. And, you know, again, helps us go faster. >> Let me ask you a question. I could go on forever in this topic. One of the concepts that was put forth is, whether it's a Snowflake data warehouse or a Databricks data lake or an Oracle data warehouse, they should all be included. They should just be a node on the mesh. Like, wow, that sounds good. But I haven't seen it yet. Right? I'm guessing that Snowflake and Alation enable all the self-serve, all this automated governance, and that including those other items, it's got to be a one-off at this point in time.
Do you ever see yourselves expanding that scope, or is it better off to just kind of leave it in the, the Snowflake data cloud? >> It's a good question. You know, I feel like where we're at today, especially in terms of sort of technology giving us so many options, I don't think there's a one size fits all, right? Even though we are very heavily invested in Snowflake and we use Snowflake consistently across the organization, you could theoretically have an architecture that blends the two, right? Have different types of data platforms, like a Teradata or an Oracle, and sort of bring it all together. Today, we have the technology, you know, and all sorts of things that can make sure that you query across different databases. So I don't think the technology is the problem; I think it's the organizational mindset. I think that that's what gets in the way. >> Oh, interesting. So I was going to ask you, will hybrid tables help you solve that problem? And maybe not; what you're saying is, it's the organization that owns the Oracle database saying, Hey, we have our system. It processes, it works, you know, go away. >> Yeah. Well, you know, hybrid tables, I think, is a great sort of next step in Snowflake's evolution. In my opinion, I think it's a game changer, but yeah. I mean, they can still exist. You could do hybrid tables right on Snowflake, or you could, you know, you could kind of coexist as well. >> Yeah. But, do you have a thought on this? >> Yeah, I do. I mean, we're always going to live in a time where you've got data distributed throughout the organization and around the globe. And that could be even if you're all in on Snowflake: you could have data in Snowflake here, you could have data in Snowflake in EMEA, in Europe, somewhere. It could be anywhere. By the same token, every organization is using on-premises systems. They have data, they naturally have data everywhere.
And so, you know, the one solution to this is really centralizing, as I mentioned, not just governance, but also metadata about all of the data in your organization, so that you can enable people to search and find and discover trustworthy data no matter where it is in your organization. >> Yeah. That's a great point. I mean, if you have the data about the data, then you can treat these as independent nodes. That's just that, right? And maybe there's some advantages of putting it all in the Snowflake cloud, but to your point, organizationally, that's just not feasible. The whole, unfortunately, sorry, Snowflake, all the world's data is not going to go into Snowflake, but they play a key role in accelerating, what I'm hearing, your vision of data mesh. >> Yeah, absolutely. I think, going forward in the future, we have to stop thinking about data platforms as just one place where you sort of dump all the data. That's where the mesh concept comes in. It is going to be a mesh. It's going to be distributed, and organizations have to be okay with that. And they have to embrace the tools. I mean, you know, Facebook developed a tool called Presto many years ago that helps them solve exactly the same problem. So I think the technology is there. I think the organizational mindset needs to evolve. >> Yeah. Definitely. >> Culture. Culture is one of the hardest things to change. >> Exactly. >> Guys, this was a masterclass in data mesh, I think. Thank you so much for coming on and talking. >> We appreciate it. Thank you so much. >> Of course. What Alation is doing with Snowflake and with Warner Brothers Discovery, keep that content coming. I got a lot of stuff I got to catch up on watching. >> Sounds good. Thank you for having us. >> Thanks guys. >> Thanks, you guys. >> For Dave Vellante, I'm Lisa Martin. You're watching theCUBE live from Snowflake Summit '22. We'll be back after a short break. (upbeat music)
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Dave Vellante | PERSON | 0.99+ |
Lisa Martin | PERSON | 0.99+ |
CNN | ORGANIZATION | 0.99+ |
HBO | ORGANIZATION | 0.99+ |
Mitesh Shah | PERSON | 0.99+ |
Ash Naseer | PERSON | 0.99+ |
Europe | LOCATION | 0.99+ |
ORGANIZATION | 0.99+ | |
Mitesh | PERSON | 0.99+ |
Alation | ORGANIZATION | 0.99+ |
TNT | ORGANIZATION | 0.99+ |
Warner brothers | ORGANIZATION | 0.99+ |
EMEA | LOCATION | 0.99+ |
second year | QUANTITY | 0.99+ |
Oracle | ORGANIZATION | 0.99+ |
2019 | DATE | 0.99+ |
two years | QUANTITY | 0.99+ |
one | QUANTITY | 0.99+ |
Cartoon Network | ORGANIZATION | 0.99+ |
Game of Thrones | TITLE | 0.99+ |
two problems | QUANTITY | 0.99+ |
two | QUANTITY | 0.99+ |
Warner Brothers | ORGANIZATION | 0.99+ |
10th | QUANTITY | 0.99+ |
first | QUANTITY | 0.99+ |
Snowflake | ORGANIZATION | 0.99+ |
Snowflake Summit '22 | EVENT | 0.99+ |
Warner brothers | ORGANIZATION | 0.99+ |
each | QUANTITY | 0.99+ |
four | QUANTITY | 0.99+ |
Las Vegas | LOCATION | 0.99+ |
Media and Telecom | ORGANIZATION | 0.99+ |
20 years later | DATE | 0.98+ |
both | QUANTITY | 0.98+ |
five different industries | QUANTITY | 0.98+ |
10 years ago | DATE | 0.98+ |
30 plus brands | QUANTITY | 0.98+ |
Alation | PERSON | 0.98+ |
four years ago | DATE | 0.98+ |
today | DATE | 0.98+ |
20 plus years ago | DATE | 0.97+ |
Warner Brothers Discovery | ORGANIZATION | 0.97+ |
One | QUANTITY | 0.97+ |
five years ago | DATE | 0.97+ |
Snowflake Summit 2022 | EVENT | 0.97+ |
three years ago | DATE | 0.97+ |
five different ways | QUANTITY | 0.96+ |
earlier this week | DATE | 0.96+ |
Snowflake | TITLE | 0.96+ |
Max | TITLE | 0.96+ |
early last year | DATE | 0.95+ |
about 1900 attendees | QUANTITY | 0.95+ |
Snowflake | EVENT | 0.94+ |
Ash | PERSON | 0.94+ |
three-peat | QUANTITY | 0.94+ |
around 10,000 | QUANTITY | 0.93+ |
Christian Wiklund, unitQ | CUBE Conversation
>>Welcome everyone to this CUBE Conversation featuring unitQ. I'm your host, Lisa Martin. And we are excited to be joined by Christian Wiklund, the founder and CEO of unitQ. Christian, thank you so much for joining me today. >>Thank you so much, Lisa. Pleasure to be here. >>Let's talk a little bit about unitQ. You guys were founded in 2018, so pretty recent. What is it that unitQ does? And what were some of the gaps in the market that led you to founding the company? >>Yep. So me and my co-founder Nick, we're actually doing our second company now, unitQ is number two, and our first company, called Skout, was years ago, and it was very different from unitQ. It's a social network for meeting people. And it was really during that experience where we saw the impact that quality of the experience, quality of the product, can have on your growth trajectory, and the challenges we faced. How do we test everything before we ship it? And in reality, a modern company will have, let's say, 20 languages supported. You support Android, iOS, web, big screen, small screen, you have 20 plus integrations, and you have lots of different devices out there that might run your binary a little differently. So who is the ultimate test group for all of these different permutations? And that's the end user.
So how do we capture those signals into something that the company and different teams can align on? So that's where, you know, unit Q the, the vision here is to build a quality company, to help other companies build higher quality products. >>So really empowering companies to take a data driven approach to product quality. I was looking on your website and noticed that Pandora is one of your customers, but talk to me a little bit about a customer example that you think really articulates the value of what Q unit he was delivering. >>Right? So maybe we should just go back one little step and talk about what is quality. And I think quality is something that is, is a bit subjective. It's something that we live and breathe every day. It's something that can be formed in an instant first impressions. Last it's something that can be built over time that, Hey, I'm using this product and it's just not working for me. Maybe it's missing features. Maybe there are performance related bots. Maybe there is there's even fulfillment related issues. Like we work with Uber and hello, fresh and, and other types of more hybrid type companies in addition to the Pandoras and, and Pinterest and, and Spotify, and these more digital, only products, but the, the end users I'm producing this data, the reporting, what is working and not working out there in many different channels. So they will leave app produce. >>They will write into support. They might engage with a chat support bot. They will post stuff on Reddit on Twitter. They will comment on Facebook ads. So like this data is dispersed everywhere. The end user is not gonna fill out a perfect bug report in a form somewhere that gets filed into gr like they're, they're producing this content everywhere in different languages. So the first value of what we do is to just ingest all of that data. So all the entire surface area of use of feedback, we ingest into a machine and then we clean the data. 
We normalize it, and then we translate everything into English. And it was actually a surprise to us when we started this company, that there are quite a few companies out there that they're only looking at feedback in English. So what about my Spanish speaking users? What about my French speaking users? >>And when, when, when that is done, like when all of that data is, is need to organized, we extract signals from that around what is impacting the user experience right now. So we break these, all of this data down into something called quality monitors. So quality monitor is basically a topic which can be again, passive reset, link noting, or really anything that that's impacting the end user. And the important part here is that we need to have specific actionable data. For instance, if I tell you, Hey, Lisa music stops playing is a growing trend that our users are reporting. You will tell me, well, what can I do with that? Like what specifically is breaking? So we deploy up to 1500 unique quality monitors per customer. So we can then alert different teams inside of the organization of like, Hey, something broke and you should take a look at it. >>So it's really breaking down data silos within the company. It aligns cross-functional teams to agree on what should be fixed next. Cause there's typically a lot of confusion, you know, marketing, they might say, Hey, we want this fixed engineering. They're like, well, I can't reproduce, or that's not a high priority for us. The support teams might also have stuff that they want to get fixed. And what we've seen is that these teams, they struggle to communicate. So how do we align them around the single source of truth? And I think that's for unit two is early identification of stuff. That's not working in production and it's also aligning the teams so they can quickly triage and say, yes, we gotta fix this right before it snowballs into something. 
We say, you know, we wanna, we wanna cap catch issues before you go into crisis PR mode, right? So we want to get this, we wanna address it early in the cycle. >>Talk to me about when you're in customer conversations, Christian, the MarTech landscape is competitive. There's nearly 10,000 different solutions out there, and it's growing really quickly quality monitors that you just described is that one of the key things that, that you talk to customers about, that's a differentiator for unit Q. >>Yeah. So I mean, it, it, it comes down to, as you're building your product, right, you, you have, you have a few different options. One is to build new features and we need to build new features and innovate and, and, and that's all great. We also need to make sure that the foundation of the product is working and that we keep improving quality and what, what we see with, with basically every customer that we work with, that, that when quality goes up, it's supercharges the growth machine. So quality goes up, you're gonna see less support tickets. You're gonna see less one star reviews, less one star reviews is of course good for making the store front convert better. You know, I, I want install a 4.5 star app, not a 3.9 star app. We also see that sentiment. So for those who are interested in getting that NPS score up for the next time we measure it, we see that quality is of course a very important piece of that. >>And maybe even more importantly, so sort of inside of the product machine, the different conversion steps, let's say sign up to activate it to coming back in second day, 30 day, 90 day, and so forth. We see a dramatic impact on how quality sort of moves that up and down the retention function, if you will. So it, it really, if you think about a modern company, like the product is sort of the center of the existence of the company, and if the product performs really well, then you can spend more money in marketing because it converts really good. 
You can hire more engineers, you can hire, you can hire more support people and so forth. So it's, it's really cool to see that when quality improves its supercharges, everything else I think for marketing it's how do you know if you're spending into a broken product or not? >>And I, and I, I feel like marketing has, they have their insights, but it's, it's not deep enough where they can go to engineering and say, Hey, these 10 issues are impacting my MPS score and they're impacting my conversion and I would love for you to fix it. And when you can bring tangible impact, when you can bring real data to, to engineering and product, they move on it cause they also wanna help build the company. And, and so I think that's, that's how we stand out from the more traditional MarTech, because we need to fix the core of, of sort of this growth engine, which is the quality of the product >>Quality of the product. And obviously that's directly related to the customer experience. And we know these days, one of the things I think that's been in short supply the last couple of years is patience. We know when customers are unhappy with the product or service, and you talked about it a minute ago, they're gonna go right to, to Reddit or other sources to complain about that. So being able to, for uniq, to help companies to improve the customer experience, isn't I think table stakes for businesses it's mission critical these days. Yeah, >>It is mission critical. So if you look at the, let's say that we were gonna start a, a music app. Okay. So how do we, how do we compete as a music app? Well, if you, if you were to analyze all different music apps out there, they have more or less the same features app. Like they, the feature differentiation is minimal. And, and if you launch a new cool feature than your competitor will probably copy that pretty quickly as well. So competing with features is really hard. What about content? 
Well, I'm gonna get the same content on Spotify as apple SD. So competing with content is also really hard. What about price? So it turns out you'll pay 9 99 a month for music, but there's no, there's no 1 99. It's gonna be 9 99. So quality of the experience is one of the like last vectors or areas where you can actually compete. >>And we see consistently that if you' beating your competition on quality, you will do better. Like the best companies out there also have the highest quality experience. So it's, it's been, you know, for us at our last company, measuring quality was something that was very hard. How do we talk about it? And when we started this company, I went out and talked to a bunch of CEOs and product leaders and board members. And I said, how do you talk about quality in a board meeting? And they were, they said, well, we don't, we don't have any metrics. So actually the first thing we did was to define a metrics. We have, we have this thing called this unit Q score, which is on our website as well, where we can base it's like the credit score. So you can see your score between zero and a hundred. >>And if your score is 100, it means that we're finding no quality issues in the public domain. If your score is 90, it means that 10% of the data we look at refers to a quality issue. And the definition of a quality issue is quite simple. It is when the user experience doesn't match the user expectation. There is a gap in between, and we've actually indexed the 5,000 largest apps out there. So we're then looking at all the public review. So on our website, you can go in and, and look up the unit Q score for the 5,000 largest products. And we republish these every night. So it's an operational metric that changes all the time. >>Hugely impactful. Christian, thank you so much for joining me today, talking to the audience about unit Q, how you're turning qualitative feedback into pretty significant product improvements for your customers. 
We appreciate your insights. >>Thank you, Lisa, have a great day. >>You as well, per Christian Lin, I'm Lisa Martin. You're watching a cube conversation.
Dec 16th Keynote Analysis with Jeremy Burton | AWS re:Invent 2020
>>From around the globe, it's theCUBE, with digital coverage of AWS re:Invent 2020, sponsored by Intel, AWS, and our community partners. >>Hi everyone, welcome back to theCUBE's live coverage of AWS re:Invent 2020. I'm John Furrier, your host. We've got theCUBE Virtual; we're not there in person, we're remote this year, and we're excited to deliver three weeks of wall-to-wall coverage of this virtual event. We're in week three, day two. And if you're watching this live on the platform: tomorrow, Thursday at two o'clock, Andy Jassy will be live here on theCUBE, one-on-one with me, to address all the hard questions. But today we're doing day two of week three analysis with Jeremy Burton: industry legend and entrepreneur, now the CEO of Observe Inc., formerly the CMO of Dell Technologies and, before that, EMC. He's done a variety of ventures and seen many waves of innovation. A friend of theCUBE. Jeremy, thank you for coming on. >>My pleasure. Always great to be on theCUBE. >>Great to have you on, particularly because yesterday Werner Vogels talked a lot about observability, and I notice you've got your Observe shirt on. Observe Inc., your company, is one of the many hot startups around observability, and you're making a business out of basically what he talked about yesterday. And in today's keynote you had the extended cloud and edge applications; you had Bill Vass, who leads up both edge and quantum; you had Rudy Valdez, who talked a lot about the evolution of cloud architecture; and of course you finally had David Richardson, the VP of serverless. So you've got edge, quantum, serverless, and architecture. It speaks to the sea change, Jeremy, and you have a good read on these big waves.
When you look at serverless, then quantum, then edge, which is data, and you look at all of this coming together in their architecture, Werner's keynote yesterday kind of makes sense. It's a systems architecture, and this new observability trend isn't a point product; it's a broader concept, a complete rethinking of distributed computing in the cloud. That's what this Amazon feels like. What's your take? >>Yeah, it's a good observation. The punchline is that people are building applications differently. The technologies people use to build apps are different, the way they build applications is different, and the way folks release code into production is different. It stands to reason, therefore, that you're going to need a different approach when you want to troubleshoot these applications, when you want to find out what issues customers are having. What we felt a couple, three years ago when we started Observe was that a new approach was required: what you need to monitor your application in 2020 is not the same as what you needed in 2015 or 2010. >>And we felt very strongly that this new wave was going to be called observability. It brings a tear to my eye to hear Werner talk about it, because as much as we at Observe believe we can do big things in future, it's the big vendors today that can move markets. So for Amazon, and Werner in particular, to talk about observability lends more credence to the topic. We think organizations should have observability teams. We think there should be a head of observability. And Amazon endorsing this, I think, means there's a much stronger chance that will happen, and that it will shine a light on a topic almost everybody needs to pay attention to as they build their next generation of applications.

>>I know you've launched, you have paying customers now and you're growing rapidly, and you're well-funded; the investors behind Snowflake also invested in you, so they see this cloud trend. Snowflake went public, and I know you're on the board of Snowflake as well, so you know a bit about what's going on with Amazon and the opportunity. When you look at observability, you're building a business around it. And again, a head of observability is not a small thing; you're putting someone in charge of it. So why do you say that? Some would say, hey, it's a feature, not a company. Those are two different mindsets. How do you address that? >>The thing I'd say is, look, the number one job in America is software engineer: writing code. The number two job is fixing it. Think about that for a second. The job of fixing our applications is almost as big as the job of creating them. Something has to change, right? The job of fixing cars is not as big as the auto industry. Why? Because over time that industry matured and built better tools to diagnose cars, so they became easier to fix. We've not made that leap with our applications. The tools the engineering team uses to debug and troubleshoot an application are often still very different from what the DevOps team is using, which is very different again from what the SRE team is using. >>And so it's a huge problem in our industry, really not being able to diagnose and troubleshoot issues when they arise. It costs the industry a fortune: indirectly, in wasted development-team productivity, but also in customer experience. You and I both know that if we're having a bad experience with a new service we're trying out online, we'll probably go somewhere else. So there's never been a more important time for people to invest in observing the entire environment, the entire customer experience. Not only will you have happier customers; you might actually reduce costs and improve the productivity of your engineering team as well. So I feel the opportunity there is vast. I also think, longer term, it doesn't just apply to troubleshooting distributed applications. >>Security is very related to the way we build software. In the news in recent days, we've become attuned to software defects, or malware in software, causing breaches at government agencies. Hey, that could be anybody's software. So security has a role to play in observability. And the customer experience doesn't stop when someone has a bad experience on the website. What if they complain? What if a help-desk ticket gets filed? How do you track that?

>>I have a lot of questions for Jassy tomorrow, and I want to get your thoughts on one of them, because you brought it up and I think it's a key point: building applications versus supporting and fixing them. It reminds me of the old adage in IT that 70% of the budget goes to running the operation. If you look at what's happening, and if you talk to customers (and this is what I'm going to ask Jassy tomorrow), Werner actually talked about day-two operations in his keynote. This is Amazon: they're targeting builders. And I've talked to a few other entrepreneurs growing companies, and some CIOs and CEOs at mainstream enterprises. They don't want to be building things; that's not their DNA, that's not what they do. I love the builder mentality, and so does Amazon, but we might be at a time when there aren't enough builders out there, Jeremy. You've got skill shortages, and ultimately, are enterprises really builders? They'll build something, but then they just run it. So at what point do they stop building? Or they build their own thing in the cloud, and then they've got to run it. I think Amazon is going to shift quickly from build, build, build to day-two operations: run, run, run. >>That's a great topic of conversation. I think what you're poking at is the maturation of this digital age and the state we're at. If you go back ten or twenty years, look at the mid-nineties: a lot of people were building custom applications; it was all about custom apps, and I think we're back at that now. In order to get competitive advantage, customers are building their own applications. When you talk about digital transformation, what does that mean? It often means a traditional company building a new digital experience for services they've offered in a physical way in the past. So make no mistake, people are builders; they are writing code, they are becoming digital. I think what you'll find at some point, as the industry matures, is that some of these digital experiences become packaged, so you can buy them off the shelf, and there's less building required. But as we sit today, there's probably more code being written in anger, by more organizations, than at any point in the last 30 years. And I think this is another reason why observability is so important: as you build that code and develop that customer experience, you want to understand where the issues are as you go. You don't want to wait for a big customer disaster on the day you roll something into production before you start investigating.

>>I do agree with you, by the way. There is a builder mentality, but remember those days back in IT: put the time-machine hat on, and that was the mainframe-to-client-server transition, and it was called spaghetti code. The monoliths were built, then they had to be supported, and they became legacy. I see that happening today: people are moving to the cloud, they are building, but at some point you've got to run the thing you built. And, connecting some dots in real time here: I've got serverless, which is totally cool; quantum has headroom for compute; I'm going to have kind of a service-oriented architecture with web services, with observability; I'm going to have all these modern apps. Great. Now I've got to run them, and maybe shift them across multiple clouds. Maybe the private cloud wave is coming back; you're seeing telco clouds, and you're starting to see these new tiers of clouds where people build their own cloud environments. Steve Mullaney of Aviatrix kind of made this point yesterday in his analysis: he thinks private cloud will be back. Or it'll just be public cloud, and people will build their own clouds and run them. >>What I feel happens over time is that the line above which you add value rises. Cloud is just infrastructure. We can debate private cloud versus public cloud, or whether it's a private cloud served up by a public cloud provider, but my view is that all of it is just going to be commodity, served up at an ever-decreasing cost. So it's incumbent on organizations to innovate above that line. Twenty years ago we built our own data centers; now that seems like a crazy idea, and you can get almost all of your infrastructure from the cloud. The great thing is, look at Observe: we have no people running data center operations. None. We have no people building a database. None; we use Snowflake, in the cloud, running on AWS. We have one DevOps engineer. So all the people in the company right now are focused on adding value, helping people understand and analyze data, above that line, and we just pay for a service level. As time goes by there will be more and more services, and that line will keep rising. So what I care about, and what I think a lot of CEOs care about, is whether most of my resources are innovating above that value-creation line, because that's what people will pay for in our business, and that's what represents value-add for organizations big and small.

>>That's a good point. I want to shift to the next topic, and then we'll get into some observability questions and an update on your company. Complexity has been a big theme coming out of all the conversations with analysts on theCUBE, and you hear it with Amazon: a lot of undifferentiated heavy lifting being abstracted away, to your point about value layers and competing on value. Amazon continues to do that; all great stuff. But some are saying, and we've said it on theCUBE too, that you've put the complexity behind the curtain, and it's still complexity. Complexity with the edge is a highlight: they've got IoT Greengrass, Greengrass Core, IoT Core, a lot of cool things happening, but it's still not super easy. Complexity tends to slow things down and create friction, and taming complexity seems to be a post-pandemic mandate for cloud journeys. What's your take? >>I totally agree. Organizations that have been in existence for 30 or 40 years carry an amount of technical debt and complexity that builds up over time. But even at newer companies, the way people build modern distributed applications is in some respects more complex than in days gone by: microservices, some of which you own, some of which you don't. You've got to be able to see the big picture: when is the problem in my code, and when am I making a call out to a third-party microservice that's bailing out on me? And as people have changed the architecture of their applications, there hasn't been an equivalent innovation or evolution in the tools they use to manage that environment. So you've got this dichotomy: a better way for software developers to write code and deploy it into production, microservices, but at the same time no good information and no good tools to make sense of that complexity.

>>That's great stuff. Jeremy Burton is here, CEO of Observe Inc., a CUBE alumni, VIP CUBE alumni by the way; he's been on theCUBE every year since theCUBE has been around, back to 2010, when he took the new job as CMO of EMC prior to its acquisition by Dell. Jeremy, you're a legend in the industry, certainly as an executive and a marketer, and as an entrepreneur. So I've got to ask you: Observe Inc. is your company, you're right in the middle of all this, and you've got a big bet going on. Could you share, in your own words, what the big bet is, how you see the preferred future unfolding, and where you're going to capture value? >>The big bet really is to take a new approach to enabling people to observe their systems. The term observability actually goes back to control systems theory in the sixties, and it has quite a simple definition: can you diagnose a system, can you determine its internal state, from the telemetry data it emits? Look at the external outputs, and from those determine the internal state of the application. So from the get-go we felt that observability was not about building another tool. It's not about another monitoring tool or logging tool; it's about analyzing data. I was struck many years ago, spending a bit of time with Andy McAfee of CSAIL at MIT, when he made a statement I thought was quite profound: everything is a matter of data, and if you have enough data, you can solve any problem. That stuck with me for a long time. What we really do at Observe is ingest vast quantities of telemetry data. We treat everything as events, and we try to make sense of it. And the economics of the infrastructure now are such that you truly can ingest all the telemetry data, and it's affordable. One of the wonderful things Amazon has done is bring very cheap, affordable storage: you can ingest all your data and keep it forever. And compute is pretty cheap these days too, with amazing processing engines like Snowflake. >>So our sense was that if we could allow folks to ingest all of this telemetry data, process that data, and help people easily analyze it, they could find almost any problem that exists in their applications or in their infrastructure. So we really set out to create a data company, which I think is fundamentally different from what everybody else is doing. Today we're troubleshooting distributed applications, but in future, my hope is that we can help people analyze almost anything around their applications or infrastructure.

>>And what's the use case you're entering the market on? Is it making sure microservices can be deployed on Kubernetes? Is it managing containers? Is there a specific customer adoption use case you're focused on right now? >>Our ideal customer, if you like, is among the three to five thousand SaaS companies. We're really focused on the US right now: SaaS companies predominantly running on AWS, often on Kubernetes infrastructure, who are having a hard time understanding the complexity of the application they've created, and a hard time understanding the experience their customers are having and tracking it back to root cause. So really, helping those SaaS companies troubleshoot their applications and deliver a better customer experience: that's where the early customers are. And if we can do a good job in that area, I think we can, over time, start to take on some of the bigger companies, and maybe some of the more established companies that are moving in this digital direction.

>>Jeremy, thanks for sharing that. I've got one last set of questions for you around the industry, but before I get there, give a quick plug for Observe. What are you looking to do? Are you hiring? Give a quick PSA on what's going on at Observe. >>Sure. The company is now, rough and tough, about three years old. We've got about 40 people. We're well-funded by Sutter Hill Ventures; they were the original investors in Snowflake. We've more than doubled in size since the COVID lockdown began: we had about 15 people then, and we've got almost 40 now. And I'd anticipate that in the next year we're probably going to double in size again. But the core focus of the company is understanding and analyzing vast quantities of data, so anybody who is interested in that space, look us up. >>Any areas in particular? Obviously engineering; what else? >>Nearly all over. As you'll see if you go to our site, we've got a pretty slick front end. We invested very early in design and UX, because we believe the UI can be a differentiator, and we've got some amazing engineers on the front end, so we can always use help there. But obviously there's a data processing platform here as well. We run on top of Snowflake, and we have a number of folks who are very familiar with the Snowflake database and how to write efficient SQL. So: front end and back end. Very soon, I think, we'll be starting to expand the sales team. We're really getting our initial set of customers, and the feedback loop is rolling into engineering. My hope would be that in the early part of next year we really start to nail product-market fit. And we've got a huge release coming in the early part of next year, where metrics and alerting functionality will be in the product. So it's all systems go right now.

>>Congratulations. Love to see the entrepreneurial journey. We'll keep an eye out for you, and you're in a hot space, so you'll be riding that wave. A question for you on the industry. You're in the heart of Silicon Valley, like I am: I'm in Palo Alto, you're up in the Hillsborough area. San Francisco and the Valley are pretty hard-hit right now; people are sheltering in place, but there's still a lot of activity. What are you hearing in VC circles and startup circles as everyone looks at coming out of the pandemic? You look at Amazon, and you look at what Snowflake has done. Snowflake was built on top of Amazon, competing against Redshift, and was hugely successful at it. So there's kind of a new playbook emerging. What's the scuttlebutt? >>Clearly tech has done very well throughout what has been just a terrible environment, both socially and economically. And I think what's going on in the stock market right now is probably not reflective of the economic situation; a lot of the indices are dominated by tech companies, so if you're not careful you can get a bit of a false read. But what is undisputed is that the world is going to become more digital, more tech-centric, not less. So I think there is a very, very bright future for tech. There is certainly plenty of VC money available; that has not really changed materially in the last year. So if you have a good idea, and you're on one of these major trends, I think there's a very good chance you can get the company funded. >>And our expectation is that next year the industries that have been dormant for the last six, nine months will return to work, and some parts of the economy should pick up again. But I'd also tell you that certain habits are not going to die. More things are going to be done online, and we've gotten used to that way of working. I don't know about cocktails over Zoom, but working with customers is in some respects easier, because they're not traveling and we're not traveling, so we both have more time. It's sometimes easy to get meetings with people you would never get before. Now, can you do an efficient sales process, education, proof of concept online? Those processes maybe have to grow up a little bit. But there are parts of the last six to nine months that we don't want to throw away, because this way of doing it may be more efficient.

>>What do you think about the entrepreneurial journeys out there? Obviously Amazon, and we're here covering re:Invent, is building a massive compute engine with higher-level services on top. I've been speculating for years, and I think Snowflake is the first big sign of it, that there's going to be an opportunity for these other clouds as specialty clouds (maybe the wrong word). Snowflake basically built on top of Amazon: one of the most valuable software IPOs ever on Wall Street, on someone else's cloud. So is that a playbook? Is that a move? This is kind of a new thing. >>On databases I've got a lot of history: almost ten years at Oracle. What Snowflake did was re-architect the database explicitly for the cloud. You can run Oracle on the cloud, but it doesn't do things the way Snowflake does. Snowflake uses commodity storage, it uses S3, and it's elastic, so when you're not using it, you're not paying. These things sound very simple and very obvious now, which I think is the genius of the founders, Benoit and Thierry. And I think there will be other categories of infrastructure that get re-architected and reinvented for the cloud, with equally big opportunities. So yes, I believe firmly that the model, if you're a startup, is that you don't need to waste a lot of time reinventing the wheel on data center infrastructure and databases, and a lot of the services you'd use to construct an application. If the building you're trying to build is twelve floors, you can start at the eighth or ninth floor. Snowflake has, what, three or four hundred quality engineers building their database. I don't need to do that; I can just piggyback on top of what they've done and add value. And the beautiful thing now, if you're a business out there thinking of becoming digital and reinventing yourself, or a startup just getting going, is that there's a lot of stuff you just don't have to build anymore. You don't even have to think about it.

>>Yeah, this is the new programmable internet. It's truly internet 2.0, or 3.0, or 4.0: a complete reset of online. And I think the pandemic, as you've pointed out in many CUBE interviews, and as Andy Jassy said in his keynote, has put it on full display. The smart money and smart entrepreneurs are going to see the opportunities. >>Yeah. It comes back to ideas. I've always been a product person, but a great product idea that capitalizes on the big trends in the industry? I think there's always going to be funding for those kinds of things. I don't know a lot about the consumer world, I've always worked in B2B, but think about the kinds of things you'll be able to do in future: if storage is essentially free, and compute is essentially free, just imagine what you could build. >>Jeremy, this is the new consumer. B2B is the new consumer; enterprise is hot. I was riffing on this all week: all the things going on in enterprise that were complex are now all connected. The consumerization of IT, the consumerization of computing, has happened; it's going on now. So, you're a leader. Thank you for coming on. Great to see you as always. Say hi to your family, and stay safe. >>You too. Thanks for the invite. Always a pleasure. >>Jeremy Burton, breaking down the analysis of day two of week three of re:Invent coverage. I'm John Furrier with theCUBE Virtual; we're not in person anymore. Virtualization has allowed us to do more interviews: over 110 so far for re:Invent. And tomorrow, Thursday at two o'clock, Andy Jassy will spend 30 minutes with me here on theCUBE, looking back at re:Invent: the highs, the lows, and what's next for Amazon Web Services. I'm John Furrier. Thanks for watching.
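Jeremy's description of Observe's approach above (ingest vast quantities of telemetry, treat everything as events, then analyze) can be sketched minimally. This is a hypothetical illustration of the general pattern, not Observe's actual data model or API; as he notes, a real system lands the events on cheap object storage and queries them with a warehouse engine such as Snowflake:

```python
from dataclasses import dataclass, field
import time

@dataclass
class Event:
    # One uniform shape for all telemetry: logs, metrics, and traces alike.
    timestamp: float
    source: str            # emitting service, e.g. "checkout-service"
    kind: str              # "log", "metric", or "trace"
    payload: dict = field(default_factory=dict)

class EventStore:
    """Toy append-only event store. Troubleshooting becomes filtering
    and aggregating one uniform event stream, instead of juggling
    separate logging, metrics, and tracing tools."""

    def __init__(self):
        self.events = []

    def ingest(self, event: Event) -> None:
        self.events.append(event)

    def query(self, predicate):
        # Analysis is just a predicate over the raw event stream.
        return [e for e in self.events if predicate(e)]

store = EventStore()
store.ingest(Event(time.time(), "checkout-service", "log",
                   {"level": "error", "msg": "payment timeout"}))
store.ingest(Event(time.time(), "checkout-service", "metric",
                   {"name": "latency_ms", "value": 842}))

# One query surface across what would normally be separate tools:
errors = store.query(
    lambda e: e.kind == "log" and e.payload.get("level") == "error"
)
```

The design point mirrors the interview: because everything shares one event shape, the same query mechanism serves the engineering, DevOps, and SRE views that Jeremy says are usually split across different tools.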
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Jeremy Burton | PERSON | 0.99+ |
Andy McAfee | PERSON | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
Andy Jassy | PERSON | 0.99+ |
John Farrow | PERSON | 0.99+ |
2015 | DATE | 0.99+ |
AWS | ORGANIZATION | 0.99+ |
Steve Malania | PERSON | 0.99+ |
Jeremy | PERSON | 0.99+ |
David Richardson | PERSON | 0.99+ |
70% | QUANTITY | 0.99+ |
EMC | ORGANIZATION | 0.99+ |
America | LOCATION | 0.99+ |
2010 | DATE | 0.99+ |
Hillsborough | LOCATION | 0.99+ |
12 floors | QUANTITY | 0.99+ |
three | QUANTITY | 0.99+ |
30 minutes | QUANTITY | 0.99+ |
three weeks | QUANTITY | 0.99+ |
30 | QUANTITY | 0.99+ |
six | QUANTITY | 0.99+ |
Rudy Valdez | PERSON | 0.99+ |
Silicon Valley | LOCATION | 0.99+ |
Dec 16th | DATE | 0.99+ |
eighth | QUANTITY | 0.99+ |
10 | QUANTITY | 0.99+ |
Aria | PERSON | 0.99+ |
Aviatrix | ORGANIZATION | 0.99+ |
4,000 | QUANTITY | 0.99+ |
observe Inc | ORGANIZATION | 0.99+ |
next year | DATE | 0.99+ |
tomorrow | DATE | 0.99+ |
2020 | DATE | 0.99+ |
10 years | QUANTITY | 0.99+ |
first | QUANTITY | 0.99+ |
last year | DATE | 0.99+ |
John | PERSON | 0.99+ |
Intel | ORGANIZATION | 0.99+ |
Oracle | ORGANIZATION | 0.99+ |
Dell | ORGANIZATION | 0.99+ |
Verner | PERSON | 0.99+ |
20 years | QUANTITY | 0.99+ |
yesterday | DATE | 0.99+ |
ninth floor | QUANTITY | 0.99+ |
nine months | QUANTITY | 0.99+ |
Alltel | ORGANIZATION | 0.99+ |
40 years | QUANTITY | 0.99+ |
San Francisco | LOCATION | 0.99+ |
two days ago | DATE | 0.98+ |
five years ago | DATE | 0.98+ |
today | DATE | 0.98+ |
20 years ago | DATE | 0.98+ |
both | QUANTITY | 0.98+ |
Ben Warren | PERSON | 0.98+ |
pandemic | EVENT | 0.98+ |
telco | ORGANIZATION | 0.97+ |
one | QUANTITY | 0.97+ |
Eileen Vidrine, US Air Force | MIT CDOIQ 2020
>> Announcer: From around the globe, it's theCUBE with digital coverage of the MIT Chief Data Officer and Information Quality Symposium, brought to you by SiliconANGLE Media. >> Hi, I'm Stu Miniman and this is the seventh year of theCUBE's coverage of the MIT Chief Data Officer and Information Quality Symposium. We love getting to talk to these chief data officers and the people in this ecosystem, the importance of data, driving data-driven cultures, and really happy to welcome to the program first-time guest Eileen Vidrine. Eileen is the Chief Data Officer for the United States Air Force. Eileen, thank you so much for joining us. >> Thank you, Stu, really excited about being here today. >> All right, so the United States Air Force, I believe, had its first CDO office in 2017, and you were put in the CDO role in June of 2018. If you could, bring us back, give us how that was formed inside the Air Force and how you came to be in that role. >> Well, Stu, I like to say that we are a startup organization and a really mature organization, so it's really about culture change, and it began by bringing a group of amazing citizen airmen reservists back to the Air Force to bring their skills from industry into the Air Force. So, I like to say that we're a total force, because we have active duty and reservists working with civilians on a daily basis, and one of the first things we did in June was we stood up a data lab that's based in the Jones building on Andrews Air Force Base. And there, we actually take small use cases that have enterprise focus, and we really try to dig deep to try to drive data insights, to inform senior leaders across the department on really important, what I would call enterprise-focused challenges. It's pretty exciting.
>> Yeah, it's been fascinating when we've dug into this ecosystem. Of course, while the data itself is very sensitive, and I'm sure for the Air Force there are some of the very highest levels of security, the practices around how to leverage data blur the line between public and private, because you have people that have come from industry that go into government, and people from government that have leveraged their experiences there. So, if you could, give us a little bit of your background and what it is that your charter has been, and what you're looking to build out, as you mentioned that culture of change. >> Well, I like to say I began my data leadership journey as an active duty soldier in the army, and I was originally a transportation officer. Today we would use the title condition-based maintenance, but back then it was really about running the numbers so that I could optimize my truck fleet on the road each and every day, so that my soldiers were driving safely. Data has always been part of my leadership journey, and so I like to say that one of our challenges is really to make sure that data is part of every airman's core DNA, so that they're using the right data at the right level to drive insights, whether it's tactical, operational or strategic. And so it's really about empowering each and every airman, which I think is pretty exciting. >> There's so many pieces of that data. You talk about data quality, there's obviously the data life cycle. I know the presentation that you're giving here at the CDOIQ talks about the data platform that your team has built. Could you explain that? What are the key tenets, and what maybe differentiates it from what other organizations might have done? >> So, when we first took the challenge to build our data lab, our goal was to have a cross-domain solution where we could solve data problems at the appropriate classification level.
And so we built the VAULT data platform. VAULT stands for visible, accessible, understandable, linked, and trustworthy. And if you look at the DOD data strategy, they will also add the tenets of interoperable and secure. So, the first steps that we have really focused on are making data visible and accessible to airmen, to empower them to drive insights from available data to solve their problems. So, it's really about that data empowerment. We like to use the hashtag "built by airmen," because it's really about each and every airman being part of the solution. And I think it's really an exciting time to be in the Air Force, because any airman can solve a really hard challenge, and it can rapidly escalate with great velocity to senior leadership to become an enterprise solution. >> Is there some basic training that goes on from a data standpoint? For any of those that have lived in data, oftentimes you can get lost in numbers. You have to have context, you need to understand how to separate good from bad data, or when data is still valid. So, how does someone in the Air Force get some of that basic data competency? >> Well, we have taken a multi-tenant approach, because each and every airman has different needs. So, we have quite a few pathfinders across the Air Force today to help what I call upskill our total force. And so I developed a partnership with the Air Force Institute of Technology, and they now have an online graduate-level data science certificate program. So, individuals studying at AFIT or remotely have the opportunity to really focus on building up their data touchpoints. Just recently, we have been working on a pathfinder to allow our data officers to get their ICCP Federal Data Sector Governance Certificate. So, we've been running what I would call short boot camps to prep data officers to be ready for that.
And I think the one that I'm most excited about is that this year, this fall, new cadets at the U.S. Air Force Academy will be able to have an undergraduate degree in data science. So it's not a one-pronged approach; it's about having short courses as well as academic solutions to upskill our total force moving forward. >> Well, information absolutely is such an important differentiator (laughs) in general business, and absolutely the military aspects are there. You mentioned the DOD talks about interoperability in their platform. Can you speak a little bit to how you make sure that data is secure? Yet, I'm sure there's opportunities for other organizations, for there to be collaboration between them. >> Well, I like to say that we don't fight alone. So, I work on a daily basis with my peers, Tom Cecila at the Department of the Navy and Greg Garcia at the Department of the Army, as well as Mr. David Berg at the DOD level. It's really important that we have an integrated approach moving forward, and in the DOD we partner with our security experts, so it's not about us doing security individually. In the Air Force we use a term called Digital Air Force, and it's about optimizing and building a trusted partnership with our CIO colleagues, as well as our chief management colleagues, because it's really about that trusted partnership to make sure that we're working collaboratively across the enterprise. And whatever we do in the department, we also have to reach across our services so that we're all working together. >> Eileen, I'm curious if there's been much impact from the global pandemic. When I talk to enterprise companies, they had to rapidly make sure they could protect data that used to sit within their four walls, maybe behind a VPN; now everyone is accessing data, much more work from home and the like.
I have to imagine some of those security measures you've already taken, but has there been anything along those lines, or anything else, where this shift in where people are, a little bit more dispersed, has impacted your work? >> Well, the story that I like to tell is that this has given us velocity. So, prior to COVID, we built our VAULT data platform as a multi-tenancy platform that is also a cross-domain solution, so it allows people to develop and do their problem solving at the appropriate classification level. And it allows us to connect, or push up if we need to, into higher classification levels. The other thing is that it has helped us really work smart, because we do as much as we can in that unclassified environment, and then, using our cloud-based solution and our gateways, it allows us to bring people in on a very scheduled basis, so that we optimize their time on site. And so I really think that it's given us great velocity, because it has really allowed people to work on the right problem set, at the right classification level, at a specific time. Plus, the other piece is that the problem set we've had has really allowed people to become more data focused. I think that it's personal for folks moving forward, so it has increased understanding in terms of the need for data insights as we move forward to drive decision making. It's not that data makes the decision, but it's using the insight to make the decision. >> And one of the interesting conversations we've been having about how to get to those data insights is the use of things like machine learning and artificial intelligence. Anything you can share about how you're looking at that journey, where you are along that discovery? >> Well, I love to say that in order to do AI and machine learning, you have to have great volumes of high quality data.
And so really step one was visible, accessible data, but we in the Department of the Air Force stood up an accelerator at MIT. And so we have a group of amazing airmen that are actually working with MIT on a daily basis to solve some of those, what I would call opportunities for us to move forward. My office collaborates with them on a consistent basis, because they're doing additional use cases in that academic environment, which I'm pretty excited about, because I think it gives us access to some of the smartest minds. >> All right, Eileen, also I understand it's your first year doing the event. Unfortunately, we don't get to all come together in Cambridge; walking those hallways and being able to listen to some of those conversations and follow up is something we've very much enjoyed over the years. What excites you about interacting with your peers and participating in the event this year? >> Well, I really think it's about helping each other leverage the amazing lessons learned. I think that if we look collaboratively, both across industry and in the federal sector, there have been amazing lessons learned, and it gives us a great forum for us to really share and leverage those lessons learned as we move forward, so that we're not hitting the reboot button, but we actually are starting faster. So, it comes back to the velocity component; it all helps us go faster and at a higher quality level, and I think that's really exciting. >> So, final question I have for you. We've talked for years about digital transformation, and we've really said that having that data strategy and that culture of leveraging data is one of the most critical pieces of having gone through that transformation. For people that are maybe early on their journey, any advice that you'd give them, having worked through a couple of years of this and the experience you've had with your peers?
>> I think that the first thing is that you have to really start with a blank slate and really look at the art of the possible. Don't think about what you've always done; think about where you want to go, because there are many different paths to get there. And if you look at what the target goal is, it's really about making sure that you do that backward tracking to get to that goal. And the other piece that I tell my colleagues is celebrate the wins. My team of airmen, they are amazing, it's an honor to serve them, and the reality is that they are doing great things, and sometimes you want more. And it's really important to celebrate the victories, because it's a very long journey, and we keep moving the goalposts because we're always striving for excellence. >> Absolutely, it is always a journey that we're on; it's not about the destination. Eileen, thank you so much for sharing all that you've learned, and glad you could participate. >> Thank you, Stu, I appreciate being included today. Have a great day. >> Thanks, and thank you for watching theCUBE. I'm Stu Miniman, stay tuned for more from the MIT CDOIQ event. (lively upbeat music)
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Michael | PERSON | 0.99+ |
Eileen | PERSON | 0.99+ |
Claire | PERSON | 0.99+ |
Tom Cecila | PERSON | 0.99+ |
Lisa Martin | PERSON | 0.99+ |
David Berg | PERSON | 0.99+ |
2017 | DATE | 0.99+ |
Greg Garcia | PERSON | 0.99+ |
June of 2018 | DATE | 0.99+ |
Jonathan Rosenberg | PERSON | 0.99+ |
Michael Rose | PERSON | 0.99+ |
June | DATE | 0.99+ |
Blair | PERSON | 0.99+ |
U.S Air Force Academy | ORGANIZATION | 0.99+ |
MIT | ORGANIZATION | 0.99+ |
Wednesday | DATE | 0.99+ |
five minutes | QUANTITY | 0.99+ |
Omni Channel | ORGANIZATION | 0.99+ |
five billion | QUANTITY | 0.99+ |
Air Force Institute of Technology | ORGANIZATION | 0.99+ |
two years | QUANTITY | 0.99+ |
Cambridge | LOCATION | 0.99+ |
Thursday | DATE | 0.99+ |
Orlando, Florida | LOCATION | 0.99+ |
Silicon Angle Media | ORGANIZATION | 0.99+ |
United States Air Force | ORGANIZATION | 0.99+ |
Eileen Vidrine | PERSON | 0.99+ |
Ryan | PERSON | 0.99+ |
ORGANIZATION | 0.99+ | |
Blake | PERSON | 0.99+ |
Stu Miniman | PERSON | 0.99+ |
Blair Pleasant | PERSON | 0.99+ |
BC Strategies | ORGANIZATION | 0.99+ |
Department of Navy | ORGANIZATION | 0.99+ |
next year | DATE | 0.99+ |
Stu | PERSON | 0.99+ |
first | QUANTITY | 0.99+ |
Confusion | ORGANIZATION | 0.99+ |
today | DATE | 0.99+ |
five o'Clock | DATE | 0.99+ |
YouTube | ORGANIZATION | 0.99+ |
seventh year | QUANTITY | 0.99+ |
twenty years ago | DATE | 0.99+ |
first year | QUANTITY | 0.99+ |
twenty | QUANTITY | 0.99+ |
decades ago | DATE | 0.99+ |
last year | DATE | 0.99+ |
Jones | LOCATION | 0.98+ |
Andrews Air Force Base | LOCATION | 0.98+ |
this year | DATE | 0.98+ |
United States Air Force | ORGANIZATION | 0.98+ |
last year | DATE | 0.98+ |
five | QUANTITY | 0.98+ |
five nine mugs | QUANTITY | 0.98+ |
first thing | QUANTITY | 0.98+ |
first thing | QUANTITY | 0.98+ |
first steps | QUANTITY | 0.98+ |
this fall | DATE | 0.98+ |
DOD | ORGANIZATION | 0.98+ |
Enterprise Connect | ORGANIZATION | 0.97+ |
Department of the Air Force | ORGANIZATION | 0.97+ |
AFIT | ORGANIZATION | 0.97+ |
both | QUANTITY | 0.97+ |
one | QUANTITY | 0.97+ |
Department of Army | ORGANIZATION | 0.97+ |
first time | QUANTITY | 0.97+ |
each | QUANTITY | 0.96+ |
CDO IQ | EVENT | 0.96+ |
One | QUANTITY | 0.95+ |
ORGANIZATION | 0.95+ |
Doug Laney, Caserta | MIT CDOIQ 2020
>> Announcer: From around the globe, it's theCUBE with digital coverage of the MIT Chief Data Officer and Information Quality Symposium, brought to you by SiliconANGLE Media. >> Hi everybody. This is Dave Vellante, and welcome back to theCUBE's coverage of the MIT CDOIQ 2020 event. Of course, it's gone virtual. We wish we were all together in Cambridge. They were going to move into a new building this year; for years they've done this event at the Tang Center, and they were moving into a new facility, but unfortunately that's going to have to wait at least a year, we'll see. But we've got a great guest nonetheless. Doug Laney is here. He's a Business Value Strategist, a bestselling author, an analyst, a consultant, and a longtime CUBE friend. Doug, great to see you again. Thanks so much for coming on. >> Dave, great to be with you again as well. >> So can I ask you? You have been an advocate for, obviously, measuring the value of data, the CDO role. Don't take this the wrong way, but I feel like the last 150 days have done more to accelerate people's attention on the importance of data and the value of data than all the great work that you've done. What do you think? (laughing) >> It's always great when organizations actually take advantage of some of these concepts of data value. You may be speaking specifically about the situation with United Airlines and American Airlines, where they have basically collateralized their customer loyalty data, their customer loyalty programs, to the tune of several billion dollars each. And one of the things that's very interesting about that is that the third-party valuations of their customer loyalty data resulted in numbers that were larger than the companies themselves. So basically the value of their data, which is, as we've discussed previously, off balance sheet, is more valuable than the market cap of those companies themselves, which is just incredibly fascinating.
And now of course, Apple pushing two trillion to really see the value that the market places on data. But the other thing is of course, COVID, everybody talks about the COVID acceleration. How have you seen it impact the awareness of the importance of data, whether it applies to business resiliency or even new monetization models? If you're not digital, you can't do business. And digital is all about data. >> I think the major challenge that most organizations are seeing from a data and analytics perspective due to COVID is that their traditional trend based forecast models are broken. If you're a company that's only forecasting based on your own historical data and not taking into consideration, or even identifying what are the leading indicators of your business, then COVID and the economic shutdown have entirely broken those models. So it's raised the awareness of companies to say, "Hey, how can we predict our business now? We can't do it based on our own historical data. We need to look externally at what are those external, maybe global indicators or other kinds of markets that proceed our own forecasts or our own activity." And so the conversion from trend based forecast models to what we call driver based forecast models, isn't easy for a lot of organizations to do. And one of the more difficult parts is identifying what are those external data factors from suppliers, from customers, from partners, from competitors, from complimentary products and services that are leading indicators of your business. And then recasting those models and executing on them. >> And that's a great point. If you think about COVID and how it's changed things, everything's changed, right? The ideal customer profile has changed, your value proposition to those customers has completely changed. You got to rethink that. 
And of course, it's very hard to predict, even when this thing eventually comes back in some kind of hybrid mode. You used to be selling to people in an office environment; that's obviously changed, and there's a lot that's permanent there. And data is potentially at least the forward indicator, the canary in the coal mine. >> Right. It also is the product and service. So not only can it help you improve your forecasting models, but it can become a product or service that you're offering. Look at us right now; we would generally be face to face and person to person, but we're using video technology to transfer this content. And then one of the things that... It took me a while to realize, but a couple of months after the COVID shutdown, it occurred to me that even as a consulting organization, Caserta focuses on North America, but the reality is that every consultancy is now a global consultancy, because we're all doing business remotely. There are no particularly strong localization issues for doing consulting today. >> So we talked a lot over the years about the role of the CDO, how it's evolved, how it's changed. In the course of the early, pre-title days, it was coming out of a data quality world, and that's still vital. Of course, as we heard today from the keynote, it's much more public, much more exposed, different public data sources, but the role has certainly evolved, initially into regulated industries like financial services, healthcare and government, but now many, many more organizations have a CDO. My understanding is that you're giving a talk on the business case for the CDO. Help us understand that. >> Yeah. So one of the things that we've been doing here for the last couple of years is running an ongoing study of how organizations are impacted by the role of the CDO. And really it's more of a correlation, looking at what are some of the qualities of organizations that have a CDO or don't have a CDO.
So some of the things we found are that organizations with a CDO nearly twice as often mention the importance of data and analytics in their annual report; organizations with a C-level CDO, meaning a true executive, are four times more likely to be using data to transform the business. And when we're talking about using data and advanced analytics, we found that organizations with a CIO, not a CDO, responsible for their data assets are only half as likely to be doing advanced analytics in any way. So there are a number of interesting things that we found about companies that have a CDO and how they operate a bit differently. >> I want to ask you about that. You mentioned the CIO, and we're increasingly seeing lines of reporting and peer reporting alter and shift. The sands are shifting a little bit. In the early days the CDO, and still predominantly I think, was an independent organization. We've seen a few cases, and an increasing number, where they're reporting into the CIO. We've seen the same thing, by the way, with the Chief Information Security Officer, which used to be considered the fox watching the hen house. So we're seeing those shifts. We've also seen the CDO become more aligned with a technical role, and sometimes even emerging out of that technical role. >> Yeah. I think... I don't know, what I've seen more is that the CDOs are emerging from the business. Companies are realizing that data is a business asset, it's not an IT asset. There was a time when data was tightly coupled with applications and technologies, but today data is very easily decoupled from those applications and usable in a wider variety of contexts. And for that reason, as data gets recognized as a business asset, not an IT asset, you want somebody from the business responsible for overseeing that asset.
Yes, a lot of CDOs still report to the CIO, but increasingly, and I think you'll see some other surveys from other organizations this week, CDOs are more frequently reporting up to the CEO level, meaning they're true executives. I've long advocated for the bifurcation of the IT organization into separate "I" and "T" organizations. Again, there's no reason, other than historical, to keep the data and technology sides of the organization so intertwined. >> Well, it makes sense that the Chief Data Officer would have an affinity with the lines of business. And you're seeing a lot of organizations really trying to streamline their data pipelines, their data life cycles, bringing that together, infusing intelligence into that, but also taking a systems view and really having the business be intimately involved, if not even have ownership of the data. You see a lot of emphasis on self-serve. What are you seeing in terms of that data pipeline or data life cycle, if you will, which used to be the domain of wonky, hardcore techies, but now really involves a lot more constituents? >> Yeah. Well, the data life cycle used to be somewhat short. Today data life cycles are longer, and they're more data networks than a life cycle or a supply chain.
But the bottom line is that companies are realizing that data is an asset. It needs to be not just measured as one and managed as one, but also monetized as an asset. And as we've talked about previously, data has these unique qualities that it can be used over and over again, and it generate more data when you use it. And it can be used simultaneously for multiple purposes. So companies like, you mentioned, Apple and others have built business models, based on these unique qualities of data. But I think it's really incumbent upon any organization today to do so as well. >> But when you observed those companies that we talk about all the time, data is at the center of their organization. They maybe put people around that data. That's got to be one of the challenge for many of the incumbents is if we talked about the data silos, the different standards, different data quality, that's got to be fairly major blocker for people becoming a "Data-driven organization." >> It is because some organizations were developed as people driven product, driven brand driven, or other things to try to convert. To becoming data-driven, takes a high degree of data literacy or fluency. And I think there'll be a lot of talk about that this week. I'll certainly mention it as well. And so getting the organization to become data fluent and appreciate data as an asset and understand its possibilities and the art of the possible with data, it's a long road. So the culture change that goes along with it is really difficult. And so we're working with 150 year old consumer brand right now that wants to become more data-driven and they're very product driven. And we hear the CIO say, "We want people to understand that we're a data company that just happens to produce this product. We're not a product company that generates data." And once we realized that and started behaving in that fashion, then we'll be able to really win and thrive in our marketplace. 
>> So one of the key roles of a Chief Data Officers to understand how data affects the monetization of an organization. Obviously there are four profit companies of your healthcare organization saving lives, obviously being profitable as well, or at least staying within the budget, depending upon the structure of the organization. But a lot of people I think oftentimes misunderstand that it's like, "Okay, do I have to become a data broker? Am I selling data directly?" But I think, you pointed out many times and you just did that unlike oil, that's why we don't like that data as a new oil analogy, because it's so much more valuable and can be use, it doesn't fall because of its scarcity. But what are you finding just in terms of people's application of that notion of monetization? Cutting costs, increasing revenue, what are you seeing in the field? What's that spectrum look like? >> So one of the things I've done over the years is compile a library of hundreds and hundreds of examples of how organizations are using data and analytics in innovative ways. And I have a book in process that hopefully will be out this fall. I'm sharing a number of those inspirational examples. So that's the thing that organizations need to understand is that there are a variety of great examples out there, and they shouldn't just necessarily look to their own industry. There are inspirational examples from other industries as well, many clients come to me and they ask, "What are others in my industry doing?" And my flippant response to that is, "Why do you want to be in second place or third place? Why not take an idea from another industry, perhaps a digital product company and apply that to your own business." But like you mentioned, there are a variety of ways to monetize data. It doesn't involve necessarily selling it. You can deliver analytics, you can report on it, you can use it internally to generate improved business process performance. 
And as long as you're measuring how data is being applied and what its impact is, then you're in a position to claim that you're monetizing it. But if you're not measuring the impact of data on business processes, or on customer relationships, or partner and supplier relationships, or anything else, then it's difficult to claim that you're monetizing it. One of the more interesting ways we've been working with organizations to monetize their data, certainly in light of GDPR and the California Consumer Privacy Act, where I can't sell you my customer data anymore, is in a couple of forms. One is to synthesize the data: create synthetic data sets that retain the original statistical anomalies or features of the data but don't actually share any PII. Another interesting way we've been working with organizations to monetize their data is what I call inverted data monetization, where, again, I can't share my customer data with you, but I can share information about your products and services with my customers and take a referral fee or a commission based on that. So let's say I'm a hospital. I can't sell you my patient data, of course, due to a variety of regulations, but I know who my diabetes patients are, and I can introduce them to your healthy meal plans, your gym memberships, your at-home glucose monitoring kits, and again take a referral fee or a cut of that action. So we're working with customers in the financial services industry and in the healthcare industry on just those kinds of examples. We've identified hundreds of millions of dollars of incremental value for organizations from data they were just sitting on. >> Interesting. Doug, because you're a business value strategist, where in the S curve do you see you're able to have the biggest impact? I doubt that you enter organizations where you say, "Oh, they've got it all figured out.
They can't use my advice." But also, sometimes in the early stages you may not be able to have as big an impact because there's not top-down support, or there's too much technical debt, et cetera. Where are you finding you can have the biggest impact, Doug? >> Generally we don't come in and run those kinds of data monetization or information innovation exercises unless there's some degree of executive support. I've never done that at a lower level, but certainly there are lower-level, more immediate and vocational opportunities for data to deliver value through simple analytics. One of the simple examples I give is: I sold a home recently, and when you put your house on the market, everybody comes out of the woodwork. The fly-by-night mortgage companies, the moving companies, the box companies, the painters, the landscapers all know you're moving, because your data is in the MLS directory. And it was interesting: the only company that didn't reach out to me was my own bank. So they lost the opportunity to introduce me to a mortgage, retain me as a client, introduce me to my new branch, print me new checks, move the stuff in my safe deposit box, all of that. They missed a simple opportunity. And I'm thinking, this doesn't require rocket science. To figure out which of your customers are moving, the MLS database, which you can harvest from Zillow or other sites, is basically public domain data. And I was just thinking, how stupidly simple would it have been for them to hire a high school programmer, give him a can of Red Bull and say, "Listen, match our customer database to the MLS database to let us know who's moving on a daily or weekly basis." Some of these solutions are pretty simple. >> So is that part of what you do, come in with just hardcore tactical ideas like that? Are you also doing strategy? Tell me more about how you're spending your time.
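[Editor's aside: Doug's "high school programmer" example above is essentially a join between two customer lists. A minimal sketch of the idea in Python; all the names, fields and sample records here are hypothetical, not any bank's actual systems or the real MLS schema:]

```python
def normalize_address(addr: str) -> str:
    """Normalize an address for matching: lowercase, drop punctuation, collapse whitespace."""
    cleaned = "".join(ch for ch in addr.lower() if ch.isalnum() or ch.isspace())
    return " ".join(cleaned.split())

def customers_now_selling(customers, mls_listings):
    """Return customers whose home address appears in the latest listing feed."""
    listed = {normalize_address(listing["address"]) for listing in mls_listings}
    return [c for c in customers if normalize_address(c["address"]) in listed]

# Hypothetical sample data
customers = [
    {"name": "A. Jones", "address": "12 Oak St."},
    {"name": "B. Smith", "address": "99 Elm Ave"},
]
mls_listings = [{"address": "12 oak st"}]

print([c["name"] for c in customers_now_selling(customers, mls_listings)])  # → ['A. Jones']
```

A production version would need fuzzier matching (abbreviations, unit numbers), but the point of the anecdote stands: the core logic fits in a few lines.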
>> We take more of a broad approach, where we look at the data itself and, as people have said, ask "If we tortured the data enough, what would it tell us?" We just take that angle. We look at examples of how other organizations have monetized data and think about how to apply and adapt those ideas to the company's own business. We look at key business drivers, internally and externally. We look at edge cases for their customers' businesses. We run through hypothesis-generating activities. There are a variety of different kinds of activities that we do to generate ideas. And most of the time when we run these workshops, which last a week or two, we'll end up generating anywhere from 35 to 50 pretty solid ideas for generating new value streams from data. So when we talk about monetizing data, that's what we mean: generating new value streams. But like I said, the next step is to go through that feasibility assessment and determine which of these ideas you actually want to pursue. >> So you're of course a longtime industry watcher as well; as a former Gartner analyst, you have to be. My question is, if I think back... I've been around a while. If I think back at the peak of Microsoft's prominence in the PC era, it was like Windows 95, and you felt like, "Wow, Microsoft is just so strong." And then of course Linux comes along, and a lot of open source changes things, and lo and behold, a whole new set of leaders emerges. And you see the same thing today with the Trillionaires' Club, and you feel like, "Wow, even COVID has been a tailwind for them." But you think about, "Okay, where could the disruption come to these large players that own huge clouds and have all the data?" Is data potentially a disruptor for what appear to be insurmountable odds against the newbies? >> There's always people coming up with new ways to leverage data, or new sources of data to capture.
So yeah, they're certainly not going to be around forever, but it's been really fascinating to see the transformation of some companies. I think nobody really exemplifies it more than IBM, which emerged from originally selling meat slicers; the Dayton meat slicer was their original product. And then they evolved into manual business machines and then electronic business machines, and they dominated that. Then they dominated the mainframe software industry. Then they dominated the PC industry. Then they dominated the services industry, to some degree. And now they're starting to get into data. And I think following that trajectory is something that really any organization should be looking at. When do you actually become a data company, not just a product company or a service company? >> We have Inderpal Bhandari, one of our huge guests here. He's the Chief-- >> Sure. >> Data Officer of IBM; you know him well. And he talks about the journey that he's undertaken to transform the company into a data company. I think a lot of people don't really realize what's actually going on behind the scenes, whether it's financially oriented or revenue opportunities. But one of the things he stressed to me in our interview was that, on average, they're reducing the end-to-end cycle time from raw data to insights by 70%. That's on average. And for a company that size, that's just an enormous cost saving or revenue-generating opportunity. >> There's no doubt that the technology behind data pipelines is improving, and the process of moving data from those pipelines directly into predictive or diagnostic or prescriptive output is a lot more accelerated than in the early days of data warehousing. >> Is the skills barrier still acute? It seems like it's lessened somewhat; in the early Hadoop days you needed... even data scientists... Is it still just a massive skills shortage, or are we starting to attack that?
>> Well, I think companies are figuring out a way around the skills shortage by doing things like self-service analytics and focusing on easier-to-use, mainstream AI or advanced analytics technologies. But there's still very much a need for data scientists in organizations, and difficulty in finding people who are true data scientists. There's no real certification, and so really anybody can call themselves a data scientist, but I think companies are getting good at interviewing and determining whether somebody's got the goods or not. But there are other types of skills that we don't really focus on, like data engineering skills; there's still a huge need for data engineering. Data doesn't self-organize. There are some augmented analytics technologies that will automatically generate analytic output, but there really aren't technologies that automatically self-organize data, and so there's a huge need for data engineers. And then, as we talked about, there's a large interest in external data: harvesting it, ingesting it, and even identifying what external data is out there. So one of the emerging roles that we're seeing, if not the sexiest role of the 21st century, is the role of the data curator, somebody who acts as a librarian, identifying external data assets that are potentially valuable, testing them, evaluating them, negotiating for them, and then figuring out how to ingest that data. So I think that's a really important role for an organization to have. Most companies have an entire department that procures office supplies, but they don't have anybody who's procuring data supplies. And when you think about which is more valuable to an organization, how do you not have somebody who's dedicated to identifying the world of external data assets that are out there? There are 10 million data sets published by governments, organizations and NGOs. There are thousands and thousands of data brokers aggregating and sharing data.
There's web content that can be harvested, there's data from your partners and suppliers, there's data from social media. So to not have somebody who's on top of all that demonstrates gross negligence by the organization. >> That is such an enlightening point, Doug. My last question is, I wonder how the pandemic has affected your business personally. As a consultant, you're on the road a lot; obviously you're not on the road so much now, you're doing a lot of chalk talks, et cetera. How have you managed through this, and how have you been able to maintain your efficacy with your clients? >> Most of our clients, given that they're in the digital world a bit already, made the switch pretty quickly. Some of them took a month or two, some things went on hold, but we're still seeing the same level of enthusiasm for data and doing things with data. In fact, some companies have taken the view (mumbles) that data can be their best defense in a crisis like this. It's affected our business, and it's enabled us to do much more international work more easily than we used to. And I probably spend a lot less time on planes, so it gives me more time for writing and speaking and actually doing consulting. So that's been nice as well. >> Yeah, there's that bonus. Obviously at theCUBE, we're not doing physical events anymore, but hey, we've got two studios operating. And Doug Laney, really appreciate you coming on. (Doug mumbles) Always a great guest. Thanks for sharing your insights, and have a great MIT CDOIQ. >> Thanks, you too, Dave, take care. (mumbles) >> Thanks, Doug. All right, and thank you everybody for watching. This is Dave Vellante for theCUBE. Our continuous coverage of the MIT Chief Data Officer conference, MIT CDOIQ, will be right back, right after this short break. (bright music)
Inderpal Bhandari, IBM | MIT CDOIQ 2020
>> From around the globe, it's theCUBE, with digital coverage of the MIT Chief Data Officer and Information Quality Symposium, brought to you by SiliconANGLE Media. >> Hello, everyone. This is Dave Vellante, and welcome back to our continuing coverage of the MIT Chief Data Officer, CDOIQ event. Inderpal Bhandari is here. He's a leading voice in the CDO community and a longtime CUBE alum. Inderpal, great to see you. Thanks for coming on for this special program. >> My pleasure. >> So when you and I first met, you laid out what I thought was one of the most cogent frameworks to understand what a CDO's job was and where the priorities should be. And one of those was really understanding how data contributes to the monetization of the organization, aligning with lines of business, and a number of other things. And that was several years ago. A lot has changed since then. You know, we've been doing this conference since probably twenty thirteen, and back then Hadoop was coming on strong, and a lot of CDOs didn't want to go near the technology. That's beginning to change. CDOs and CTOs are becoming much more aligned at the hip. The reporting organizations have changed. But I'd love your perspective on what you've observed as changing in the CDO role over the last half decade or so. >> Well, Dave, you know that I became chief data officer in December of two thousand six, and I have done this job four times at four major organizations, having created the organization from scratch each time. Now, in December of two thousand six, when I became chief data officer, there were only four chief data officers. I was the first in healthcare, and there were three others: one in internet, one in credit cards and one in banking. And I think I'm the only one actually left standing, still doing this job, whether that's a good thing or a bad thing.
But like, you know, it has certainly allowed me to learn the craft, and I've scripted it down to the level that, you know, I actually do think of it purely as a craft. That is, I know going in exactly what I'm going to do. Now, the interesting things that have unfolded: obviously the profession's taken off. There are literally thousands of chief data officers now, and there are plenty of changes. I think the main change with the job is that it's a little less daunting in terms of convincing the senior leadership that it's needed, because I think the awareness at the CEO level is much, much better than what it was in two thousand six, across the world. Now, having said that, I think it is still only awareness, and I don't think that there's really a deep understanding at those levels, and so there's a lot of confusion. Which is why, over this period, you saw all these professions take off with C-titles: chief data officer, chief analytics officer, chief digital officer. The chief technology officer and the CIO, of course, have been there for a long time. But I think these newer C-positions are all very, very related, and they all kind of went after the same need, which had to do with enterprise transformation, digital transformation of the enterprise. And people were all trying to essentially feel the elephant, and they could only see part of it at the senior levels, and they came up with whichever role seemed most meaningful to them. But really, all of us are trying to do the same job, which is to accelerate digital transformation in the enterprise. Your comment that you kind of see the CTOs and CDOs now partnering up much more than in the past, I think that's inevitable, and there's a major driving force behind it, in my view anyway.
It's artificial intelligence. As people try to infuse artificial intelligence, well, it's a very technical field still. It's not something that you can just hand over to somebody who has the business chops but not the deep technical chops to pull it off. And so, in the case of chief data officers who do have the technical chops, you'll see them also pretty much heading up the AI effort in total, as I do in the IBM case, where we're building the data and AI enablement internal platform for IBM. But I think in other cases you've got chief data officers who are coming in from a different angle, and they've partnered up with the CTO now, because they have to; otherwise you cannot get AI infused into the organization. >> So there were a lot of other priorities. Obviously, certainly digital transformation; we've been talking about it for years, but still, in many organizations there was a sense of "not on my watch," maybe a sense of complacency, or maybe just other priorities. COVID obviously has changed that. Now one hundred percent of the companies that we talk to are really putting this digital transformation on the front burner. So how has that changed the role of the CDO? Has it, Inderpal, just been an acceleration of that reality, or has it also somewhat altered the swim lanes? >> I think it's both, actually. So I have a way of looking at this. In my mind, if you look at the CDO role from a business perspective, the CEO is looking for three things from the CDO. One is that this person is going to help with the revenue of the company by enabling the production of new products, new products resulting in new revenue and so forth. That's kind of one aspect of the monetization.
Another aspect is that the CDO is going to help with efficiency within the organization by making data a lot more accessible, as well as enabling insights that reduce end-to-end cycle time for major processes. So that's another way that they can monetize. And the last one is risk reduction: they're going to reduce risk, as regulations and cybersecurity exposures and incidents just keep accelerating, and you're going to have to also step in and help with that. So for every CDO, the way their senior leadership looks at them is as some mix of those three, and in some cases one is given more importance than the others, but that's how they are essentially looking at it. Now, I think what digital transformation has done is it's managed to accelerate all three of these outcomes, because you need to attend to all three as you move forward. But I think the individual balance that's struck for an individual CDO really depends on their company, their situation, who their peers are, who is actually leading the transformation, and so forth. >> You know, a lot of the early activity around CDOs sort of emanated from the quality portions of the organization. It was sort of a compliance-weighted role. Not necessarily when you started your own journey here; you've obviously been focused on monetization and how data contributes to that. But you saw that generally: organizations, even if they didn't have a CDO, had this sort of back-office compliance thing. That has totally changed in the value equation; it's really much more about insights, as you mentioned. So one of the big changes we've seen in the organization is that data pipeline you mentioned, and cycle time. And I'd like to dig into that a little bit, because you and I have talked about this. This is one of the ways that a chief data officer and the related organizations can add the most value: reduction in that cycle time.
That's really where the business value comes from. So I wonder if we could talk about that a little bit, and how the constituents and the stakeholders in that life cycle, across that data pipeline, have changed. >> That's a very good question, a very insightful question. So if you look at a company like IBM, you know, my role in totality within IBM is to enable IBM itself to become an AI enterprise, so infusing AI into all our major business processes, things like our supply chain, our lead-to-cash process, our finance processes like accounts receivable and procurement, and so forth. Every major process that you can think of is using Watson now. So that's the vision; that's essentially what we've implemented, and that's how we are using it now as a showcase for clients and customers. One of the things we realized is that the data and AI enablement parts of the business, you know, the work that I do, also have processes. Now that's the pipeline you referred to. We're setting up the data pipeline, we're setting up the machine learning pipeline, the deep learning pipeline; we're always setting up these pipelines. And so now you have the opportunity to actually turn the so-called AI ladder on its head, because the AI ladder has to do with: first you collect the data, then you curate it, you make sure that it's high quality, et cetera, fit for AI, and then eventually you get to applying AI and then infusing it into business processes, and so forth. But once you recognize that the very earliest pieces of work with the data are themselves essentially processes, you can infuse AI into those processes, and that's what's made the cycle time reduction and all the things that I'm talking about possible, because it just makes it much, much easier for somebody to then implement AI within a large enterprise. I mean, AI requires specialized knowledge.
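[Editor's aside: the point about treating data collection and curation themselves as automatable processes can be caricatured in a few lines. This is a generic sketch, not IBM's actual platform; the stage names and the rule-based "automated curation" step are assumptions standing in for model-driven components:]

```python
from typing import Callable, List

# A pipeline is an ordered list of stages. "Infusing AI" into the pipeline means
# a stage can be swapped from a manual step to an automated, model-driven one.
Stage = Callable[[List[dict]], List[dict]]

def collect(records: List[dict]) -> List[dict]:
    """Ingestion stub: in practice this would pull from source systems."""
    return records

def curate_auto(records: List[dict]) -> List[dict]:
    """Stand-in for an ML-driven quality step: here, just drop incomplete rows."""
    return [r for r in records if all(v is not None for v in r.values())]

def run_pipeline(records: List[dict], stages: List[Stage]) -> List[dict]:
    for stage in stages:
        records = stage(records)
    return records

raw = [{"id": 1, "amt": 10.0}, {"id": 2, "amt": None}]
print(run_pipeline(raw, [collect, curate_auto]))  # → [{'id': 1, 'amt': 10.0}]
```

The design point is that each rung of the ladder is itself a replaceable function, so an automated step can be dropped in without reworking the rest of the pipeline.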
There are pieces of AI, like deep learning, where typically a company is going to have only a handful of people who even understand what that is, how to apply it, how models drift and when they need to be refreshed, et cetera, et cetera, and so that's difficult. You can't possibly expect every business process, every business area, to have that expertise, and so you've then got to rely on some core group which is going to enable them to do so. But that group can't do it manually, because otherwise it doesn't scale. So then you come back to these pipelines, and you've got to actually infuse AI into these data and AI enablement processes, so that it becomes much, much easier to scale across the enterprise. >> Some of the CDOs, maybe they don't have the reporting structure that you do, or maybe it's more of a far-flung organization. Not that IBM is not far-flung, but they may not have the ability to sort of inject AI. Maybe they can advocate for it. Do you see that as a challenge for some CDOs? And how do they get through that? What's the way in which they should be working with their constituents across the organization to successfully infuse AI? >> Yeah, in fact, you make a very good point. I mean, when I joined IBM, one of the first observations I made, and in fact I made it to senior leadership, is that I didn't think that, from a business standpoint, people really understood what AI meant. So when we talked about a cognitive enterprise, an AI enterprise, as IBM, our clients didn't really understand what that meant, which is why it became really important to enable IBM itself to be an AI enterprise. You know, that's my data strategy. You kind of alluded to the fact that I have this approach; there are these five steps, and the very first step is to come up with a data strategy that enables the business strategy that the company is on.
And in my case it was: hey, the company wants to become a cloud and cognitive company, and I'm going to enable that. And so our data strategy essentially became one of making IBM itself an AI enterprise. But the reason for doing that, the reason why that was so important, was because then we could use it as a showcase for clients and customers. And so when I'm talking with our clients and customers, that's my role; really the only role I'm playing is what I call experiential selling. I'm saying, forget about the fact that we're selling this particular product or that particular product, that you've got GPU servers, we've got Watson OpenScale or whatever. It doesn't really matter. Why don't you come and see what we've done internally at scale? And then we'll also lay out for you all the different pain points that we had to work through using our products, so that you can make the same case when you apply it internally. And the same comment with regard to the benefit, you know, the cycle time reduction. Some of the cycle time reductions we've seen are in my processes themselves. Like this: think about business metadata. Generating that is so difficult, and it's, again, something that's critical if you want to scale your data, because you can't really have a good catalog of data if you don't have good business metadata. Anybody looking at what's in your catalog won't understand what it is; they won't be able to use it, et cetera. And so we've essentially automated business metadata generation using AI, and the cycle time reduction there was like ninety five percent. You could actually argue it's more than that, because in the past, for many, many data sets, the pragmatic approach would be: don't even bother with the business metadata.
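[Editor's aside: as a toy illustration of what automated business-metadata generation does. The real system described here uses trained models; this rule-based version, with a made-up abbreviation glossary, is only a sketch of the input-to-output shape:]

```python
# Expand cryptic technical column names into business-friendly descriptions.
# The glossary is hypothetical; a production system would learn these mappings
# from labeled examples rather than hard-code them.
GLOSSARY = {"cust": "customer", "dob": "date of birth", "amt": "amount", "acct": "account"}

def describe_column(col_name: str) -> str:
    """Turn a name like 'CUST_DOB' into a readable business description."""
    words = [GLOSSARY.get(part, part) for part in col_name.lower().split("_")]
    return " ".join(words)

print(describe_column("CUST_DOB"))  # → customer date of birth
print(describe_column("acct_amt"))  # → account amount
```

Even this crude version shows why automation pays off: the alternative is a human writing a description for every column of every data set in the catalog.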
Then the data just gets put somewhere in your data architecture, somewhere in your data lake or your data warehouse, and it becomes the data swamp, because nobody understands it. Now, with regard to our experience applying AI, infusing it across all our major business processes, our average cycle time reduction is seventy percent. So there's just a tremendous amount of gain there. But to your point, unless you're able to point to some application at scale within the enterprise, something that's meaningful for the enterprise, which is the role I play in terms of bringing it forward to our clients and customers, it's harder to argue the case for investment in AI within the enterprise. Without being able to point to those types of use cases that have been scaled, where you can demonstrate the value, it's difficult. So that's an extremely important part of the equation, to make sure that that happens on a regular basis with our clients and customers. I will say that, to your point, a lot of our clients and customers come back with exactly that. I was having a conversation just last week with a major financial services organization, and I got the same point: how do I convince my leadership about the value of AI? And I basically responded: you can show the scaled use cases, but perhaps the biggest point that you can make as a CDO to the senior leadership is, can we afford to be left behind? That is, I think, the biggest point that the leadership has to appreciate: can you afford to be left behind? >> I want to come back to this notion of seventy percent average cycle time reduction. That's astounding, and I want to make sure people understand the potential impacts. And I would say I suspect many CEOs, if not most, understand sort of systems thinking.
It's obviously something that you're big on, but oftentimes within organizations you might see them trying to optimize one little portion of the data life cycle and, okay, hey, celebrate that success. But unless you can take that systems view and reduce the overall cycle time, you don't get the real business value. And I guess my real question around this is: every organization has some kind of North Star. Maybe it's about profit, and you can increase revenue or cut costs, and you can do that with data; it might be saving lives. But ultimately, to drive this data culture, you've got to get people thinking about getting insights that help with that North Star, that mission of the company, and then taking a systems view. That seventy percent cycle time reduction is the enormous business value that this drives, and I think that sometimes gets lost on people. These are telephone numbers in the business case, aren't they? >> Yes, absolutely. There's just a tremendous amount of potential, and it's not an easy thing to do by any means. So we've always been very transparent about that, Dave. As you know, we put forward this blueprint, right, the cognitive enterprise blueprint, and how you get to it, and I kind of have four major pillars for the blueprint. There's obviously the data, and getting the data ready for the consumption that you want, but also things like training data sets and how you run hundreds of thousands of experiments on a regular basis, which kind of leads to the other pillar, which is technology. But then the last two pillars are business process change and the organizational culture, you know, managing organizational considerations and that culture.
If you don't keep all four in lockstep, the transformation is usually not successful at an end-to-end level. It then becomes much more what you pointed out, which is that you have point solutions, and the CDO role doesn't make the kind of strategic impact that it otherwise could. And this also comes back to the point you alluded to. If you think about how you keep those four pillars in sync, it means you've got to have the data leader, and you've also got to have the technology leader. In some cases they might be the same person, but just for the sake of argument, let's say they're all different people, and many, many times they are. So the data leader, the technology leader and the operations leaders, because those are the ones who own the business processes as well as the organizational aspects, have all got to work together to make it an effective transformation. And so, with regard to the organization structure that you talked about, in some cases my peers may not have that. That is true. If the senior leadership is not thinking overall digital transformation, it's going to be difficult for them to go down that path. >> You've also seen that, culturally, historically, when it comes to data and analytics, a lot of times the lines of business, you know, their first response is to attack the quality of the data, because the data may not support their agenda. So there's this idea of a data culture, and I want to ask you how self-serve fits into that. I mean, to the degree that the business feels as though they actually have some kind of ownership in the data, and it's largely their responsibility, as opposed to a lot of the finger-pointing that has historically gone on, whether it's been decision support or enterprise data warehousing or even, you know, data lakes. They've sort of failed to live up to that.
That promise, particularly from a cultural standpoint. And so I wonder, how have you guys done in that regard? How did you get there? Any other observations you could make in that regard? >> Yeah. So, you know, I think culture is probably the hardest nut to crack of all those four pillars that I brought up, and you've got to address it not just top-down, but also bottom-up, as well as peer-to-peer. I'll give you some examples based on our experience, Dave. So the way my organization is set up, there is obviously a technology arm, the people who are doing all the data engineering work and laying out the foundational technical elements for the transformation, you know, the AI-enabled platform, and so forth. And then there is another senior leader who reports directly to me, and his organization is all around adoption. He's responsible for essentially taking what's available in the technology and then working with the business areas to move forward and infuse AI into the processes of the business. And it's done in a bottom-up way; we deliberately set it up and designed it to be bottom-up. What I mean by that is the team on my side is fully empowered to move forward. When they find a like-minded team on the other side, they go ahead and do it. They don't have to come back for funding; they just go ahead and do it. They're basically empowered to do that. And that particular setup enabled us, in a couple of years, to have one hundred thousand internal users on our central data and AI-enabled platform. And when I say a hundred thousand users, I mean users who are using it on a monthly basis. That's how we count, so if you haven't used it in a month, we won't count you. So it's over one hundred thousand, and we grew very rapidly to that. That's kind of the enterprise-wide story.
That's kind of the bottom-up direction. The top-down direction was the strategic element that I talked with you about, where I said, hey, our data strategy is going to be to make IBM itself into an AI enterprise and then use that as a showcase for clients and customers. And I reiterated that with the senior leadership all the time, talking to customers together with our senior leaders. So that's kind of the air cover for this; that mix gives you that possibility. Then from a peer-to-peer standpoint, when you get to these large-scale end-to-end processes, there are a couple of ways I worked that. One way is we've looked at our enterprise data and said, okay, there are four major pillars of data that we want to go after: data about our clients, data about our offerings, financial data, and then our workforce data. And within those there are obviously some sub-pillars, like sales data that comes in under clients, and within workforce you could have contractors versus employees, etcetera. But think for the moment about these four major pillars of data. And then we mapped those to the end-to-end large business processes within the company, you know, the really large ones, like enterprise performance management end to end, or lead-to-cash end to end, or risk insights across our full supply chain end to end, and things like that. And we've tied those four major data pillars to those major end-to-end processes. So there's a mechanism there, obviously, in terms of facilitating, and to some extent one might argue even forcing, some interaction between teams and the way they talk. But it also brings me and my peers much closer together when you set it up that way. And that means, you know, people from the HR side, people from the operations side, the data side, the technology side, all coming together to really move things forward.
So all three tracks are being hit very, very hard to move the culture forward. >> Am I also correct that you have chief data officers reporting to you, whether it's matrixed or direct, within the divisions? Is that right? >> Yeah. So in terms of our structure, as you know, we're a global company, and we're also a far-flung company; we have many different products and business units and so forth. And so one of the things that I realized early on was that we are going to need data officers in each of those business units. There's obviously the enterprise objective, and you could think of the enterprise objectives in terms of some examples based on what I said in the past. One enterprise objective would be: we've got to have a data foundation, by essentially making data along those four pillars I talked about, clients, offerings, etcetera, very accessible and self-service. You mentioned self-serve earlier, and this is where the self-service piece comes in, right? So you can get at that data quickly and appropriately. You want to make sure that access control and all that is designed out, and that you're able to change your policies without a lot of manual work. Those things got implemented very rapidly and quickly, so you've got that piece of the puzzle to go after. But the other aspect of this is recognizing that every business unit also has its own objectives, and they are looking at some of those things somewhat differently. I'll give you an example. We've got data and AI product units. Now, those CDOs, their concern is going to be a lot more around the products themselves and how we're monetizing those products, and so they're not per se concerned with, you know, how you reduce the end-to-end cycle time of IBM's total supply chain. So that's my point.
So they're going to have substantial considerations and objectives of their own that they want to accomplish. I recognized that early on, and we came up with this notion of a data officer council, and I helped staff the council. So this is the matrixed reporting that we talked about. I selected some of the key players that we have in those units, and I also made sure they were funded by the unit. So they report into the units, because their paycheck is actually determined by the unit, which makes them aligned with the objectives of the unit, but they're also obviously part of my central approach, so that I can disseminate it out to the organization. It comes in very, very handy when you are trying to do things across the company as well. So when we did GDPR, when we had to get the company ready for GDPR, I would say that this mechanism became a key aspect of what enabled us to move forward and do it rapidly. >> Because you had the structure. Perhaps the lines of business weren't maybe as concerned about GDPR, but you had to be concerned with it overall, and this allowed you to sort of highlight its importance. >> Right. Because in the case of GDPR, there has to be a company-wide policy and implementation, right? And if we did not have that structure already in place, it would have made it that much harder to get that uniformity and consistency across the company. You would have had to invent that structure, but we already had it, because we had said, hey, this is our structure for data, and we're going to have these types of considerations there. And so we have this network that meets regularly, every month actually, and on things like GDPR, much more frequently than that. >> Right. So that makes sense. We're out of time.
But I wonder if we could just close, if you could address the MIT CDO audience. This is probably the largest audience, believe it or not; now that it's virtual, that has definitely expanded the audience, but it's still a very elite group. And the reason why I was so pleased that you agreed to do this is because you've got one of the more complex organizations out there and you've succeeded, through a lot of hard, hard work. So what message would you leave the MIT CDO audience, Inderpal? >> So I would say that, you know, in this particular profession, if I have to pick one trait, well, let me pick two traits. One is you're a change agent. You have to be really comfortable with change. Things are going to change, and the organization is going to look to you to make those changes. And so that's one aspect of your job that may or may not come naturally, but that particular set of skills and characteristics is something one has to develop over time. And the other thing I would say is it's a continuous learning job. You continuously learn, and things keep changing around you, and changing rapidly. Even if you just think in terms of the subject areas today: you've got to understand technology, obviously; you've got to understand data; you've got to understand AI and data science; you've got to understand cybersecurity; you've got to understand the regulatory framework. And you've got to keep all that in mind and distill it down to the trends that are happening. An example of that is the trend towards more regulation around privacy, and also in terms of individual ownership of data, which is very different from what came before. That's kind of where the puck is going, and so you've got to be on top of all those things.
And so the characteristic of being a continual learner, I think, is a key aspect of this job. One other thing I would add, and this is with COVID-19: in terms of those four pillars that we talked about, data, technology, business process, and organization and culture, from a CDO perspective, the data and technology were obviously front and center. I would say post COVID-19, and with the civil unrest and so forth, the other two aspects are going to be critical as we move forward. And so the people aspect of the job has never been more important than it is today. That's something I find myself regularly doing now, talking at all levels of the organization, one on one, which is something we never really did before. But now we find time to do it, so obviously it's doable. I think it's a change that's here to stay. >> Well, to your point about change, if you were in your comfort zone before 2020, this year has certainly taken you out of it, Inderpal. All right, thanks so much for coming back on theCUBE and addressing the MIT CDO audience. Really appreciate it. >> Thank you for having me. It was my pleasure. >> You're very welcome. And thank you for watching, everybody. This is Dave Vellante. We'll be right back after this short break. You're watching theCUBE.
Shanthi Vigneshwaran, FDA | CUBE Conversation, June 2020
>> Narrator: From theCUBE studios in Palo Alto and Boston, connecting with thought leaders all around the world, this is a CUBE conversation. >> Everyone, welcome to this CUBE conversation here in the Palo Alto CUBE studios. I'm John Furrier, your host of theCUBE, with a great guest here: Shanthi Vigneshwaran, who is with the Office of Strategic Programs in the Center for Drug Evaluation and Research within the US Food and Drug Administration, FDA, and is the winner of the Informatica Intelligent Disrupter of the Year award. Congratulations, Shanthi. Welcome to this CUBE conversation. Thanks for joining me. >> Thank you for having me. >> Congratulations on winning the Informatica Intelligent Disrupter of the Year award. Tell us more about the organization. With the FDA, everyone's probably concerned these days with making sure things go faster and faster; it's more complex, more things are happening. Tell us about your organization and what you work on. >> FDA is huge. Our organization is the Center for Drug Evaluation and Research, and its core mission is to promote public health by ensuring the availability of safe and effective drugs. For example, any drugs you go and buy in the pharmacy today, our administration helps to approve them and to ensure the quality and integrity of the marketed products in the industry. My office specifically is the Office of Strategic Programs, whose mission is to transform drug regulatory operations with a customer focus through analytics and informatics, working towards the advancement of CDER's public health mission. >> What are some of the objectives that you guys have? What are some things you have as your core top objectives at CDER, the drug research group? >> The core objectives are that we want to make sure we are promoting safe use of the marketed drugs, and we want to make sure the drugs that are going to patients are available and effective.
And also that the quality of the drugs being marketed is able to protect public health. >> What are some of the challenges that you have in managing pharmaceutical safety? Because I can only imagine, certainly now, with supply chains, tracing, monitoring, drug efficacy, safety, all these things happening. What are some of the challenges in doing all this? >> In our office there are challenges in three different areas. One is the drug regulation challenges, because as drugs become more advanced and increasingly complex products, and there are challenges in the drug development area, we want to make sure we have regulation that supports any advancement in science and technology. The other thing is that Congress has given new authorities and roles for the FDA to act on. For example, the Drug Quality and Security Act, which means they want to track and trace all the drugs that go to the public, so they know who the distributors are and who the manufacturers are. Then you have the 21st Century Cures Act, and also the CARES Act package, which was recently signed and which also includes a lot of the OTC drug regulatory modernization. Then there's the area of globalization, because just as diseases don't have any borders, product safety and quality are no longer confined to one country. A lot of the drugs being manufactured are overseas, and as a result, across roughly 300 US ports, we want to make sure the FDA-regulated shipments are coming through correctly to proper venues and everything is done correctly. Those are some of the challenges we have to deal with. >> So much going on, a lot of moving parts, as people say. There's always drug shortages, always demand, and you have to know that and track it. I can only imagine the world you're living in, because you've got to be innovative, got to be fast, got to be cutting edge, got to get the quality right. Data is super critical.
>> And can you share, take a minute to explain, some of the data challenges you have to address and how you did that? Because my mind's blown just thinking about how you live it every day. Can you just share some of those challenges that you had to address and how you did it? >> Some of the key challenges we actually see are that we have roughly 170,000 regulatory submissions per year. There are roughly 88,000 firm registrations and product listings that come to us, and then there are more than 2 million adverse event reports. With all these data submissions, an organization such as ours has multiple systems where this data is acquired, and each has its own criteria for validating the data. Adding to it, our internal and external stakeholders also want certain rules in the way the data is identified. So we wanted to make sure there is a robust MDM framework to cleanse, enrich, and standardize the data, so that it ensures the trust, availability, and consistency of the data being supplied and published to the CDER regulatory data users. >> You guys are dealing with... >> Ultimately it's almost to give them a 360 degree view of the drug development lifecycle, through each of the different phases, both pre-market, which is before the drug hits the market, and then after it hits the market. We still want to make sure the data we receive supports the regulatory review and decision-making process. >> Yeah, and you've got to deliver a consumer product to people at the right time. All these things have to happen, and you can see clearly the impacts on everyday life. I've got to ask you the database question, because the database geek inside of me is just going, okay, I can only imagine the silos and the different systems and the codes. Data silos are a big problem. We've been reporting on this on theCUBE for a long time, around making data available, automation.
All these things can only happen if there's data availability. Can you just take one more minute to talk about some of the challenges there? Because you've got to break down the silos, and at the same time you really can't replace them. >> That's true. What we did was, stepping back like seven years ago when we did the data management, we had a lot of siloed systems as well. And we knew we wanted to establish master data management, so we took a little bit more of a strategic vision. What we ended up doing was identifying the key areas of the domains that would give us some kind of relationship. What are the key areas that will give us that 360 degree lifecycle? So that's what we did: we identified the domains. Then we took a step back and asked which domain we wanted to tackle first, because we knew what those domains were going to be. We said, okay, let's decide which domain to do first, the one that will give us the most return on investment, which will make people actually look at it and say, hey, this makes sense, this data is good. So that's what we ended up looking at. We looked at it from both ends: one from an end-user perspective, which is the one they get the benefit out of, and also from a data silo perspective, which is the data domains that are common, where there's duplication that we can consolidate. >> So that's good. You did the work up front. That's critical: knowing what you want to do and get out of it. What were some of the benefits you guys got out of it? From an IT standpoint, how does that translate to the business benefits, and what was achieved? >> I think the benefit we got from the IT standpoint was that a lot of the duplication was no longer there, which basically means a lot of the legacy systems, and all of the manual data quality work we had to do, we automated.
We had bots, and we also had other automation processes that we put to work with Informatica, which helped us bring the cost down considerably. For example, it used to take us three days to process submissions. Now it takes us less than 24 hours for the users to see the data. So we wanted to look at what the low-hanging fruits are, where it's labor intensive, and how we can improve it. That's how we attacked it. >> What are some of the things that you're experiencing? I mean, if we look back at what it was before versus where it is now, is it more agility? Are you more responsive to the changes? Was it an aspirin? Was it a complete transformation? Was some pain reduced? Can you share just some color commentary on the way it was before and what you're experiencing now? >> So for us, I think before, I wouldn't say we didn't know the data, but when we had the data, we looked at a product and it was just a product. We looked at manufacturers; they were all in separate silos. But when we did the MDM domains, we were able to look at the relationships. And it was very interesting to see the relationships, because now we are able to say, for example, if there is a drug shortage due to a hurricane, with the data we have we can narrow down and say, hey, this area is going to be affected, which means these are the manufacturing facilities in that area that are not going to be able to function or will be impacted by it. We can see where the hurricane tracks, using the National Weather Service data, and it helps us to narrow down some of the challenges, and we are able to predict where the next risk is going to be. >> And then in the old model, there was either a blind spot or you were ad hoc, probably, right? You probably didn't have that before.
>> Yeah, before, you were either blind or you were operating in a more reactionary way, not proactively. Now we are able to be a little bit more proactive. And drug shortages and the drug supply chain are the biggest benefits we saw with this model. Because for us, the drug supply chain means linking the pre- and post-market phases, which lets us know, if there's a trigger in the adverse events, we can actually go back to the pre-market side and see where the traceability is, where that drug came from, and what all the different things were that were going on. >> This is one of the common threads I see in innovation, where people look at the business model and data as a competitive advantage, in this case proactivity, using data to make decisions before things happen, less reactivity. So that saves time, and you get there faster, if you can see it, understand it, and impact the workflows involved. This is a major part of the data innovation that's going on, and you're starting to see new kinds of data use cases come out. So again, you're starting to see a real changeover to scaling up this kind of concept almost foundationally. What are your thoughts, just as someone who's a practitioner in the industry, as you start to see the benefits? What's next? What do you see happening? Because you've had success, how do you scale it? How do you guys look at that? >> I think what's next is we have the domains, and we actually have the practices that work. We look at it this way: data always changes. So we look at what are some of the ways that we can improve the data, and how we can take it to the next level. Because now they talk about data warehouses and data lakes. So we want to see how we can take these domains and get those relationships, or get those linkages, when there is a bigger set of data available for us.
What can we use that for? And we actually think there are other use cases we want to explore, to see what additional benefit we can get, a little bit more on the predictability side, to do things like post-market surveillance, or to look at safety signals and other things, to see what are the quick wins we can use for the business operations. >> It's really a lot more fun. You're in there using the data, you're seeing the benefits, and they're real. This is what the cloud is all about; the data cloud is here, and it's scaling. Super fun to talk about, and exciting when you see the impacts in real time, not waiting for later. So congratulations. You guys have been selected and received recognition from Informatica as the 2020 Intelligent Disrupter of the Year. Congratulations. What does that mean for your organization? >> I think we were super excited about it. One thing I can say is, when we embarked on this work, like seven years ago or so, the problem was that we were trying to identify and develop new scientific methods to improve the quality of our drugs, to get that 360 degree view of the drug development lifecycle. The program today enables FDA CDER to capture all the granular details of the data we need for the regulatory work. It helps us to support the informed decisions that we have to make, sometimes in real time, and also to make sure that when there's an emergency, we are able to respond with a quick look at the data to say, hey, this is what we need to do. It also helps the teams; it recognizes all the hard work and the hours we put into establishing the program, and it helped to build awareness within FDA, and also with the industry, of how powerful master data management is. >> It's a great reward to see the fruits of the labor and good decision making. I'm sure it was a lot of hard work. For folks out there watching who are also kind of grinding away, in some cases moving faster: you guys are the epitome of a supply chain that's super critical.
And speed is critical. Quality is critical. Data is critical. A lot of businesses are starting to feel this as part of an integrated data strategy, and I'm a big proponent. I think you guys have a great example of this. What advice would you have for other practitioners? Because you've got data scientists, but also data engineers now, who are trying to architect and create scale, programmability, and automation, and you've got the scientists on the front lines coming together, and they all feed into applications. So there are kind of new things going on. What's your advice to folks out there on how to do this, how to do it right, the learnings? Please share. >> I think the key thing, at least from my learning experience, was that it's not within one year that you're going to accomplish it; you have to be very patient, and it's a long road. If you make mistakes, you will have to go back and reassess. Even with us, with all the work we did, we went back to a couple of the domains because we thought, hey, there are additional use cases where this can be helpful. For example, we went with the supply chain, but now we go back and look at it and say, hey, there may be other things that we can use with the supply chain, not just with this data. Can we expand it? How can we look at the study data or other information? So that's what we try to do. It's not like you're done with MDM and that's it, your domain is complete. It's almost like it creates a web, and you need to look at each domain, and you want to come back to it and see how far you have to go. But the starting point is you need to establish what your key domains are. That will actually drive your vision for the next four or five years. You can't just do bottom-up; it's more of a top-down approach. >> That's great. That's great insight. And again, it's never done. Data is coming; it's not going away. It's going to be integrated. It's going to be shared. You've got to scale it up. A lot of hard work. >> Yeah. >> Shanthi, thank you so much for the insight, and congratulations on receiving the Intelligent Disrupter of the Year award from Informatica. Congratulations. >> Yeah, thank you very much for having me. Thank you. >> Thank you for sharing. Shanthi Vigneshwaran is here, from the Office of Strategic Programs at the Center for Drug Evaluation and Research with the US FDA. Thanks for joining us. I'm John Furrier for theCUBE. Thanks for watching. (soft music)
Bob De Caux & Bas de Vos, IFS | IFS World 2019
>>Bly from Boston, Massachusetts. It's the cube covering ifs world conference 2019 brought to you by ifs. >>Okay. We're back in Boston, Massachusetts ifs world day one. You walked into cube Dave Vellante with Paul Gillen boss Devoss is here. He's the director of ISF I F S labs and Bob Dico who's the vice president of AI and RPA at ifs jets. Welcome. Good to see you again. Good morning bossy. We're on last year. I'm talking about innovation ifs labs. First of all, tell us about ifs labs and what you've been up to in the last 12 months. Well, I have has Lapsis a functioning as the new technology incubator. Fire Fest writes over continuously looking at opportunities to bring innovation into, into product and help our customers take advantage of all the new things out there to yeah. To, to create better businesses. And one of the things I talked about last year is how we want to be close to our customers. And I think, uh, that's what we have been doing over the pasta pasta year. Really be close to our customers. So Bob, you got, you got the cool title, AI, RPA, all the hot cool topics. So help us understand what role you guys play as ifs. As a software developer, are you building AI? Are you building RPA? Are you integrating it? Yes, yes. Get your paint. >>I mean, our value to our customers comes from wrapping up the technology, the AI, the RPA, the IOT into product in a way that it's going to help their business. So it's going to be easy to use. They're not going to need to be a technical specialist to take advantage of it. It's going to be embedded in the product in a way they can take advantage of very easily that that's the key for us as a software developer. We don't want to offer them a platform that they can just go and do their own thing. We want to sort of control it, make it easier for them. >>So I presume it's not a coincidence that you guys are on together. So this stuff starts in the labs and then your job is to commercialize it. Right? 
So, so take machine intelligence, for example. I mean, it can be so many things to so many different people. Take us back to sort of, you know, the starting point, you know, within reason, of your work on machine intelligence, what you were thinking at the time, maybe some of the experiments that you did, and how it ends up in the product. Oh, very good question. Right? So, well, first of all, I think IFS has been using machine learning at various points in our products for many, many years. For example, in our dynamic scheduling engine, we have been using neural networks to optimize field service scheduling for quite a few years.
>>But I think, um, if we go back like two years, what we saw is that there's a real potential, um, in our products: if you put machine learning algorithms inside of the product to actually, um, help automate certain decisions in there, um, that could potentially help our business quite a bit. And the role of IFS Labs back in the day was that we just started experimenting, right? So we went out to different customers. Uh, we started engaging with them to see, okay, what kind of data do we have, what kind of use cases are there? And basically, based on that, we sort of developed a vision around AI, and the vision back in the day was based on three important aspects: human-machine interaction, optimization, and automation. And that aligned really well with our customer use cases. We talked quite a bit about that at the previous World Conference.
>>So at that point we basically decided, okay, you know what, we need to make serious work of this. The experimenting was good, but at a certain point you have to conclude that the experiments were successful, which we did. And at that point we decided to look at, okay, how can we make this into a product and bring it into the portfolio?
We started engaging with them more intensively and starting to hand it over. We decided this was also a good moment to bring somebody on board that actually has even more experience and knowledge in AI than what we already had at IFS Labs, but that could basically take over the baton and say, okay, now I am going to run with it and actually start commercializing and productizing it, still in collaboration with IFS Labs, but taking that next step on the road. And that's when Bob came onboard.
>>Christian Pedersen made the point during the keynote this morning that you have to avoid the appeal of technology for technology's sake. You have to start with the business use case. You are both very deep into the technology. How do you keep disciplined, to avoid letting the technology lead your activities?
>>Well, both. Yeah. So, so I think a good example is what we're seeing at this World Conference as well. It is staying close to the customer, and accepting and realizing that there is no use in just creating technology for the sake of technology, as you say yourself. So what we did here, for example, is that we showcase collaboration projects with customers. So, for example, we showcased a company, Cheer Pack, which, um, is a manufacturer of spouted pouches down here in Massachusetts, actually, uh, and they wanted to invest in robotics together with us. So what we basically did is actually went into their factory, literally onto the factory floor, and started innovating there. So instead of just thinking about, okay, how do robotics and AI for IFS Applications, one of our products, work together, we said, let's experiment on the shop floor of a customer instead of inside of an ivory tower, which is sometimes what our competitors do. Bob, maybe you can pick up from there, to answer your question.
>>Sure, I can pick up a little from there. Yeah.
Well, I think the really important thing, and again, Christian touched on it this morning, is not the individual technologies themselves; it's how they work together. Um, we see a lot of the underlying technologies becoming more commoditized. That's not where companies are really starting to differentiate. Algorithms, after a while, become just algorithms. There's a good way of doing things; they might evolve slightly over time, but effectively you can open source a lot of these things and take advantage of them. The value comes from that next layer up: how you take those technologies together, how you can create end-to-end processes. So if we take something like predictive maintenance, we would have an asset. We would have sensors on that asset that would be providing real-time data, uh, to an IoT system. We can combine that with historical maintenance data stored within a classic ERP system.
>>We can pull that together, use machine learning on it to make a prediction for when that machine is going to break down. And based on that prediction, we can raise a work order, and if we do that over enough assets, we can then optimize our technicians. So instead of having to wait for a machine to break down, we can know in advance; we can plan for people to be in the right place. It's that end-to-end process where the value is. We have to bring that together in a way that we can offer it to our customers. There's certainly, you know, a lot of talk in the press about machines replacing humans. Machines have always replaced humans, but for the first time in history, it's with cognitive functions. Now, people get freaked out a little bit about that. I'm hearing a theme of, of augmentation, you know, at this event.
>>But I wonder if you could share your thoughts with regard to things like AI, automation, robotic process automation. How are customers, you know, adopting them? Is there sort of concern up front?
I mean, we've talked to a number of RPA customers that, you know, initially maybe are hesitant, but then say, wow, I'm automating all those tasks that I hate, and sort of lean in. But at the same time, you know, it's clear that this could have an effect on people's jobs and lives. What are your thoughts? Sure. Do you want to kick off on that? Yeah, absolutely. That's fine. So I think in terms of the automation, the low-level tasks, as you say, that can free up people to focus on higher-value activities. Something like RPA, those bots, they can work 24/7, they can do it error-free.
>>Um, it's often doing work that people don't enjoy anyway. So that tends to actually raise morale, raise productivity, and allow you to do tasks faster. And the augmentation, I think, is where it gets very interesting, because you often don't want to automate all your decisions. You want people to have the final say, but you want to provide them more information, better, more pertinent ways of making that decision. And so it's very important, if you can do that, that you build the trust with them. If you're going to give them an AI decision that's just out of a black box and just say, there's a 70% chance of this happening, what I've found in my career is that people don't tend to believe it, or they start questioning it, and that's where you have difficulty. So this is where explainable AI comes in.
>>You need to be able to state clearly why that prediction is being made, what are the key drivers going into it. Or, if that's not possible, at least giving them the confidence to say, well, you're not sure about this prediction, you can play around with it, you can see why, but I'm going to make you more comfortable, and then hopefully you're going to understand and sort of move with it. And then it starts sort of finding its way more naturally into the workplace. So that's, I think, the key to building successful adoption.
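To illustrate the explainability point Bob makes here, showing the user not just a probability but the drivers behind it, a minimal sketch in Python might look like this. The model form, feature names, and weights are all hypothetical, purely for illustration; this is not IFS's actual implementation.

```python
import math

# Hypothetical feature weights for a breakdown-risk model.
# In a real system these would be learned from historical maintenance data.
WEIGHTS = {"vibration": 2.0, "temperature": 1.5, "hours_since_service": 0.8}
BIAS = -4.0

def predict_with_explanation(features):
    """Return a breakdown probability plus the per-feature contributions
    that drove it, so a human can see *why* the model is confident."""
    contributions = {name: WEIGHTS[name] * value for name, value in features.items()}
    score = BIAS + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-score))
    # Rank drivers by absolute contribution, largest first.
    drivers = sorted(contributions.items(), key=lambda kv: -abs(kv[1]))
    return probability, drivers

prob, drivers = predict_with_explanation(
    {"vibration": 1.8, "temperature": 1.2, "hours_since_service": 2.5}
)
print(f"breakdown risk: {prob:.2f}")
for name, contribution in drivers:
    print(f"  {name}: {contribution:+.2f}")
```

Presenting the ranked drivers alongside the probability is one simple way to move a prediction out of the black box Bob describes.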
What it is, is it's sort of giving a human the parameters and saying, okay, now you can make the call as to whether or not you want to place that bet, or make a different decision, or hold off and get more data. Is that right? >>Uh, yeah. I think a lot of it is about setting the threshold and the parameters within which you want to operate. Often, if a model is very confident, either, you know, a yes or a no, you'd probably be quite happy to let it automate and take that through; it's the borderline decisions where it gets interesting. You probably would still want someone to look over those, but you want them to do it consistently. You want them to do it using all the information at hand, and that's what you present to them. And to add to that, um, I think we also should not forget that a lot of our customers, a lot of companies, are actually struggling to find qualified staff, right? I mean, aging of the workforce, right? We're all retiring eventually. Right? So aging of the workforce is a potential issue.
Finding enough qualified staff. So if I go back to the Cheer Pack example I was just talking about, um, and some of the benefits they get out of that robotics project: um, of course they're saving money right there. They're saving about $1.5 million a year on that project, but the most important benefit for them is actually the fact that they have been able to move the people from the work floor into higher-scope positions, effectively countering the labor shortage. They were limited in their operations because, in fact, they had too few qualified staff, and by putting the robots in, they were able to reposition those people, and that's for them the most important benefit. So I think there's always a little bit of a balance, um, but I also think we eventually need robots.
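Bob's thresholding idea, automating the confident calls while routing borderline ones to a person, consistently and with the same rule, reduces to a simple decision function. Here is a sketch; the 0.9 and 0.1 cutoffs are illustrative assumptions, not anything IFS ships.

```python
def route_decision(probability, auto_approve=0.9, auto_reject=0.1):
    """Automate only when the model is confident; send borderline
    cases to a human reviewer using one consistent rule."""
    if probability >= auto_approve:
        return "auto-approve"
    if probability <= auto_reject:
        return "auto-reject"
    return "human-review"

# Confident yes and no are automated; the borderline case goes to a person.
for p in (0.97, 0.55, 0.03):
    print(p, "->", route_decision(p))
```

Tuning the two thresholds is how an operator decides how much of the decision volume is automated versus augmented.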
Maybe you can speak to Bobby, you can speak to software robots. We've, Pete with people think of robots, they tend to think of machines, but in fact software robots are, where are the a, the real growth is right now, the greatest growth is right now. How pervasive will software robots be in the workplace do you think in the three to five years? >> I think the software robots as they are now within the RPA space, um, they fulfill a sort of part of the Avril automation picture, but they're never going to be the whole thing. I see them very much as bringing different systems together, moving data between systems, allowing them to interact more effectively. But, um, within systems themselves, uh, you know, the bots can only really scratched the surface. >>They're interacting with software in the same way a human would on the whole by clicking buttons going through, et cetera, beneath the surface. Uh, you know, for example, within the ifs products we have got data understanding how people interact with our products. We can use machine learning on that data to learn, to make recommendations to do things that our software but wouldn't be able to see. So I think it's a combination. There's software bots, they're kind of on the outside looking in, but they're very good at bringing things together. And then insight you've got that sort of deeper automation to take real advantage of the individual pieces of software. >> This may be a little out there, but you guys >>are, you guys are deep into, into the next generation lot to talk right now about quantum and how we could see workable quantum computers within the next two to two to three years. How, what do you think the, the outlook is there? How is that going to shake things up? So >>let me answer this. We were actually a having an active project and I for slabs currently could looking at quantum computing, right? Um, there's a lot of promise in it. Uh, there's also a lot of unfilled, unfulfilled problems in that, right? 
But if you look at the potential, I think where it really starts playing, um, into benefits is that the larger the optimization problems, the larger the algorithms are that we have to run, the more benefit it actually starts bringing us. So if you're asking me for an outlook, I'd say there is potential, definitely, especially in optimization problems. Right. Um, but I also think that the realistic outlook is quite far out. Uh, yes, we're all experimenting with it, and I think it's our responsibility as IFS, or IFS Labs, to also look at what it could potentially mean for applications for us as IFS.
>>But my personal opinion is that the outlook is far out, yeah. So, five to 10 years out, what comes first: quantum computing or fully autonomous driverless vehicles? Oh, that's a tricky question. I mean, I would say in terms of the practical commercial application, it's going to be the latter, and even that's quite a ways off. Yeah, I think so. Of course. Question back on RPA: what are you guys exactly doing on RPA? Are you developing your own robotic process automation software, or are you integrating, or doing both, say, within the products? We, you know, if we think of RPA as this means of interacting with the graphical user interface in a way that a human would within the product, um, we're thinking more in terms of automating processes, using the machine learning, as I mentioned, to learn from experience, et cetera, uh, in a way that will take advantage of things like our APIs that were discussed on the main stage today.
>>RPA is very much our way of interacting with other systems, allowing other systems to interact with IFS, allowing us to send messages out. So we need to make it as easy as possible for those bots to call us. Uh, you know, that can be by making our screens nice and accessible and easy to use. But I think the way that RPA is going, a lot of the major vendors are becoming orchestrators, really.
They're creating these studios where you can drag and drop different components in, to do OCR, provide cognitive services, and, you know, an element that you could drag and drop in would be to say, ah, take data from a file, load it into IFS, and put it in a purchase order. And you can just drag that in, and then it doesn't really matter how it connects to IFS; it can do that via the API. And I think that's probably where it will stay: creating the ability to talk to IFS is the most important thing for us. So you're making your products RPA-ready and friendly.
>>It sounds like you're using it for your own purposes, but you're not an RPA vendor per se. You know what I'm saying? Okay, here's how you do an automation, you're going to integrate that with other RPA leaders' products. I think we would really take more of a firm partner approach to it, right? So, I mean, there are different ways of integrating systems together; RPA is a good one, and there are other ways as well. So if a customer actually, um, wants to integrate the systems together using RPA, very good choice; we make sure that our products are as ready for that as possible. Of course, we will look at the partner ecosystem to make sure that we have sufficient and the right partners in there, so that a customer has a choice in what we recommend. But basically, we say we want to be agnostic to what kind of RPA vendor sits in there. Now, there's obviously a lot of geopolitical stuff going on with tariffs and the like.
One of the examples I've seen in the RPA space ride wire a company before we would actually have an outsourcing project in India where people would just type over D uh, DDD, the purchase orders right now. Now in RPA bolts scans. I didn't, so they don't need the Indian North shore anymore. But it's always a balance between, you know, what's the benefit of what's the cost of developing technology and that's, and it's, and, and it's almost like a macro economical sort of discussion. >>One of the discussions I had with my colleagues in Sri Lanka, um, and, and maybe completely off topic example, we were talking about carwash, right? So us in the, in the Western world we have car wash where you drive your car through, right? They don't have them in Sri Lankan. All the car washes are by hands. But the difference is because labor is cheaper there that it's actually cheaper to have people washing your car while we'd also in the us for example, that's more expensive than actually having a machine doing it. Right. So it is a, it's a macro economical sort of question that is quite interesting to see how that develops over the next couple of years. All right, Jess. Well thanks very much for coming on the cube. Great discussion. Really appreciate it. Thank you very much. You're welcome. All right. I'll keep it right there, but he gave a latte. Paul Gillen moved back. Ifs world from Boston. You watch in the queue.
Chris Lynch, AtScale | MIT CDOIQ 2019
>>From Cambridge, Massachusetts, it's theCUBE, covering MIT Chief Data Officer and Information Quality Symposium 2019. Brought to you by SiliconANGLE Media. >>Welcome back to Cambridge, Massachusetts, everybody. You're watching theCUBE, the leader in live tech coverage. I'm Dave Vellante with my co-host, Paul Gillin. Chris Lynch, good friend, is here, newly minted CEO of AtScale, and legend. Good to see you. >>In my own mind. >>In mine too. >>It's great to be here. >>It's awesome, thank you for taking the time. I know how busy you are, you're running around like crazy with your next big thing. I was excited to hear that you got back into it. I predicted it a while ago; you were a very successful venture capitalist, but at heart, you're a startup guy, aren't ya? >>Yeah, 100%, 100%. I couldn't be more thrilled, I feel invigorated. I think I've told you many times, when you've interviewed me and asked me about the transition from being an entrepreneur to being a VC, and since it's a PG show, I've got a different analog than the one I usually give you. I used to be a movie star, then I became an executive producer of movies, and now I'm back to being a movie star, hopefully. >>Yeah well, so you told me when you first became a VC, you said, I look for startups that have a 10X impact, either 10X value or 10X cost reduction. What was it that attracted you to AtScale? What's the 10X? >>AtScale addresses a $150 billion market problem, which is basically bringing traditional BI to the cloud. >>That's the other thing you told me, big markets. >>Yeah, so that's the first thing, massive market opportunity. The second is the innovation component, and where the 10X comes in: we're uniquely qualified to virtualize data into the pipeline and out. So I like to say that we're the bridge between BI and AI and back. We make every BI user a citizen data scientist, and that's a game changer. And that's sort of the new futuristic component of what we do.
So one part is steeped in that $150 billion BI marketplace and traditional analytics platforms, and then the second piece is delivering the data into these BI, excuse me, these AI and machine learning platforms.
>>Do you see that ultimately getting integrated into some kind of larger data pipeline framework? I mean, maybe it lives in the cloud or maybe on-prem. How do you see that evolving over time? >>So I believe that with AtScale as one single pane of glass, we basically are providing an API to the data and to the user, one single API. The reason that today we haven't seen the delivery of the promise of big data is because we don't have big data. Fortune 2000 companies don't have big data. They have lots of data, but to me, big data means you can have one logical view of that data and get the best data pumped into these models and these tools, and today that's not the case. They're constricted by location, they're constricted by vendor, they're constricted by whether it's in the cloud or on-prem. We eliminate those restrictions. >>The single API, I think, is important actually. Because when you look at some of these guys, what they're doing with their data pipeline, they might have 10 or 15 unique APIs that they're trying to manage. So there's a simplification aspect too, I suppose. >>One of the knocks on traditional BI has always been the need for extract databases and all the ETL that's involved in that. Do you guys avoid that stage? Do you go to the production data directly, or what's the-- >>It's a great question. The way I put it is, we bring Moses to the mountain, the mountain being the data, Moses being the user. Traditionally, what people have been trying to do is bring the mountain to Moses; that doesn't scale. At AtScale, we provide an abstraction, a logical abstraction, between the data and the BI user. >>You don't touch, you don't move the data. >>We don't move the data.
Which is what's unique, and that's what's delivering, I think, way more than a 10X improvement in value. >>Because you leave the data in place; you bring that value to wherever the data is. Which is the original concept of Hadoop, by the way. That was what was profound about Hadoop. Everybody craps on it now, but that was the game changer, and if you could take advantage of that, that's how you tap your 10X. >>The difference is, we're not, to your point, we're not moving the data. Hadoop, in my humble opinion, plateaued because to get the value, you had to ask the user to put data in yet another platform. And the reason that we're not delivering on big data as an industry, I believe, is because we have too many data sources, too many platforms, too many consumers of data, and too many producers. We've built all these islands of data with no connectivity. The idea was, we'll create this big data lake and we're going to physically put everything in there. Guess what? Someday turned out to be never, because people aren't going to deal with the business disruption. We move thousands of users from a platform like Teradata to a platform like Snowflake or Google BigQuery, we don't care. We're multi-cloud and we're hybrid cloud. But we do it without any disruption. You're using Excel? You just continue to use it; you just see the results are faster. You use Tableau, same difference. >>So we had all the Vertica rock stars in here. We had Colin in yesterday, we had Stonebraker around earlier, Andy Palmer just came on, and Chris here, the CEO who ultimately sold the company to HP, which really didn't do anything with it and then spun it off, and now it's back. Aaron had a spring in his step yesterday. So when you think about Vertica, the technology behind Vertica, go back 10 years and where we've come now. Give us a little journey of your data journey.
>>So I think it plays into the original assertion, which is that Vertica is a best-in-class platform for analytics, but it was yet another platform. The analog I give now is, now we have Snowflake, and in six months, 12 months from now, we're going to have another one. And that creates a set of problems if you have to live in the physical world, because you have all these islands of data, and I believe it's about the data, not about the models; it's about the data. You can't get optimal results if you don't have optimal access to the pertinent data. I believe that having that universal API is going to make the next platform that much more valuable. You're not going to be making the trade-off of, okay, we have this platform that has some neat capability, but the trade-off is, from an enterprise architecture perspective, we're never going to be able to connect all this stuff. That's how all of these things proliferated. My view is, in a world where you have that single pane of glass, that abstraction layer between the user and the data, then innovation can be spawned quicker, and you can use these tools effectively, because you're not compromising being able to get a logical view of the data and get access to it as a user. >>What's your issue with Snowflake? You mentioned them, Muglia's company-- >>No issue, they're a great partner of ours. We eliminate the friction for the user going from an on-prem solution to the cloud. >>Slootman just took over there. So you know where that's going. >>Yep (laughing) >>Frank's got the magic touch. Okay good, you say they're a partner of yours; how are you guys partnering? >>They refer us into customers. If you want to buy Snowflake now, the next issue is, how do I migrate? You don't. You put our virtualization layer in, and then we allow you access to Snowflake in a non-disruptive way, versus having to move data into their system or into a particular cloud, which creates sales friction.
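As a rough sketch of the idea Chris keeps returning to, a virtualization layer that gives one API over data that stays where it lives, consider the toy example below. The dataset names and backends are placeholders, and the real product operates at the BI semantic layer rather than as simple Python routing; this only illustrates the routing concept.

```python
class VirtualizationLayer:
    """One logical view over multiple data platforms: queries are routed
    to where the data already lives instead of moving the data."""

    def __init__(self):
        self.backends = {}  # dataset name -> callable that queries data in place

    def register(self, dataset, backend):
        self.backends[dataset] = backend

    def query(self, dataset, predicate):
        # One API for the user, regardless of where the data physically sits.
        return self.backends[dataset](predicate)

# Placeholder "platforms": in reality these would be Teradata, Snowflake,
# BigQuery, etc., each queried through its own driver, with nothing copied.
on_prem = [{"region": "east", "sales": 100}, {"region": "west", "sales": 80}]
cloud = [{"region": "east", "orders": 7}]

layer = VirtualizationLayer()
layer.register("sales", lambda pred: [row for row in on_prem if pred(row)])
layer.register("orders", lambda pred: [row for row in cloud if pred(row)])

# Same API, two different physical locations; no data moved between them.
print(layer.query("sales", lambda row: row["region"] == "east"))
print(layer.query("orders", lambda row: row["region"] == "east"))
```

Swapping a backend (say, Teradata for Snowflake) changes only the registered callable; the user-facing `query` API, and therefore the BI tools on top of it, stay untouched, which is the non-disruptive migration Chris describes.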
>>Moving data is just, you want to avoid it at all costs. >>I do want to ask you, because I met with your predecessor, Dave Mariani, last year, and I know he was kind of a reluctant CEO; he didn't really want to be CEO but wanted to be CTO, which is what he is now. How did that come about, that they found you, that you connected with them and decided this was the right opportunity? >>That's a great question. I actually looked at the company at the seed stage when I was in venture, but I had this thing, as you know, that I wanted to move companies to Boston, and they're about my vintage age-wise, and he's married with four kids, so that wasn't in the cards. I said look, it doesn't make sense for me to seed this company because I can't give you the time; you're out in California, and everything I'm instrumenting is around Boston. We parted friends. And I was skeptical whether he could build this, because people have been talking about building a heterogeneous, universal semantic layer for years, and it's never come to fruition. And then he read in Fortune or Forbes that I was leaving Accomplice and that I was looking for one more company to operate. He reached out and told me what they were doing: hey, we really built it, but we need help, and I don't want to run this; it's not right for the company and the opportunity. So he said, "I'll come and I'll consult to you." I put together a plan, and I had my Vertica and DataRobot guys do the technical diligence, to make sure that the architecture wasn't wedded to Hadoop like all the other ones were, and when I saw it wasn't, then I knew the market opportunity was to take that rifle and point it at that legacy $150 billion BI market, not at the billion-dollar market of Hadoop. And since we did that, we've been growing at 162% quarter-over-quarter. We've built development centers in Bulgaria. We've moved all non-technical operations to Boston, here down at South Station.
We've been on fire, and we are the partner of choice of every cloud vendor, because we eliminate the sales friction for customers being able to take advantage of movement to the cloud, and through our intelligent pipeline capability, we're able to reduce the cost of queries significantly, because we understand and are able to intelligently cache those queries. >>Sales ops is here, all-- >>Sales, marketing, customer support, customer success, and we're building a machine learning team here, a dev team here. >>Where are you in that sort of Boston build-out? >>We have an office on 711 Atlantic that we opened in the fall. We're actually moving from 4,000 square feet to 10,000 this month, in less than six months, and we'll house, by the first year, 100 employees in Boston, 100 in Bulgaria, and about that same hundred in San Mateo. >>Are you going after net new business mainly? Or there's a lot of legacy BI out there; are you more displacing those products? >>A couple of things. What we find is that customers want to evolve into the cloud; they don't want a revolution, they want an evolution. So we allow them, because we support hybrid cloud, to keep some data behind the firewall and then experiment with moving other data to the cloud platform of choice, but we're still providing that one logical view. I would say most of our customers are looking to re-platform off of Teradata or something onto another platform like Snowflake. And then we have a set of customers that see that as part of the solution but not the whole solution; they're more true hybrids. But I would say that 80% of our customers are traditional BI customers that are trying to contemporize their environments and be able to take advantage of tabular and multidimensional support, the things that we do in addition to the cube world. >>They can keep whatever they're using. >>Correct, that's the key. >>Did you do the series D? You did, right?
>> So you're not actively raising, but you're good for now. It was like $50 million? >> Yeah, we raised $50 million. >> You're good for a bit. Who's in the Chris Lynch target? (laughs) Who's the enemy? At Vertica, I could say it was the traditional database guys. Who's the? >> We're in a unique position; we're almost Switzerland, so we could be friend or foe of anybody in that ecosystem, because we can non-disruptively re-platform customers between legacy platforms, or from legacy platforms to the cloud. We're in an interesting position. >> So similar to the file sharing. The file virtualization company. >> Acopia. >> Acopia, yeah. >> It puts us in an interesting position. They need to be friends with us, and at the same time I'm sure that they're concerned about the capabilities we have. But we have a number of retail customers, for instance, that have asked us to move them from Amazon to Google BigQuery, which we accommodate, and because we can do that non-disruptively, the cost and the difficulty of moving are eliminated. It gives customers true freedom of choice. >> How worried are you that AWS tries to replicate what you guys do? You're in their sights. >> I think there are technical, legal, and structural barriers to them doing that. The technical is, this team has been at it for six and a half years, so to do what we do, they'll have to do what we've done. Structurally, from a business perspective, even if they could, I'm not sure they'd want to. The way to think about Amazon is, they're no different than Teradata, except they want the same vendor lock-in to be the Amazon cloud, where Teradata wanted it to be their data warehouse. >> They don't promote multi-cloud versus-- >> Yeah, they don't want multi-cloud, they don't want >> On Prem >> customers to have a freedom of choice. Would they really enable a heterogeneous abstraction layer? I don't think they would, nor do I think any of the big guys would. They all claim to have this capability for their system.
It's like the old IBM adage: I'm in prison, but the food's good, I get three squares a day, I get cable TV, but I'm in prison. (laughing) >> Awesome, all right, parting thoughts. >> Parting thoughts? Oh geez, you've got to give me a question, I'm not that creative. >> What's next for you guys? What should we be paying attention to? >> I think you're going to see some significant announcements in September regarding the company and relationships that I think will validate the impact we're having in the market. >> Give you some leverage. >> Yeah, it will give us better channel leverage. We have a major technical announcement that I think will be significant to the marketplace and will be highly disruptive to some of the people you just mentioned, in terms of really raising the bar for customers to be able to have the freedom of choice without any sort of vendor lock-in. And I think that that will create some counterstrike, which we'll be ready for. (laughing) >> If you've never heard of AtScale before, trust me, you're going to in the next 18 months. Chris Lynch, thanks so much for coming on theCUBE. >> It's my pleasure. >> Great to see you. All right, keep it right there everybody, we're back with our next guest right after this short break. You're watching theCUBE from MIT, right back. (upbeat music)
Susan Wilson, Informatica & Blake Andrews, New York Life | MIT CDOIQ 2019
(techno music) >> From Cambridge, Massachusetts, it's theCUBE. Covering MIT Chief Data Officer and Information Quality Symposium 2019. Brought to you by SiliconANGLE Media. >> Welcome back to Cambridge, Massachusetts everybody, we're here with theCUBE at the MIT Chief Data Officer Information Quality Conference. I'm Dave Vellante with my co-host Paul Gillin. Susan Wilson is here, she's the vice president and data governance leader at Informatica. Blake Andrews is the corporate vice president of data governance at New York Life. Folks, welcome to theCUBE, thanks for coming on. >> Thank you. >> Thank you. >> So, Susan, interesting title: VP, data governance leader, Informatica. So, what are you leading at Informatica? >> We're helping our customers realize their business outcomes and objectives. Prior to joining Informatica about seven years ago, I was actually a customer myself, and so oftentimes I'm working with our customers to understand where they are, where they're going, and how to best help them; because we recognize data governance is more than just a tool, it's a capability that represents people, the processes, the culture, as well as the technology. >> Yeah, so you've walked the walk, and you can empathize with what your customers are going through. And Blake, your role, as the corporate VP, but more specifically the data governance lead. >> Right, so I lead the data governance capabilities and execution group at New York Life. We're focused on providing skills and tools that enable governance activities across the enterprise at the company. >> How long has that function been in place? >> We've been in place for about two and a half years now. >> So, I don't know if you guys heard Mark Ramsey this morning, the keynote, but basically he said, okay, we started with the enterprise data warehouse, we went to master data management, then we kind of did this top-down enterprise data model; that all failed. So we said, all right, let's punt it to governance.
Here you go guys, you fix our corporate data problem. Now, right tool for the right job, but, and so, we were sort of joking: did data governance fail? No, you always have to have data governance. It's like brushing your teeth. But so, like I said, I don't know if you heard that, but what are your thoughts on that sort of evolution that he described? As sort of, failures of things like the EDW to live up to expectations, and then, okay guys, over to you. Is that a common theme? >> It is a common theme, and what we're finding with many of our customers is that they had tried many of, if you will, the methodologies around data governance, right? Around policies and structures. And we describe this as the journey from Data 1.0, which was more application-centric reporting, to Data 2.0, to data warehousing, and a lot of the failed attempts, if you will, at centralizing all of your data, to now Data 3.0, where we look at the explosion of data, the volumes of data, the number of data consumers, the expectations of the chief data officer to solve business outcomes; crushing under the scale of, I can't fit all of this into a centralized data repository, I need something that will help me scale and become more agile. And so, that message does resonate with us, but we're not saying data warehouses don't exist. They absolutely do, for trusted data sources, but the ability to be agile, to address many of your organization's needs, and to be able to service multiple consumers is top-of-mind for many of our customers. >> And the mindset from 1.0 to 2.0 to 3.0 has changed. From, you know, data as a liability, to now data as this massive asset. It's sort of-- >> Value, yeah. >> Yeah, and the pendulum has swung. It's almost like a see-saw. Where, and I'm not sure it's ever going to flip back, but it is to a certain extent; people are starting to realize, wow, we have to be careful about what we do with our data. But still, it's go, go, go. But, what's the experience at New York Life? I mean, you know, a company that's been around for a long time, conservative, risk-averse, obviously. >> Right. >> But at the same time, you want to keep moving as the market moves. >> Right, and we look at data governance as really an enabler and a value-add activity. We're not a governance practice for the sake of governance. We're not there to create a lot of policies and restrictions. We're there to add value and to enable innovation in our business and really drive that execution, that efficiency. >> So how do you do that? Square that circle for me, because a lot of people think, when people think security and governance and compliance, they think, oh, that stifles innovation. How do you make governance an engine of innovation? >> You provide transparency around your data. So, it's transparency around: what does the data mean? What data assets do we have? Where can I find that? Where are my most trusted sources of data? What does the quality of that data look like? So all those things together really enable your data consumers to take that information and create new value for the company. So it's really about enabling your value creators throughout the organization. >> So data is an ingredient. I can tell you where it is, I can give you some kind of rating as to the quality of that data and its usefulness. And then you can take it and do what you need to do with it in your specific line of business. >> That's right. >> Now you said you've been at this two and a half years, so what stages have you gone through since you first began the data governance initiative? >> Sure, so our first year, year and a half was really focused on building the foundations, establishing the playbook for data governance, and building our processes and understanding how data governance needed to be implemented to fit New York Life and the culture of the company.
The last twelve months or so have really been focused on operationalizing governance. So we've got the foundations in place; now it's about implementing tools to further augment those capabilities and help assist our data stewards and give them a better skill set and a better tool set to do their jobs. >> Are you, sort of, crowdsourcing the process? I mean, do you have a defined set of people who are responsible for governance, or is everyone taking a role? >> So, it is a two-pronged approach; we do have dedicated data stewards. There's approximately 15 across various lines of business throughout the company. But we are building towards a data democratization aspect. So, we want people to be self-sufficient in finding the data that they need and understanding the data. And then, when they have questions, relying on our stewards as a network of subject matter experts who also have some authorizations to make changes and adapt the data as needed. >> Susan, one of the challenges that we see is that the chief data officers oftentimes are not involved in some of these skunkworks AI projects. They're sort of either hidden, maybe not even hidden, but they're in the line of business, they're moving. You know, there's a mentality of move fast and break things. The challenge with AI is, if you start operationalizing AI and you're breaking things without data quality, without data governance, you can really affect lives. We've seen it, in one of these unintended consequences. I mean, Facebook is the obvious example, and there are many, many others. But, are you seeing that? How are you seeing organizations dealing with that problem? >> As Blake was mentioning, oftentimes you've got to start with transparency, and you've got to start with collaborating across your lines of business, including the data scientists, and including in terms of what they are doing. And actually provide that level of transparency, provide a level of collaboration. And a lot of that is through the use of our technology enablers to basically go out and find where the data is and what people are using, and to be able to provide a mechanism for them to collaborate in terms of, hey, how do I get access to that? I didn't realize you were the SME for that particular component. And then also, did you realize that there is a policy associated with the data that you're managing, and it can't be shared externally or with certain consumer data sets? So, the objective really is around how to create a platform to ensure that anyone in your organization, whether I'm in the line of business and don't have a technical background, or someone who does have a technical background, can come and access and understand that information and connect with their peers. >> So you're helping them to discover the data. What do you do at that stage? >> What we do at that stage is create insights for anyone in the organization to understand it from an impact analysis perspective. So, for example, if I'm going to make changes, what's the impact? As well as discovery: where exactly is my information? And so we have-- >> Right. How do you help your customers discover that data? >> Through the machine learning and artificial intelligence capabilities of, specifically, our data catalog, which allows us to do that. So we use such things as similarity-based matching, which helps us to identify sensitive data. A column doesn't have to be named social security number; it could be named miscellaneous text one, or any particular column name. But in our ability to scan and discover, we can identify in that column what is potentially a social security number. It might have resided there over years of having this data, but you may not realize that it's still stored there. Our ability to identify that and report it out to the data stewards, as well as the data analysts, as well as to the privacy individuals, is critical. So, with that being said, they can then actually identify the appropriate policies that need to be adhered to, alongside it, in terms of quality, in terms of, is there something that we need to archive? So that's where we're helping our customers in that aspect. >> So you can infer from the data, the metadata, and then, with a fair degree of accuracy, categorize it and automate that. >> Exactly. We've got a customer that actually ran this, and they said that, you know, it took three people three months to actually physically tag where all this information existed across something like 7,000 critical data elements. And, basically, after the setup and the scanning procedures, within seconds we were able to get to within 90% precision. Because, again, we've dealt a lot with metadata. It's core to our artificial intelligence and machine learning. And it's core to how we built out our platforms to share that metadata, to do something with that metadata. It's not just about sharing the glossary and the definition information. We also want to automate and reduce the manual burden. Because we recognize that at that scale, manual documentation, manual cataloging and tagging just, >> It doesn't work. >> It doesn't work. It doesn't scale. >> Humans are bad at it. >> They're horrible at it. >> So I presume you have a chief data officer at New York Life, is that correct? >> We have a chief data and analytics officer, yes. >> Okay, and you work within that group? >> Yes, that is correct. >> Do you report in to that? >> Yes, so-- >> And that individual, yeah, describe the organization. >> So that sits in our lines of business. Originally, our data governance office sat in technology. And then, in early 2018, we actually re-orged into the business under the chief data and analytics officer when that role was formed.
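The content-based discovery Susan describes, flagging a column as a probable social security number even when it is named something like miscellaneous text one, can be sketched with simple value-pattern profiling. The regex and the match threshold here are illustrative assumptions, not Informatica's actual similarity-matching algorithm:

```python
import re

# Hypothetical pattern for US social security numbers (illustrative only).
SSN_PATTERN = re.compile(r"^\d{3}-\d{2}-\d{4}$")

def classify_column(values, pattern=SSN_PATTERN, threshold=0.8):
    """Flag a column whose non-empty values mostly match a sensitive-data
    pattern, regardless of what the column happens to be named."""
    values = [v for v in values if v]  # ignore empty cells
    if not values:
        return False
    matches = sum(1 for v in values if pattern.match(str(v).strip()))
    return matches / len(values) >= threshold

# A column named "misc_text_1" is still flagged by its content:
misc_text_1 = ["123-45-6789", "987-65-4321", "", "555-12-3456"]
print(classify_column(misc_text_1))  # True
```

A real catalog would combine value patterns like this with metadata similarity and learned models, and report the flagged columns out to stewards and privacy teams, as described above.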
So we sit under that group, along with a data solutions and governance team that includes several of our data stewards and also some others, some data engineer-type roles. And then there's our center for data science and analytics as well, which contains a lot of our data science teams and that type of work. >> So in thinking about some of these, what I was describing to Susan as these skunkworks projects: is the data team, the chief data officer's team, involved in those projects, or is it sort of a, go run water through the pipes, get an MVP, and then you guys come in? How does that all work? >> We're working to try to centralize that function as much as we can, because we do believe there's value in the left hand knowing what the right hand is doing in those types of things. So we're trying to build those communication channels and build that network of data consumers across the organization. >> It's hard, right? >> It is. >> Because the line of business wants to move fast, and you're saying, hey, we can help. And they think you're going to slow them down, but in fact, you've got to make the case and show the success, because you're actually not going to slow them down in terms of the ultimate outcome. I think that's the case that you're trying to make, right? >> And that's one of the things that we try to really focus on, and I think that's one of the advantages to us being embedded in the business under the CDAO role, is that we can then say our objectives are your objectives. We are here to add value and to align with what you're working on. We're not trying to slow you down or hinder you; we're really trying to bring more to the table and augment what you're already trying to achieve. >> Sometimes getting that organization right means everything, as we've seen. >> Absolutely. >> That's right. >> How are you applying governance discipline to unstructured data? >> That's actually something that's a little bit further down our road map, but one of the things that we have started doing is looking at our taxonomies for structured data and aligning those with the taxonomies that we're using to classify unstructured data. So, that's something we're in the early stages with, so that when we get to that process of looking at more of our unstructured content, we already have a good feel that there's alignment between the way that we think about and organize those concepts. >> Have you identified automation tools that can help to bring structure to that unstructured data? >> Yes, we have. And there are several tools out there that we're continuing to investigate and look at. But that's one of the key things that we're trying to achieve through this process: bringing structure to unstructured content. >> So, the conference. First year at the conference. >> Yes. >> Kind of key takeaways, things that were interesting to you, learnings? >> Oh, yes, well, the number of CDOs that are here and what's top of mind for them. I mean, it ranges from, how do I stand up my operating model? We just had a session about 30 minutes ago. A lot of questions around, how do I set up my organization structure? How do I stand up my operating model so that I can be flexible? To, right, the data scientists, to the folks that are more traditional in structured and trusted data. So, these things are still top-of-mind, because they're recognizing the market is also changing, along with a growing amount of expectations: not only solving business outcomes, but also regulatory compliance. Privacy is also top-of-mind for a lot of customers, in terms of, how do I get started, and what's the appropriate structure and mechanism for doing so? So we're getting a lot of those types of questions as well. So, the good thing is, many of us have had years of experience in this space, and the convergence of us being able to support our customers, not only in our principles around how we implement the framework, but also in the technology, is really coming together very nicely. >> Anything you'd add, Blake? >> I think it's really impressive to see the level of engagement with thought leaders and decision makers in the data space. You know, as Susan mentioned, we just got out of our session, and really, by the end of it, it turned into more of an open discussion. There was just this kind of back and forth between the participants. And so it's really engaging to see that level of passion from such a distinguished group of individuals who are all kind of here to share thoughts and ideas. >> Well, anytime you come to a conference, it's sort of an open forum like this, you learn a lot. When you're at MIT, it's like super-charged. With the big brains. >> Exactly, you feel it when you come on the campus. >> You feel smarter when you walk out of here. >> Exactly, I know. >> Well, guys, thanks so much for coming on theCUBE. It was great to have you. >> Thank you for having us. We appreciate it, thank you. >> You're welcome. All right, keep it right there everybody. Paul and I will be back with our next guest. You're watching theCUBE from MIT in Cambridge. We'll be right back. (techno music)
Robert Abate, Global IDS | MIT CDOIQ 2019
>> From Cambridge, Massachusetts, it's theCUBE. Covering MIT Chief Data Officer and Information Quality Symposium 2019. Brought to you by SiliconANGLE Media. (futuristic music) >> Welcome back to Cambridge, Massachusetts everybody. You're watching theCUBE, the leader in live tech coverage. We go out to the events and we extract the signal from the noise. This is day two; we're sort of wrapping up the Chief Data Officer event. It's MIT CDOIQ. It started as an information quality event, and with the ascendancy of big data the CDO emerged and really took center stage here. And it's interesting to note that it's kind of come full circle back to information quality. People are realizing, all this data we have, you know the old saying: garbage in, garbage out. So the information quality world and this chief data officer world have really come colliding together. Robert Abate is here, he's the Vice President and CDO of Global IDS, and also the co-chair of next year's 14th annual MIT CDOIQ. Robert, thanks for coming on. >> Oh, well thank you. >> Now you're a CDO by background, give us a little history of your career. >> Sure, sure. Well, I started out with an Electrical Engineering degree and went into applications development. By 2000, I was leading Ralph Lauren's IT, and I realized, when Ralph Lauren hired me, he was getting ready to go public. And his problem was he had hired eight different accounting firms to do eight different divisions. And each of those eight divisions was reporting a number, but the big number didn't add up, so he couldn't go public. So he searched the industry to find somebody who could figure out the problem. Now I was, at the time, working in applications and had built this system called Service Oriented Architectures, a way of integrating applications. And I said, "Well, I don't know if I could solve the problem, but I'll give it a shot."
And what I did was, just by taking each silo as its own problem, which was what each accounting firm had done, I was able to figure out that one of Ralph Lauren's policies was: if you buy a garment, you can return it anytime, anywhere, forever, however long you own it. And he didn't think about that, but what that meant is somebody could go to a Bloomingdale's, buy a garment, and then go to his outlet store and return it. Well, the cross channels were different systems. So the outlet stores were his own business, retail was a different business, and each was completely separate; each one had their own AS/400, their own data. So what I quickly learned was, the problem wasn't the systems, the problem was the data. And it took me about two months to figure it out, and he offered me a job. I was a consultant at the time, and he says, "I'm offering you a job, you're going to run my IT." >> Great user experience, but hard to count. >> (laughs) Hard to count. So that's when, probably 1999 was when that happened. I went into data and started researching-- >> Sorry, so how long did it take you to figure that out? You said a couple of months? >> A couple of months, I think it was about two months. >> 'Cause jeez, it took Oracle what, 10 years to build Fusion with SOA? That's pretty good. (laughs) >> This was a little bit of luck. When we started integrating the applications, we learned that the messages that we were sending back and forth didn't match, and we said, "Well, that's impossible, it can't not match." But what didn't match was that it was coming from one channel and being returned in another channel, and the returns shown here didn't balance with the returns on that side. So it was a data problem. >> So a forensics showdown. So what did you do after? >> After that I went into ICICI Bank, which was a large bank in India that was trying to integrate their systems, and again, this was a data problem.
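The cross-channel mismatch described here, where a garment sold through one channel is returned through another so that no single silo's numbers balance, is at heart a data-reconciliation problem: the totals only add up once sales and returns are combined across every channel. A minimal sketch of that kind of check, with invented transaction records rather than Ralph Lauren's actual data:

```python
from collections import defaultdict

# Hypothetical per-channel records of (sku, quantity).
sales = {
    "retail": [("POLO-1", 100)],
    "outlet": [("POLO-1", 40)],
}
returns = {
    "retail": [("POLO-1", 5)],
    "outlet": [("POLO-1", 20)],  # includes garments originally bought at retail
}

def totals_by_sku(per_channel):
    """Sum quantities per SKU across all channels."""
    totals = defaultdict(int)
    for records in per_channel.values():
        for sku, qty in records:
            totals[sku] += qty
    return totals

def reconcile(sales, returns):
    """Net units per SKU. Within one silo the numbers look wrong (the
    outlet's returns exceed what the outlet alone can explain), but the
    enterprise-wide view balances."""
    sold = totals_by_sku(sales)
    returned = totals_by_sku(returns)
    return {sku: sold[sku] - returned.get(sku, 0) for sku in sold}

print(reconcile(sales, returns))  # {'POLO-1': 115}
```

The point of the sketch is that reconciliation has to happen on the combined data; comparing each channel's own sales against its own returns, as each separate accounting team did, can never balance.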
But they had heard me giving a talk at a conference on how SOA had solved the data challenge, and they said, "We're a bank with a wholesale, a retail, and other divisions, and we can't integrate the systems, can you?" I said, "Well, yeah, I'd build a website and make them web services, and now what'll happen is each of those will kind of communicate." And I was at ICICI Bank for about six months in Mumbai, and finished that, which was a success, came back and started consulting, because now a lot of companies were really interested in this concept of Service Oriented Architectures. Back then, when we first published on it (myself, Peter Aiken, and a gentleman named Joseph Burke published on it in 1996), the publisher didn't accept the book; it was a really interesting thing. We wrote the book, called "Services Based Architectures: A Way to Integrate Systems." And the way Wiley & Sons, or most publishers, work is, they'll have three industry experts read your book, and if they don't think what you're saying has any value, forget about it. So one guy said, this is brilliant; one guy says, "These guys don't know what they're talking about"; and the third guy says, "I don't even think what they're talking about is feasible." So they decided not to publish. Four years later they came back and said, "We want to publish the book," and Peter said, "You know what, they lost their chance." We were ahead of them by four years; they didn't understand the technology. So that was kind of cool. So from there I went into consulting, and eventually took a position as the Head of Enterprise and Director of Enterprise Information Architecture with Walmart. And Walmart, as you know, is a huge entity, almost the size of the federal government. So to build an architecture that integrates Walmart would've been a challenge, a behemoth challenge, and I took it on with a phenomenal team. >> And when was this, like what timeframe?
>> This was 2010, and by the end of 2010 we had presented an architecture to the CIO and the rest of the organization, and they came back to me about a week later and said, "Look, everybody agrees what you did was brilliant, "but nobody knows how to implement it. "So we're taking you away, "you're no longer Director of Information Architecture, "you're now Director of Enterprise Information Management. "Build it. "Prove that what you say you could do, you could do." So we built something called the Data CAFE, and CAFE was an acronym; it stood for: Collaborative Analytics Facility for the Enterprise. What we did was we took data from one of the divisions, because you didn't want to take on the whole beast, boil the ocean. We picked Sam's Club and we worked with their CFO, and because we had information about customers we were able to build a room with seven 80-inch monitors that surrounded anyone in the room. And in the center was the Cisco telecommunications system so you could be a part of a meeting. >> The TelePresence. >> TelePresence. And we built one room in one facility, and one room in another facility, and we labeled the monitors, one red, one blue, one green, and we said, "There's got to be a way where we can build "data science so it's interactive, so somebody, "an executive, could walk into the room, "touch the screen, and drill into features. "And in another room "the features would be changing simultaneously." And that's what we built. The room was brought up on Black Friday of 2013, and we were able to see the trends of sales on the East Coast so quickly that the executives in the room, and these are the CEO of Walmart and the heads of Sam's Club and the like, were able to change the distribution in the Mountain and Pacific time zones, because the sales on the East Coast gave them the idea: well, these things are going to sell, and these things aren't. And they saw a tremendous increase in productivity.
My team received the 2014 Walmart Innovation Project of the Year award. >> And that's no slouch. Walmart has always been heavily data-oriented. I don't know if it's urban legend or not, but the famous story in the '80s of the beer and the diapers, right? Walmart would position beer next to diapers, why would they do that? Well the father goes in to buy the diapers for the baby, picks up a six pack while he's on the way, so they just move those proximate to each other. (laughs) >> In terms of data, Walmart really learned that there's an advantage to understanding how to place items along a path that you might take in a store, and knowing that path, they actually have a term for it, I'm sorry, I forgot the name but it's-- >> Selling more stuff. (laughs) >> Yeah, it's selling more stuff. It's the way you position items on a shelf. And Walmart had the brilliance, or at least I thought it was brilliant, that they would make their vendors the data champion. So the vendor, let's say Procter & Gamble's a vendor, and they sell this one product the most. They would then be the champion for that aisle. Oh, it's called planogramming. So the planogramming, the way the shelves were organized, would be set up by Procter & Gamble for that entire area, working with all their other vendors. And so Walmart would give the data to them and say, "You do it." And what I was purporting was, well, we shouldn't just be giving the data away, we should be using that data. And that was the advent of that. From there I moved to Kimberly-Clark, I became Global Director of Enterprise Data Management and Analytics. Their challenge was they had different teams; there were four different instances of SAP around the globe. One for Latin America, one for North America called the Enterprise Edition, one for EMEA, Europe, Middle East, and Africa, and one for Asia-Pacific.
Well when you have four different instances of SAP, that means your master data doesn't exist, because the same thing that happens in this facility is different here. And every company faces this challenge: if they implement more than one of a system, the specialty fields get used by different groups in different ways. >> The gold standard, the gold version. >> The golden version. So I built a team by bringing together all the different international teams, and created one team that was able to integrate best practices and standards around data governance, data quality. Built BI teams for each of the regions, and then a data science and advanced analytics team. >> Wow, so okay, so that makes you uniquely qualified to co-chair here at the conference. >> Oh, I don't know about that. (laughs) There are some real, there are some geniuses here. >> No but, I say that because these are your peeps. >> Yes, they are, they are. >> And so, you're a practitioner, this conference is all about practitioners talking to practitioners, it's content-heavy, there's not a lot of fluff. Lunches aren't sponsored, there's no lanyard sponsor, and it's not like, you know, there are very subtle sponsor desks; you have to have sponsors 'cause otherwise the conference isn't enabled, and you've got costs associated with it. But it's a very intimate event and I think you guys want to keep it that way. >> And I really believe you're dead-on. When you go to most industry conferences, the sponsors, you know, change the format or are heavily into the format. Here you have industry thought leaders from all over the globe: CDOs of major Fortune 500 companies who are working with their peers and exchanging ideas. I've had conversations with a number of CDOs, and the thought leadership at this conference, I've never seen this type of thought leadership in any conference.
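The "golden version" of master data that Abate describes is typically assembled with survivorship rules that pick, field by field, the best value across the regional systems. As a hedged illustration only (the field names, sources, and priority order below are invented, and real MDM platforms encode far richer rules than this), a minimal sketch might look like:

```python
def golden_record(records, source_priority):
    """Merge regional records for one entity into a single golden record.

    Survivorship rule: for each field, keep the first non-empty value
    supplied by the highest-priority source.
    """
    fields = sorted({f for r in records for f in r if f != "source"})
    ranked = sorted(records, key=lambda r: source_priority.index(r["source"]))
    golden = {}
    for field in fields:
        for record in ranked:
            if record.get(field):  # first non-empty value from the best source wins
                golden[field] = record[field]
                break
    return golden

# Hypothetical regional views of the same supplier:
regional = [
    {"source": "EMEA", "name": "Acme GmbH", "tax_id": ""},
    {"source": "NA", "name": "Acme Inc.", "tax_id": "98-7654321"},
]
print(golden_record(regional, ["NA", "EMEA", "APAC", "LATAM"]))
# → {'name': 'Acme Inc.', 'tax_id': '98-7654321'}
```

Changing the priority order changes which regional spelling survives, which is exactly why the governance team, not each region, has to own the rules.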
>> Yeah, I mean the percentage of presentations by practitioners, even when there's a vendor name, they have a practitioner, you know, an internal practitioner presenting, so it's 99.9%, which is why people attend. We're moving venues next year, I understand. Just did a little tour of the new venue, so, going to be able to accommodate more attendees, so that's great. >> Yeah it is. >> So what are your objectives in thinking ahead a year from now? >> Well, you know, I'm taking over from my current peer, Dr. Arka Mukherjee, who just did a phenomenal job of finding speakers: people who are in the industry, who are presenting challenges, and allowing others to interact. So I hope I can do a similar thing, which is, find, with my peers, people who have real-world challenges and bring them to the forum so they can be debated. On top of that, there are some amazing changes; you know, technology change is just so fast. In an area like big data, I remember only five years ago the chart of big data vendors maybe had 50 names on it; now you would need the whole table to fit all the vendors. >> Who's not a data vendor, you know? >> Who's not a data vendor? (laughs) So I would think the best thing we could do is just get all the CDOs and CDO-types into a room, and let us debate and talk about these points and issues. I've seen just some tremendous interactions, great questions, people giving advice to others. I've learned a lot here. >> And how about long term, where do you see this going? How many CDOs are there in the world, do you know? Is that a number that's known? >> That's a really interesting point because, you know, only five years ago there weren't that many people called CDOs. And then Gartner four years ago or so put out an article saying, "Every company really should have a CDO."
Not just for the purpose of advancing your data, and to Doug Laney's point that data is being monetized, there's a need to have someone responsible for information 'cause we're in the Information Age. And a CIO really is focused on infrastructure: making sure I've got my PCs, making sure I've got a LAN, I've got websites. Because of the Information Age, the focus on data has really turned data into an asset. So organizations realize, if you utilize that asset, let me reverse this, if you don't use data as an asset, you will be out of business. I heard a quote, I don't know if it's true: "Of the Fortune 500 from only 10 years ago, 250 no longer exist." >> Yeah, something like that, the turnover's amazing. >> Many of those companies were companies that decided not to make the change to be data-enabled, to move to data-driven decision processing. Companies still use data warehouses, they're always going to use them, but a warehouse is a rear-view mirror; it tells you what happened last week, last month, last year. But today's businesses need to be forward-looking. And just like driving a car, it'd be really hard to drive your car through a rear-view mirror. So what companies are doing today is saying, "Okay, let's start looking at this as forward-looking, "a prescriptive and predictive analytics, "rather than just what happened in the past." I'll give you an example. In a major company that is a supplier of consumer products, they were leading in the industry and their sales started to drop, and they didn't know why. Well, with a data science team, we were able to determine why by pulling in data from the CDC, now these are sources that nobody brought into the enterprise only 20 years ago; now 60% of your data is external. So we brought in data from the CDC, we brought in data on maternal births from the national government, we brought in data from the Census Bureau, we brought in data from sources of advertising and targeted marketing towards mothers.
Pulled all that data together and said, "Why are diaper sales down?" Well they were targeting the large regions of the country and putting ads on TV stations in New York and California, big population centers. Birth rates in population centers have declined. Birth rates in certain other regions, like the south, and the Bible Belt, if I can call it that, have increased. So by changing the marketing, their product sales went up. >> Advertising to Texas. >> Well, you know, and that brings up one of the points, I heard a lecture today about ethics. We made it a point at Walmart that if you ran a query that reduced a result to less than five people, we wouldn't allow you to see the result. Because, think about it, I could say, "What is my neighbor buying? "What are you buying?" So there's an ethical component to this as well. But that, you know, data is not political. Data is not chauvinistic. It doesn't discriminate, it just gives you facts. It's the interpretation of that that is hard for CDOs, because we have to say to someone, "Look, this is the fact, and your 25 years "of experience in the business, "granted, is tremendous and it's needed, "but the facts are saying this, "and that would mean that the business "would have to change its direction." And that's hard for people to do, so it requires that. >> So whether it's called the chief data officer, whatever the data czar rubric is, the head of analytics, there's obviously the data quality component there, whatever that is; this is the conference for, as I called them, your peeps, for that role in the organization. People often ask, "Will that role be around?" I think it's clear, it's solidifying. Yes, you see the chief digital officer emerging and there's a lot of tailwinds there, but the information quality component, the data architecture component, it's here to stay. And this is the premiere conference, the premiere event, that I know of anyway.
There are a couple of others, perhaps, but it's great to see all the success. When I first came here in 2013 there were probably about 130 folks here. Today, I think there were almost 500 people registered. Next year, I think 600 is kind of the target, and I think it's very reasonable with the new space. So congratulations on all the success, and thank you for stepping up to the co-chair role, I really appreciate it. >> Well, let me tell you, I thank you guys. You provide a voice at these IT conferences that we really need, and that is the ability to get the message out: that people do think and care, the industry is not thoughtless and heartless. With all the data breaches and everything going on there's a lot of fear, loathing, and anticipation. But having your voice, kind of like ESPN in sports, gives the technology community, which is getting larger and larger by the day, a voice, and we need that, so thank you. >> Well thank you, Robert. We appreciate that, it was great to have you on. Appreciate the time. >> Great to be here, thank you. >> All right, and thank you for watching. We'll be right back with our next guest as we wrap up day two of MIT CDOIQ. You're watching theCUBE. (futuristic music)
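The small-cell suppression policy Abate describes from Walmart, refusing to show any query result that narrows down to fewer than five people, is straightforward to sketch. This is a hedged illustration only: the threshold constant and the row shape below are assumptions for the example, not a description of Walmart's actual system.

```python
MIN_GROUP_SIZE = 5  # assumed threshold, per the "less than five people" rule

def suppress_small_cells(rows, min_group_size=MIN_GROUP_SIZE):
    """Drop aggregate rows whose underlying group is small enough to
    identify individuals; return the remaining rows unchanged."""
    return [row for row in rows if row["count"] >= min_group_size]

results = [
    {"segment": "zip 02139, age 30-39", "count": 1200, "avg_basket": 54.10},
    {"segment": "zip 02139, age 90+", "count": 3, "avg_basket": 17.25},
]
safe = suppress_small_cells(results)
print(safe)  # only the 1,200-person segment survives; the 3-person cell is suppressed
```

The same idea appears in statistical disclosure control as "cell suppression"; production systems also have to guard against differencing attacks, where two allowed queries are subtracted to reveal a suppressed cell.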
Andy Palmer, TAMR | MIT CDOIQ 2019
>> From Cambridge, Massachusetts, it's theCUBE covering MIT Chief Data Officer and Information Quality Symposium 2019, brought to you by SiliconANGLE Media. >> Welcome back to MIT, everybody. You're watching theCUBE, the leader in live tech coverage. We're here at day two of the MIT Chief Data Officer and Information Quality Conference. Dave Vellante with Paul Gillin. Andy Palmer's here. He's the cofounder and CEO of Tamr. Good to see you again. It's great to see it all actually coming together. So I didn't ask this of Mike; I could kind of infer it from some of his answers. But why did you guys start Tamr? >> Well, it really started with an academic project that Mike was doing over at MIT, and I was over at Novartis at the time as the chief data officer over there. And what we really found was that there were a lot of companies really suffering from data mastering as the primary bottleneck in their company. They'd used great new tech like the Vertica system that we'd built and, you know, automated a lot of their warehousing and such. But the real bottleneck was getting lots of data integrated and mastered really, really quickly. >> Yeah. He took us through the sort of problems with, obviously, the EDW in terms of scaling, master data management and the scaling problems. Was that really the problem that you were trying to solve? >> Yeah, it really was. And when we started, I mean, it was like, seven years ago, eight years ago now that we started the company, and maybe almost 10 when we started working on the academic project, and at that time, people weren't really thinking or worried about that. They were still kind of digesting big data, as it was called. But I think what Mike and I kind of felt was going on was that people were gonna get over the big data and the volume of data, and we're going to start worrying about the variety of the data and how to make the data cleaner and more organized. And I think we called that one pretty much right. Maybe we're a little bit early, but I think now variety is the big problem.
>> What about the other thing about big data: big data's oftentimes associated with Hadoop, which was batch, and then you sort of saw the shift to real time, and Spark was gonna fix all that. And so what are you seeing in terms of the trends in terms of how data is being used to drive almost near real time business decisions? >> You know, Mike and I came out really specifically back in 2007 and declared that we thought Hadoop and HDFS were going to be far less impactful than other people did. >> '07? >> Yeah, yeah. And Mike actually was really aggressive in saying it was gonna be a disaster. And I think we've finally seen that actually play out now that the bloom is off the rose, so to speak. And so there are these fundamental things that big companies struggle with in terms of their data: you know, cleaning it up and organizing it and making it like they want. Anybody that's worked at one of these big companies can tell you that the data that they get from most of their internal systems sucks, plain and simple. And so cleaning up that data, turning it into something that's an asset rather than a liability, is really what Tamr's all about, and it's kind of our mission. We're out there to do this, and it sort of pales in comparison: you think about the amount of money that some of these companies have spent on systems like SAP, and you're like, yeah, but all the data inside of the systems is so bad and so ugly and un-useful. Like, we're gonna fix that problem. >> So your special sauce is machine learning. Where are you applying machine learning most effectively? >> We apply machine learning to probably the least sexy problem on the planet. There are a lot of companies out there that use machine learning and AI to do predictive algorithms and all kinds of cool stuff.
All we do with machine learning is actually use it to clean up data and organize data, get it ready for people to use AI. I started in the AI industry back in the late 1980s, and, you know, really, I learned from this guy Marvin Minsky, and Marvin taught me two things. First was garbage in, garbage out: there's no algorithm that's worth anything unless you've got great data. And the second is it's always about the human and the machine working together. And I've really been working on those two same principles most of my career, and Tamr really brings both of those together. Our goal is to prepare data so that it can be used analytically inside of these companies, so that it's actually high quality and useful. And the way we do that involves bringing together the machine, mostly these advanced machine learning algorithms, with humans: subject matter experts inside of these companies that actually know all the ins and outs and all the intricacies of the data inside of their company.
>> So you say garbage in, garbage out. If you don't have good training data, of course, you're not going to get a good ML model. How much upfront work is required? GE, I know, is one of your customers. How much time is required to put together an ML model that can deal with 20,000,000 records like that? >> Well, you know, the amazing thing that's happened for us in the last five years especially is that now we've built enough models from scratch inside of these large Global 2000 companies that very rarely do we go into a place where we don't already have a model that's pre-built that they can use as a starting point. And I think that's the same thing that's happening in modeling in general. If you look at great companies like DataRobot, and even in the Python community, MLlib, the accessibility of these modeling tools, and the models themselves, well, they're commoditized. And so most of our models and most of the projects we work on, we've already got a model that's a starting point. We don't really have to start from scratch. >> You mentioned getting into AI in the eighties. Is the notion of AI the same as it was in the eighties, and now we've just got the tooling, the horsepower, the data to take advantage of it, or has the concept changed? >> The math is all the same, like, you know, absolutely full stop; there's really no new math. The two things I think that have changed are, first, there's a lot more data that's available now, and neural nets are a great example, right, in Marvin's things: when you look at Google Translate and how aggressively they used neural nets, it was the quantity of data that was available that actually made neural nets work. The second thing that's changed is the cheap availability of compute, that now the largest supercomputer in the world is available to rent by the minute. And so you've got all this data, you've got all this really cheap compute, and then the third thing is what you alluded to earlier: the accessibility of all the math, that now it's becoming so simple and easy to apply these math techniques, and they're becoming, you know, it's almost to the point where the average data scientist, not the advanced one, the average data scientist can do and practice AI techniques that 20 years ago required five PhDs.
>> It's not surprising that Google, with its neural net technology and all the search data that it has, has been so successful. Does it surprise you that Amazon with Alexa was able to compete so effectively? >> Oh, I think that I would never underestimate Amazon and their ability to, you know, build great tech. They've done some amazing work. One of my favorites, Mike's and mine actually, one of our favorite examples in the last three years: they took their Redshift system, you know, that competed with Vertica, and they re-implemented it, you know, as a compiled system, and it really runs incredibly fast. I mean, that feat of engineering was truly exceptional. >> Interesting to hear you say that, because wasn't Redshift originally ParAccel? So yeah, that's right, Larry Ellison craps all over Redshift because it's just open source software that they just took and repackaged. But you're saying they did some major engineering to it? >> Oh my gosh, yeah. It's like, Mike and I both, you know, we always compared ParAccel to Vertica, and we always knew we were better in a whole bunch of ways. But this latest rewrite that they've done, this compiled version, like, it's really good.
>> So as a guy who's been doing AI for 30 years now, and is really seeing it come into its own: a lot of AI projects right now seem to be sort of low hanging fruit, small scale stuff. Where do you see AI in five years? What kind of projects are companies gonna be undertaking, and what kind of new applications are gonna come out of this? >> I think we're at the very beginning of this cycle, and actually there's a lot more potential than has been realized. So I think we are in the pick-the-low-hanging-fruit kind of a thing. But some of the potential applications of AI are so much more impactful, especially as we modernize core infrastructure in the enterprise. So the enterprise is sort of living with this huge legacy burden, and we're always encouraging our customers at Tamr to think of all their existing legacy systems as just data generating machines, and the faster they can get that data into a state where they can start doing state of the art AI work on top of it, the better. And so really, you know, you gotta put the legacy burden aside and kind of draw this line in the sand, so that as they really build their muscles on the AI side, they can take advantage of that with all the data that they're generating every single day.
>> Think about these data repositories: the enterprise data warehouse. You guys built better, with MPP technology, better data warehouses; the master data management stuff; the top-down, you know, enterprise data models; Hadoop and big data. None of them really lived up to their promise, you know? >> Yeah. >> It's kind of somewhat unfair to, like, the MPP guys, because you said, hey, we're just gonna run faster. And you did. But you didn't say you were gonna change the world and all that stuff, right? Whereas EDW did. Do you feel like this next wave is actually gonna live up to the promise? >> I think the next phase is, it's very logical. Like, you know, I know you're talking to Chris Lynch here in a minute, and what they're doing at AtScale, and AtScale and Tamr, these companies are all in the same general area. That's kind of related to, how do you take all this data and actually prepare it and turn it into something that's consumable really quickly and easily for all of these new data consumers in the enterprise? So that's the next logical phase in this process. Now, will this phase be the one that finally sort of meets the high expectations that were set 20, 30 years ago with enterprise data warehousing? I don't know, but we're certainly getting closer.
>> I kind of hope not, 'cause then we'll have less to do. Any other cool stuff that you see out there, technology-wise? >> I'm huge, I'm fanatical right now about health care. I think that the opportunity for health care to be transformed with technology, you know, almost makes everything else look like chump change. >> What aspect of health care? >> Well, I think that the most obvious thing is that now, with the consumer sort of in the driver's seat in healthcare, technology companies that come in and provide consumer driven solutions that meet the needs of patients, regardless of how dysfunctional the health care system is, that's killer stuff. We had a great company here in Boston called PillPack, which was a great example of that, where they just built something better for consumers, and it was so popular and so, you know, broadly adopted. Eventually, Amazon bought it for $1,000,000,000. But those kinds of things in health care, PillPack is just the beginning. There's lots and lots of those kinds of opportunities. >> Well, that's right. Healthcare's ripe for disruption, and it hasn't been hit with the digital disruption, and neither has financial services, really. Certainly defense has not yet, either. They're high risk industries, so it takes longer. Well, Andy, thanks so much for making the time. I know you gotta run. >> Yeah. Thank you. >> All right, keep it right there, everybody. We'll be back with our next guest right after this short break. You're watching theCUBE from MIT CDOIQ. Right back.
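The human-plus-machine mastering loop Palmer describes is often implemented as confidence-based routing: the model auto-resolves record pairs it is sure about and sends the ambiguous middle band to subject matter experts. A toy sketch of that routing follows; the thresholds and the string-similarity scorer here are stand-ins invented for illustration, since Tamr's actual matching models are learned from expert feedback rather than hand-set.

```python
from difflib import SequenceMatcher

def route_pair(a, b, lo=0.6, hi=0.9):
    """Score two entity names and decide who resolves the pair:
    the machine (confident cases) or a human expert (ambiguous ones)."""
    score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
    if score >= hi:
        return "auto-match"
    if score <= lo:
        return "auto-distinct"
    return "ask-expert"  # the human-in-the-loop band

print(route_pair("International Business Machines",
                 "International Business Machines Corp."))  # auto-match
print(route_pair("Acme Widgets", "Zenith Pharmaceuticals"))  # auto-distinct
print(route_pair("Acme Widgets Inc", "ACME Widget"))         # ask-expert
```

Expert answers from the "ask-expert" band become new training labels, which is how the machine gradually takes over more of the middle band.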
Aaron Kalb, Alation | MIT CDOIQ 2019
>> From Cambridge, Massachusetts, it's theCUBE covering MIT Chief Data Officer and Information Quality Symposium 2019, brought to you by SiliconANGLE Media. (dramatic music) >> Welcome back to Cambridge, Massachusetts, everybody. This is theCUBE, the leader in live tech coverage. We go out to the events, and we extract the signal from the noise. And, we're here at the MIT CDOIQ, the Chief Data Officer conference. I'm Dave Vellante with my cohost Paul Gillin. Day two of our wall to wall coverage. Aaron Kalb is here. He's the cofounder and chief data officer of Alation. Aaron, thanks for making the time to come on. >> Thanks so much Dave and Paul for having me. >> You're welcome. So, words matter, you know, and we've been talking about data, and big data, and the three Vs, and data is the new oil, and all this stuff. You gave a talk this week about, you know, "We're maybe not talking the right language "when it comes to data." What did you mean by all that? >> Absolutely, so I get a little bit frustrated by some of these clichés we hear at conference after conference, and the one I, sort of, took aim at in this talk is, data is the new oil. I think what people want to invoke with that is to say, in the same way that oil powered the industrial age, data's powering the information age. Just saying, data's really cool and trendy and important. That's true, but there are a lot of other associations and contexts that people have with oil, and some of them don't really apply to data, while others do. >> So, is data more valuable than oil? >> Well, I think they're each valuable in different ways, but I think there's a couple of issues with the metaphor. One is that oil is scarce and dwindling, and part of its value comes from the fact that it's so rare. Whereas, the experience with data is that it's so plentiful and abundant, we're almost drowning in it.
And so, what I contend is, instead of talking about data as compared to oil, we should talk about data compared to water. And, the idea is, you know, water is very plentiful on the planet, but sometimes, you know, if you have saltwater or contaminated water, you can't drink it. Water is good for different purposes, depending on its form, and so it's all about getting the right data for the right purpose, like water. >> Well, we've certainly, at least in my opinion, fought wars, Paul, over oil. >> And, over water. >> And, certainly, conflicts over water. Do you think we'll be fighting wars over data? Or, are we already? >> No, we might be. One of my favorite talks from the sessions here was a keynote by the CDO for the Department of Defense, who was talking about, you know, the civic duty of transparency but was observing that, actually, more IP addresses from China and Russia are looking at our public datasets than from within the country. So, you know, it's definitely a resource that can be very powerful. >> So, what was the reaction to your premise from the audience? What kind of questions did you get? >> You know, people actually responded very favorably, including some folks from the oil and gas industry, which I was pleased to find. We have a lot of customers in energy, so that was cool. But, it was nice being here at MIT and just really geeking out about language and linguistics and data with a bunch of CDOs and other people who are, kind of, data intellectuals. >> Right, so if data is not the new oil. >> And, water isn't really a good analogy either, because the supply of water is finite. >> That's true. >> So, what is data? >> Yeah. >> Space? >> Yeah, it's a good point. >> Matter? >> Maybe it is like the universe in that it's always expanding, right, somehow. Right, because anything physical on the planet probably won't be growing at that exponential speed. >> So, give us the punchline.
>> Well, so I contend that water, while imperfect, is, actually, a really good metaphor that helps with a lot of things. It has properties like the fact that a data quality issue flows downstream like pollution in a river, and the fact that it can come in different forms, useful for different purposes. You might have gray water, right, which is good enough for, you know, irrigation or industrial purposes, but not safe to drink. And so, you rely on metadata to get the data that's in the right form. And, you know, the talk is more fun because you have a lot of visual examples that make this clear. >> Yeah, of course, yeah. >> I actually had one person in the audience say that he used a similar analogy in his own company, so it's fun to trade notes. >> So, chief data officer is a relatively new title for you, is it not? In terms of your role at Alation. >> Yeah, that's right, and the most fun thing about my job is being able to interact with all of the other CDOs and CDAOs at a conference like this. And, it was cool to see. I believe this conference doubled since last year. Is that right? >> No. >> No, it's up about a hundred, though. >> Right. >> Well. >> And, it's about double from three years ago. >> And, when we first started, in 2013, yeah. >> 130 people, yeah. >> Yeah, it was a very small and intimate event. >> Yeah, here we're outgrowing this building, it seems. >> Yeah, they're kicking us out. >> I think what's interesting is, you know, if we do a little bit of analysis, this is small data, within our own company, you know, our biggest and most visionary customers typically bought Alation. The buyer champion either was a CDO or they weren't a CDO when they bought the software and have since been promoted to be a CDO. And so, seeing this trend of more and more CDOs cropping up is really exciting for us. And also, just hearing all of the people at the conference saying, two trends we're hearing.
A move from, sort of, infrastructure and technology to driving business value, and a move from defense and governance to, sort of, playing offense and doing revenue generation with data. Both of those trends are really exciting for us. >> So, don't hate me for asking this question, because what a lot of companies will do is, they'll give somebody a CDO title, and it's, kind of, a little bit of a gimmick, right, to go to market. And, they'll drag you into sales, because I'm sure they do, as a cofounder. But, as well, I know CDOs at tech companies that are actually trying to apply new techniques, figure out how data contributes to their business, how they can cut costs, raise revenue. Do you have an internal role, as well? >> Absolutely, yeah. >> Explain that. >> So, Alation, you know, we're about 250 people, so we're not at the same scale as many of the attendees here. But, we want to learn, you know, from the best, and always apply everything that we learn internally as well. So, obviously, analytics, data science is a huge role in our internal operations. >> And so, what kinds of initiatives are you driving internally? Is it, sort of, cost initiatives, efficiency, innovation? >> Yeah, I think it's all of the above, right. Every single division, both on the, sort of, operational efficiency and cost cutting side and in figuring out the next big bet to make, can be informed by data. And, our goal is to empower a curious and rational world, and have every decision be based not on the highest paid person's opinion, but on the best evidence possible. And so, you know, the goal of my function is largely to enable that both centrally and within each business unit. >> I want to talk to you about data catalogs a bit because it's a topic close to my heart. I've talked to a lot of data catalog companies over the last couple years, and it seems like, for one thing, the market's very crowded right now. It seems to me. Would you agree there are a lot of options out there?
>> Yeah, you know, it's been interesting because when we started it, we were basically the first company to make this technology and to, kind of, use this term, data catalog, in this way. And, it's been validating to see, you know, a lot of big players and other startups even, kind of, coming to that terminology. But, yeah, it has gotten more crowded, and I think our customers who, or our prospects, used to ask us, you know, "What is it that you do? "Explain this catalog metaphor to me," are now saying, "Yeah, catalogs, heard about that." >> It doesn't need to be defined anymore. >> "Which one should I pick? "Why you?" Yeah. >> What distinguished one product from another, you know? What are the major differentiation points? >> Yeah, I think one thing that's interesting is, you know, my talk was about how the metaphors we use shape the way we think. And, I think there's a sense in which, kind of, the history of each company shapes their philosophy and their approach, so we've always been a data catalog company. That's our one product. Some of the other catalog vendors come from ETL background, so they're a lot more focused on technical metadata and infrastructure. Some of the catalog products grew out of governance, and so it's, sort of, governance first, no sorry, defense first and then offense secondary. So, I think that's one of the things, I think, we encourage our prospects to look at, is, kind of, the soul of the company and how that affects their decisions. The other thing is, of course, technology. And, what we at Alation are really excited about, and it's been validating to hear Gartner and others and a lot of the people here, like the GSK keynote speaker yesterday, talking about the importance of comprehensiveness and on taking a behavioral approach, right. We have our Behavioral IO technology that really says, "Let's not look at all the bits and the bytes, "but how are people using the data to drive results?" As our core differentiator. 
>> Do your customers generally standardize on one data catalog, or might they have multiple catalogs for multiple purposes? >> Yeah, you know, we heard a term more last season, of catalog of catalogs, you know. And, people here can get arbitrarily, you know, meta, meta, meta data, where we like to go there. I think the customers we see most successful tend to have one catalog that serves this function of the single source of reference. Many of our customers will say, you know, that their catalog serves as, sort of, their internal Google for data. Or, the one stop shop where you could find everything. Even though they may have many different sources, typically you don't want to have siloed catalogs. It makes it harder to find what you're looking for. >> Let's play a little word association with some metaphors. Data lake. (laughter) >> Data lake's another one that I sort of hate. If you think about it, people had data warehouses and didn't love them, but at least, when you put something into a warehouse, you can get it out, right. If you throw something into a lake, you know, there's really no hope you're ever going to find it. It's probably not going to be in great shape, and we're not surprised to find that many folks who invested heavily in data lakes are now having to invest in a layer over it, to make it comprehensible and searchable. >> So, yeah, the lake is where we hide the stolen cars. Data swamp. >> Yeah, I mean, I think if your point is it's worse than lake, it works. But, I think we can do better than a lake, right. >> How about data ocean? (laughter) >> You know, out of respect for John Furrier, I'll say it's fantastic. But, to us we think, you know, it isn't really about the size. The more data you have, people think the more data the better. It's actually the more data the worse unless you have a mechanism for finding the little bit of data that is relevant and useful for your task and put it to use. >> And so, to set that up, enter the catalog.
So, technically, how does the catalog solve that problem? >> Totally, so if we think about, maybe let's go to the warehouse, for example. But, it works just as well on a data lake in practice. >> Yeah, cool. >> The way the catalog works is, it starts with the inventory, you know, what's on every single shelf. But, if you think about what Amazon has done, they have the inventory warehouse in the back, but what you see as a consumer is a simple search interface, where you type in the word of the product you're looking for. And then, you see ranked suggestions for different items, you know, toasters, lamps, whatever, books I want to buy. Same thing for data. I can type in, you know, if I'm at the DOD, you know, information about aircraft, or information about, you know, drug discovery if I'm at GSK. And, I should be able to therefore see all of the different data sets that I have. And, that's true in almost any catalog, that you can do some search over the curated data sets there. With Alation in particular, what I can see is, who's using it, how are they using it, what are they joining it with, what results do they find in that process. And, that can really accelerate the pace of discovery. >> Go ahead. >> I'm sorry, Dave. To what degree can you automate some of that detail, like who's using it and what it's being used for? I mean, doesn't that rely on people curating the catalog? Or, to what degree can you automate that? >> Yeah, so it's a great question. I think, sometimes, there's a sense with AI or ML that it's like the computer is making the decisions or making things up. Which is, obviously, very scary. Usually, the training data comes from humans. So, our goal is to learn from humans in two ways. There's learning from humans where humans explicitly teach you. Somebody goes and says, "This is gold standard data versus this is, "you know, low quality data." And, they do that manually. But, there's also learning implicitly from people.
So, in the same way on amazon.com, if I buy one item and then buy another, I'm doing that for my own purposes, but Amazon can do collaborative filtering over all of these trends and say, "You might want to buy this item." We can do a similar thing where we parse the query logs, parse the usage logs from the BI tools, and can basically watch what people are doing for their own purposes. Not doing, you know, extra work on top of their job to help us. We can learn from that and make everybody more effective. >> Aaron, is data classification a part of all this? Again, when we started in the industry, data classification was a manual exercise. It's always been a challenge. Certainly, people have applied math to it. You've seen support vector machines and probabilistic latent semantic indexing being used to classify data. Have we solved that problem, as an industry? Can you automate the classification of data on creation or use at this point in time? >> Well, one thing that came up in a few talks about AI and ML here is, regardless of the algorithm you're using, whether it's, you know, IFH or SVM, or something really modern and exciting that keeps learning.
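[Editor's note: the implicit learning Kalb describes above — collaborative filtering over query and usage logs, with no extra work from users — can be sketched as a simple co-occurrence count. This is a toy illustration only: the log format and table names are invented, and this is not Alation's actual pipeline.]

```python
from collections import Counter
from itertools import combinations

def co_usage(query_logs):
    # Count how often each pair of tables appears in the same query.
    pairs = Counter()
    for tables in query_logs:
        for a, b in combinations(sorted(set(tables)), 2):
            pairs[(a, b)] += 1
    return pairs

def suggest(table, pairs, k=3):
    # Rank the tables most often co-used with `table`.
    scores = Counter()
    for (a, b), n in pairs.items():
        if table == a:
            scores[b] += n
        elif table == b:
            scores[a] += n
    return [t for t, _ in scores.most_common(k)]

# Invented query log: each entry lists the tables one query touched.
logs = [
    ["sales", "customers"],
    ["sales", "customers", "regions"],
    ["sales", "regions"],
    ["inventory", "suppliers"],
]
recommendations = suggest("sales", co_usage(logs))
```

Someone opening the `sales` table would see `customers` and `regions` suggested, purely because other people queried them together — the amazon.com-style recommendation Kalb describes, learned from work users were already doing.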
So, then you get more training, get it faster, and it kind of accelerates that way instead of being a big burden. >> So, that's really the advancement in the last five to what, five, six years. Where you're able to use machine intelligence to, sort of, solve that problem as opposed to brute forcing it with some algorithm. Is that fair? >> Yeah, I think that's right, and I think what gets me very excited is when you can have these interactive loops where the human helps the computer, which helps the human. You get, again, this upward spiral. Instead of saying, "We have to have all of this, "you know, manual step done "before we even do the first step," or trying to have an algorithm brute force it without any human intervention. >> It's kind of like notes key mode on write, except it actually works. I'm just kidding to all my ADP friends. All right, Aaron, hey. Thanks very much for coming on theCUBE, but give your last word on the event. I think, is this your first one or no? >> This is our first time here. >> Yeah, okay. So, what are your thoughts? >> I think we'll be back. It's just so exciting to get people who are thinking really big about data but are also practitioners who are solving real business problems. And, just the exchange of ideas and best practices has been really inspiring for me. >> Yeah, that's great. >> Yeah. >> Well, thank you for the support of the event, and thanks for coming on theCUBE. It was great to see you again. >> Thanks Dave, thanks Paul. >> All right, you're welcome. >> Thank you, sir. >> All right, keep it right there, everybody. We'll be back with our next guest right after this short break. You're watching theCUBE from MIT CDOIQ. Be right back. (upbeat music)
SUMMARY :
brought to you by SiliconANGLE Media. Aaron, thanks for making the time to come on. and data is the new oil, and all this stuff. in the same way that oil powered the industrial age, And, the idea is, you know, water is very plentiful Well, we've certainly, at least in my opinion, Do you think we'll be fighting wars over data? So, you know, it's definitely a resource What kind of questions did you get? We have a lot of customers in energy, so that was cool. because the supply of water is finite. Maybe it is like the universe And, you know, the talk is more fun because you've a lot I actually had one person in the audience say So, chief data officer is a relatively Yeah, that's right, and the most fun thing I think what's interesting is, you know, And, they'll drag you into sales, But, we want to learn, you know, from the best, And so, you know, the goal of my function I want to talk to you about data catalogs a bit And, it's been validating to see, you know, "Which one should I pick? Yeah, I think one thing that's interesting is, you know, Or, the one stop shop where you could find everything. Data lake. when you put something into a warehouse, So, yeah, the lake is where we hide the stolen cars. But, I think we can do better a lake, right. But, to us we think, you know, So, technically, how does the catalog solve that problem? maybe let's go to the warehouse, for example. I can type in, you know, if I'm at the DOD, you know, Or, to what degree can you automate that? Not to, you know, extra work on top of their job to help us. Can you automate the classification of data whether it's, you know, IFH or SVM, or something it's like you say, some new stuff, right. Yeah, you know, actually, I think it was said best in the last five to what, five, six years. when you can have these interactive loops I'm just kidding to all my ADP friends. So, what are your thoughts? And, just the exchange of ideas It was great to see you again. We'll be back with our next guest
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Michael Collins | PERSON | 0.99+ |
Paul Gillin | PERSON | 0.99+ |
Paul | PERSON | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
Dave | PERSON | 0.99+ |
2013 | DATE | 0.99+ |
Aaron Kalb | PERSON | 0.99+ |
Dave Vellante | PERSON | 0.99+ |
Aaron | PERSON | 0.99+ |
five | QUANTITY | 0.99+ |
Department of Defense | ORGANIZATION | 0.99+ |
six years | QUANTITY | 0.99+ |
John Furrier | PERSON | 0.99+ |
amazon.com | ORGANIZATION | 0.99+ |
yesterday | DATE | 0.99+ |
SiliconANGLE Media | ORGANIZATION | 0.99+ |
Alation | PERSON | 0.99+ |
Alation | ORGANIZATION | 0.99+ |
Gartner | ORGANIZATION | 0.99+ |
one item | QUANTITY | 0.99+ |
Cambridge, Massachusetts | LOCATION | 0.99+ |
first step | QUANTITY | 0.99+ |
last year | DATE | 0.99+ |
GSK | ORGANIZATION | 0.99+ |
both | QUANTITY | 0.99+ |
DOD | ORGANIZATION | 0.99+ |
one person | QUANTITY | 0.99+ |
ORGANIZATION | 0.99+ | |
130 people | QUANTITY | 0.98+ |
One | QUANTITY | 0.98+ |
first time | QUANTITY | 0.98+ |
MIT | ORGANIZATION | 0.98+ |
one product | QUANTITY | 0.97+ |
three years ago | DATE | 0.97+ |
this week | DATE | 0.97+ |
two | QUANTITY | 0.97+ |
MIT CDOIQ | ORGANIZATION | 0.96+ |
MIT Chief Data Officer and | EVENT | 0.96+ |
one data catalog | QUANTITY | 0.96+ |
each | QUANTITY | 0.96+ |
each company | QUANTITY | 0.95+ |
Both | QUANTITY | 0.95+ |
one thing | QUANTITY | 0.95+ |
first one | QUANTITY | 0.94+ |
one catalog | QUANTITY | 0.93+ |
two trends | QUANTITY | 0.93+ |
theCUBE | ORGANIZATION | 0.93+ |
first | QUANTITY | 0.92+ |
first company | QUANTITY | 0.92+ |
last couple years | DATE | 0.92+ |
CDO | ORGANIZATION | 0.91+ |
about a hundred | QUANTITY | 0.91+ |
single shelf | QUANTITY | 0.88+ |
about 250 people | QUANTITY | 0.88+ |
single source | QUANTITY | 0.87+ |
China | LOCATION | 0.87+ |
2019 | DATE | 0.86+ |
Day two | QUANTITY | 0.86+ |
one | QUANTITY | 0.85+ |
each business unit | QUANTITY | 0.82+ |
MIT CDOIQ | EVENT | 0.79+ |
ADP | ORGANIZATION | 0.79+ |
couple issues | QUANTITY | 0.76+ |
Information Quality Symposium 2019 | EVENT | 0.76+ |
One thing | QUANTITY | 0.7+ |
single division | QUANTITY | 0.69+ |
one stop | QUANTITY | 0.68+ |
Russia | LOCATION | 0.64+ |
three | QUANTITY | 0.61+ |
double | QUANTITY | 0.59+ |
favorite | QUANTITY | 0.5+ |
CDOIQ | EVENT | 0.46+ |
Chief | PERSON | 0.42+ |
Jeanne Ross, MIT CISR | MIT CDOIQ 2019
(techno music) >> From Cambridge, Massachusetts, it's theCUBE. Covering MIT Chief Data Officer and Information Quality Symposium 2019, brought to you by SiliconANGLE Media. >> Welcome back to MIT CDOIQ. The CDO Information Quality Conference. You're watching theCUBE, the leader in live tech coverage. My name is Dave Vellante. I'm here with my co-host, Paul Gillin. This is day two of our two day coverage. Jeanne Ross is here. She's the principal research scientist at MIT CISR. Jeanne, good to see you again. >> Nice to be here! >> Welcome back. Okay, what do all these acronyms stand for, I forget. MIT CISR. >> CISR, which we pronounce scissor, is the Center for Information Systems Research. It's a research center that's been at MIT since 1974, studying how big companies use technology effectively. >> So, what's your role as a research scientist? >> As a research scientist, I work with both researchers and with company leaders to understand what's going on out there, and try to present some simple succinct ideas about how companies can generate greater value from information technology. >> Well, I guess not much has changed in information technology since 1974. (laughing) So let's fast forward to the big, hot trend, digital transformation, digital business. What's the difference between a business and a digital business? >> Right now, you're hoping there's no difference for you and your business. >> (chuckling) Yeah, for sure. >> The main thing about a digital business is it's being inspired by technology. So in the past, we would establish a strategy, and then we would check out technology and say, okay, how can technology make us more effective with that strategy? Today, and this has been driven a lot by start-ups, we have to stop and say, well wait a minute, what is technology making possible? Because if we're not thinking about it, there sure are a lot of students at MIT who are, and we're going to miss the boat.
We're going to get Ubered if you will, somebody's going to think of a value proposition that we should be offering and aren't, and we'll be left in the dust. So, our digital businesses are those that are recognizing the opportunities that digital technologies make possible. >> Now, what about data? In terms of the role of digital business, it seems like that's an underpinning of a digital business. Is it not? >> Yeah, the single biggest capability that digital technologies provide is ubiquitous data that's readily accessible anytime. So when we think about being inspired by technology, we could reframe that as inspired by the availability of ubiquitous data that's readily accessible. >> Your premise about the difference between digitization and digital business is interesting. It's more than just a semantic debate. When companies talk about digital transformation these days, are most of them thinking of digitization rather than really transformative business change? >> Yeah, this is so interesting to me. In 2006, we wrote a book that said, you need to become more agile, and you need to rely on information technology to get you there. And these are basic things like SAP and salesforce.com and things like that. Just making sure that your core processes are disciplined and reliable and predictable. We said this in 2006. What we didn't know is that we were explaining digitization, which is very effective use of technology in your underlying processes. Today, when somebody says to me, we're going digital, I'm thinking about the new value propositions, the implications of the data, right? And they're often actually saying they're finally doing what we thought they should do in 2006. The problem is, in 2006, we said get going on this, it's a long journey. This could take you six, 10 years to accomplish. And then we gave examples of companies that took six to 10 years. LEGO, and USAA and really great companies.
And now, companies are going, "Ah, you know, we really ought to do that". They don't have six to 10 years. They get this done now, or they're in trouble, and it's still a really big deal. >> So how realistic is it? I mean, you've got big established companies that have got all these information silos, as we've been hearing for the last two days, just pulling their information together, knowing what they've got is a huge challenge for them. Meanwhile, you're competing with born on the web, digitally native start-ups that don't have any of that legacy, is it really feasible for these companies to reinvent themselves in the way you're talking about? Or should they just be buying the companies that have already done it? >> Well good luck with buying, because what happens is that when a company starts up, they can do anything, but they can't do it to scale. So most of these start-ups are going to have to sell themselves because they don't know anything about scale. And the problem is, the companies that want to buy them up know about the scale of big global companies but they don't know how to do this seamlessly because they didn't do the basic digitization. They relied on basically, a lot of heroes in their company to pull off the scale. So now they have to rely more on technology than they did in the past, but they still have a leg up, if you will, on the start-up that doesn't want to worry about the discipline of scaling up a good idea. They'd rather just go off and have another good idea, right? They're perpetual entrepreneurs if you will. So if we look at the start-ups, they're not really your concern. Your concern is the very well run company, that's been around, knows how to be inspired by technology and now says, "Oh I see what you're capable of doing, "or should be capable of doing. "I think I'll move into your space". So this is the Amazons, and the USAAs and the LEGOs who say "We're good at what we do, "and we could be doing more".
We're watching Schneider Electric, Philips, Ferrovial. These are big ole companies who get digital, and they are going to start moving into a lot of people's territory. >> So let's take the example of those incumbents that you've used as examples of companies that are leaning into digital, and presumably doing a good job of it. They've got a lot of legacy debt, as you know, people call it technical debt. The question I have is how they're using machine intelligence. So if you think about Facebook, Amazon, Microsoft, Google, they own horizontal technologies around machine intelligence. The incumbents that you mentioned do not. Now do they close the gap? They're not going to build their own A.I. They're going to buy it, and then apply it. It's how they apply it that's going to be the difference. So do you agree with that premise, and where are they getting it, do they have the skill sets to do it, how are they closing that gap? >> They're definitely partnering. When you say they're not going to build any of it, that's actually not quite true. They're going to build a lot around the edges. They'll rely on partners like Microsoft and Google to provide some of the core, >> Yes, right. >> But they are bringing in their own experts to take it to the, basically to the customer level. Let me just take Schneider Electric for an example. They have gone from being an electrical equipment manufacturer to a purveyor of energy management solutions. It's quite a different value proposition. To do that, they need a lot of intelligence. Some of it is data analytics of old, and some of it is just better representation on dashboards and things like that. But there is a layer of intelligence that is new, and it is absolutely essential to them. By relying on partners and their own expertise in what they do for customers, and then co-creating a fair amount with customers, they can do things that other companies cannot.
>> And they're developing software, presumably, a SaaS revenue stream as part of that, right? >> Yeah, absolutely. >> How about the innovator's dilemma though, the problem that these companies often have grown up, they're very big, they're very profitable, they see disruption coming, but they are unable to make the change, their shareholders won't let them make the change, they know what they have to do, but they're simply not able to do it, and then they become paralyzed. Is there a -- I mean, looking at some of the companies you just mentioned, how did they get over that mindset? >> This is real leadership from CEOs, who basically explain to their boards and to their investors, this is our future, we are... we're either going this direction or we're going down. And they sell it. It's brilliant salesmanship, and it's why when we go out to study great companies, we don't have that many to choose from. I mean, they are hard to find, right? So you are at such a competitive advantage right now. If you understand, if your own internal processes are cleaned up and you know how to rely on the ERPs and the CRMs to get that done, and on the other hand, you're using the intelligence to provide value propositions that new technologies and data make possible, that is an incredibly powerful combination, but you have to invest. You have to convince your boards and your investors that it's a good idea, you have to change your talent internally, and the biggest surprise is, you have to convince your customers that they want something from you that they never wanted before. So you've got a lot of work to do to pull this off. >> Right now, in today's economy, the economy is sort of lifting all boats. But as we saw when the .com implosion happened in 2001, often these breakdowns give birth to great, new companies. Do you see that the next recession, which is inevitably coming, will be sort of the turning point for some of these companies that can't change?
>> It's a really good question. I do expect that there are going to be companies that don't make it. And I think that they will fail at different rates based on not just the economy, but their industry, and what competitors do, and things like that. But I do think we're going to see some companies fail. We're going to see many other companies understand that they are too complex. They are simply too complex. They cannot do things end to end and seamlessly and present a great customer experience, because they're doing everything. So we're going to see some pretty dramatic changes, we're going to see failure. It's a fair assumption that when we see the economy crash, it's also going to contribute, but that's not the whole story. >> But when the .com bubble blew up, you had the internet guys that actually had a business model to make money, and the guys that didn't; the guys that didn't went away. And then you also had the incumbents that embraced the internet, so when we came out of that .com downturn, you had the survivors, which were Google and eBay, and obviously Amazon, and then you had incumbent companies who had online retailing, e-tailing, e-commerce, etc., who thrived. I would suspect you're going to see something similar, but I wonder what you guys think. The street today is rewarding growth. And we got another near-record high today after the rate cut yesterday. And so, companies that aren't making money are getting rewarded, 'cause they're growing. Well, when the recession comes, those guys are going to get crushed. >> Right. >> Yeah. >> And you're going to have these other companies emerge, and you'll see the winners are going to be those ones who have truly digitized, not just talking the talk, or transformed really, to use your definition. That's what I would expect. I don't know, what do you think about that? >> I totally agree. And, I mean, we look at industries like retail, and they have been fundamentally transformed.
There's still lots of opportunity for innovation, and we're going to see some winners that have kind of struggled early but not given up, and they're kind of finding their footing. But we're losing some. We're losing a lot, right? I think the surprise is that we thought digital was going to replace what we did. We'd stop going to stores, we'd stop reading books, we wouldn't have newspapers anymore. And it hasn't done that. It's only added; it hasn't taken anything away. >> It could-- >> I don't think the newspaper industry has been unscathed by digital. >> No, nor has retail. >> Nor has retail, right. >> No, no no, not unscathed, but here's the big challenge. If I could substitute, if I could just move from newspaper to online, I'm fine. You don't get to do that. You add online to what you've got, right? And I think this right now is the big challenge: nothing's gone away, at least yet. So we have to sustain the business we are, so that it can feed the business we want to be. And we have to make that transition into new capabilities. I would argue that established companies need to become very binary: there are people that do nothing but sustain, and make better and better and better, who they are, while others are creating the new reality. You see this in auto companies, by the way. They're creating not just the autonomous automobiles, but the mobility services, the whole new value propositions, that will become a bigger and bigger part of their revenue stream, but right now are tiny. >> So, here's the scary thing to me. And again, I'd love to hear your thoughts on this. And I've been an outspoken critic of Liz Warren's attack on big tech. >> Absolutely. >> I just think if they're breaking the law, and they're really acting like monopolies, the D.O.J and F.T.C should do something, but to me, you don't just break up big tech because they're good capitalists.
Having said that, one of the things that scares me is when you see Apple getting into payment systems, Amazon getting into grocery and logistics. Digital allows you to do something that's never happened before, which is, you can traverse industries. >> Yep. >> Yeah, absolutely. >> You used to have this stack of industries, and if you were in an industry, you were stuck there: you're stuck in healthcare, you're stuck in financial services, or whatever it was. And today, digital allows you to traverse those. >> It absolutely does. And so in theory, Amazon and Apple and Facebook and Google can attack virtually any industry, and they kind of are. >> Yeah, they kind of are. I would certainly not break up anything. I would really look hard, though, at acquisitions, because I think that's where some of this is coming from. They can stop the overwhelming growth. But I do think you're right, that you get these opportunities from digital that are just so much easier, because they're basically sharing information and technology, not building buildings and equipment and all that kind of thing. But I think there are limits to all this. I do not fear these companies. I think we need some laws, we need some regulations, and they're fine. They are adding a lot of value, and the great companies, I mean, you look at the Schneiders and the Philips, yeah, they fear what some of them can do, but they're looking forward to what they provide underneath. >> Doesn't cloud change the equation here? I mean, when you think of something like Amazon getting into the payments business, or Google in the payments business, it used to be that creating a global payments processing network, just going global, was a huge barrier to entry. Now, you don't have nearly that same level of impediment, right? I mean, the cloud eliminates much of the traditional barrier. >> Yeah, but I'll tell you what limits it: complexity.
Every company we've studied gets a little overanxious and becomes too complex, and they cannot run themselves effectively anymore. It happens to everyone. I mean, remember when we were terrified about what Microsoft was going to become? But then it got competition, because it was trying to do so many things, and somebody else, Salesforce and others, was offering something simpler. And this will happen to every company that gets overly ambitious. Something simpler will come along, and everybody will go, "Oh, thank goodness. Something simpler." >> Well, with Microsoft, I would argue two things. One is the D.O.J put some handcuffs on them, and two, Steve Ballmer couldn't get his nose out of Windows, and then finally (mumbles) (laughter) >> Well, they had a platform shift. >> Well, this is exactly it. They will make those kinds of calls. >> Sure, and I think that talks to their legacy, that they won't end up like Digital Equipment Corp or Wang and D.G, who just ignored the future and held onto the past. But I think, a colleague of ours, David Moschella, wrote a book called "Seeing Digital". And his premise was we're moving from a world of remote cloud services to one of, to use your word, ubiquitous digital services that you can access, upon which you can build your business and new business models. I mean, the simplest example is Waze; you mentioned Uber. They're using cloud, they're using OAuth to log in with Google, Facebook or LinkedIn, and they've got a security layer, there's an A.I layer, there's all your blockchain, mobile, cognitive. It's all these sets of services that are now ubiquitous, on which you're building, so you're leveraging, he calls it the matrix. To the extent that these companies that you're studying, these incumbents, can leverage that matrix, they should be fine. >> Yes. >> Part of the problem is, they say, "No, we're going to invent everything ourselves, we're going to build it all ourselves."
To use Andy Jassy's term, it's non-differentiated heavy lifting; it slows them down. But there's no reason why they can't tap that matrix, >> Absolutely >> And take advantage of it. Where I do get scared is, the Facebooks, Apples, Googles, Amazons, they're matrix companies. Their data is at their core, and they get this. It's not like they're putting data around the core; data is the core. So your thoughts on that? I mean, it looks like your slide about disruption: it's coming. >> Yeah, yeah, yeah, yeah. >> No industry is safe. >> Yeah, well, I'll go back to the complexity argument. We've studied complexity at length, and complexity is a killer. And as we get too ambitious, and we're constantly looking for growth, we start doing things that create more and more tensions in our various lines of business, that cause us to create silos that then we have to coordinate. I just think, for every single company, no cloud is going to save us from this. Complexity will kill us. And we have to keep reminding ourselves to limit that complexity, and we've just not seen the example of a company that got that right. Sooner or later, they just kind of, you know, create problems for themselves. >> Well, isn't that inherent, though, in growth? >> Absolutely! >> It's just like, big companies slow down. >> That's right. >> They can't make decisions as quickly. >> That's right. >> I haven't seen a big company yet that moves nimbly. >> Exactly, and that's the complexity thing-- >> Well, wait a minute, what about AWS? They're a 40 billion dollar company. >> Oh yeah, yeah, yeah >> They're like the agile gorilla. >> Yeah, yeah, yeah. >> I mean, I think they're breaking the rule, and my argument would be, because they have data at their core, and they've got that, it's a bromide, but that common data model that they can apply now to virtually any business. You know, we've been expecting, a lot of people have been expecting, that growth to attenuate. I mean, it hasn't yet; we'll see.
But they're like a 40 billion dollar firm-- >> No, that's a good example, yeah. >> So we'll see. And Microsoft is the other one. Microsoft is demonstrating double-digit growth. For such a large company, it's astounding. I wonder if the law of large numbers is being challenged, so. >> Yeah, well, it's interesting. I do think that what now constitutes "so big" that you're really going to struggle with the complexity, that has definitely been elevated a lot. But I still think there will be a point at which human beings can't handle-- >> They're getting away. >> Whatever level of complexity we reach, yeah. >> Well, sure, right, because even with this great new cloud technology, you know, there's going to be something better that comes along. I think Jassy might have said, if we had to do it all over again, we would have built the whole thing on Lambda functions. >> Yeah. >> Oh, yeah. >> Not on, you know, so there you go. >> So maybe someone else does that-- >> Yeah, there you go. >> So now they've got their hybrid. >> Yeah, yeah. >> Yeah, absolutely. >> You know, maybe it'll take another ten years, but, well, Jeanne, thanks so much for coming to theCUBE. >> It was great to have you. >> My pleasure! >> Appreciate you coming back. >> Really fun to talk. >> All right, keep right there everybody. Paul Gillin and Dave Vellante, we'll be right back from MIT CDOIQ. You're watching theCUBE. (chuckles) (techno music)