Predictions 2022: Top Analysts See the Future of Data


 

(bright music) >> In the 2010s, organizations became keenly aware that data would become the key ingredient to driving competitive advantage, differentiation, and growth. But to this day, putting data to work remains a difficult challenge for many, if not most organizations. Now, as the cloud matures, it has become a game changer for data practitioners by making cheap storage and massive processing power readily accessible. We've also seen better tooling in the form of data workflows, streaming, machine intelligence, AI, developer tools, security, observability, automation, new databases and the like. These innovations accelerate data proficiency, but at the same time, they add complexity for practitioners. Data lakes, data hubs, data warehouses, data marts, data fabrics, data meshes, data catalogs, data oceans are forming, they're evolving and exploding onto the scene. So in an effort to bring perspective to the sea of optionality, we've brought together the brightest minds in the data analyst community to discuss how data management is morphing and what practitioners should expect in 2022 and beyond. Hello everyone, my name is Dave Vellante with theCUBE, and I'd like to welcome you to a special theCUBE presentation, Analysts Predictions 2022: The Future of Data Management. We've gathered six of the best analysts in data and data management who are going to present and discuss their top predictions and trends for 2022 and the first half of this decade. Let me introduce our six power panelists. Sanjeev Mohan is a former Gartner Analyst and Principal at SanjMo. Tony Baer is Principal at dbInsight, Carl Olofson is a well-known Research Vice President with IDC, Dave Menninger is Senior Vice President and Research Director at Ventana Research, Brad Shimmin is Chief Analyst, AI Platforms, Analytics and Data Management at Omdia, and Doug Henschen is Vice President and Principal Analyst at Constellation Research.
Gentlemen, welcome to the program and thanks for coming on theCUBE today. >> Great to be here. >> Thank you. >> All right, here's the format we're going to use. I, as moderator, am going to call on each analyst separately, who will then deliver their prediction or megatrend, and then in the interest of time management and pace, two analysts will have the opportunity to comment. If we have more time, we'll elongate it, but let's get started right away. Sanjeev Mohan, please kick it off. You want to talk about governance, go ahead sir. >> Thank you, Dave. I believe that data governance, which we've been talking about for many years, is now not only going to be mainstream, it's going to be table stakes. And all the things that you mentioned, you know, the data oceans, data lakes, lakehouses, data fabrics, meshes, the common glue is metadata. If we don't understand what data we have and we aren't governing it, there is no way we can manage it. So we saw Informatica go public last year after a hiatus of six years. I'm predicting that this year we see some more companies go public. My bet is on Collibra, most likely, and maybe Alation we'll see go public this year. I'm also predicting that the scope of data governance is going to expand beyond just data. It's not just data and reports. We are going to see more transformations like Spark jobs, Python, even Airflow. We're going to see more streaming data, so from Kafka Schema Registry, for example. We will see AI models become part of this whole governance suite. So the governance suite is going to be very comprehensive: very detailed lineage, impact analysis, and then even expanding into data quality. We've already seen that happen with some of the tools, where they are buying these smaller companies and bringing in data quality monitoring and integrating it with metadata management, data catalogs, and also data access governance.
So what we are going to see is that once the data governance platforms become the key entry point into these modern architectures, I'm predicting that the number of users of a data catalog is going to exceed that of a BI tool. That will take time, and we've already seen that trajectory. Right now, if you look at BI tools, I would say there are a hundred users of a BI tool to one of a data catalog. And I see that evening out over a period of time, and at some point data catalogs will really become the main way for us to access data. The data catalog will help us visualize data, but if we want to do more in-depth analysis, it'll be the jumping-off point into the BI tool, the data science tool, and that is the journey I see for the data governance products. >> Excellent, thank you. Some comments. Maybe Doug, a lot of things to weigh in on there, maybe you can comment. >> Yeah, Sanjeev, I think you're spot on with a lot of the trends. The one disagreement: I think it's really still far from mainstream. As you say, we've been talking about this for years, it's like God, motherhood, apple pie, everyone agrees it's important, but too few organizations are really practicing good governance because it's hard and because the incentives have been lacking. I think one thing that deserves mention in this context is ESG mandates and guidelines: environmental, social and governance regs and guidelines. We've seen the environmental regs and guidelines imposed in industries, particularly the carbon-intensive industries. We've seen the social mandates, particularly diversity, imposed on suppliers by companies that are leading on this topic. We've seen governance guidelines now being imposed by banks on investors. So these ESGs are presenting new carrots and sticks, and it's going to demand more solid data. It's going to demand more detailed reporting and solid reporting, tighter governance. But we're still far from mainstream adoption.
We have a lot of, you know, best-of-breed niche players in the space. I think the signs that it's going to be more mainstream are starting with things like Azure Purview and Google Dataplex; the big cloud platform players seem to be upping the ante and starting to address governance. >> Excellent, thank you Doug. Brad, I wonder if you could chime in as well. >> Yeah, I would love to be a believer in data catalogs. But to Doug's point, I think that it's going to take some more pressure for that to happen. I recall metadata being something every enterprise thought they were going to get under control when we were working on service-oriented architecture back in the nineties, and that didn't happen quite the way we anticipated. And so, to Sanjeev's point, it's because it is really complex and really difficult to do. My hope is that, you know, we won't, how do I put this, fade out into this nebula of domain catalogs that are specific to individual use cases, like Purview for getting data quality right, or for data governance and cybersecurity. And instead we have some tooling that can actually be adaptive, to gather metadata to create something. And I know it's important to you, Sanjeev, and that is this idea of observability. If you can get enough metadata without moving your data around, but understanding the entirety of a system that's running on this data, you can do a lot to help with the governance that Doug is talking about. >> So I just want to add that data governance, like many other initiatives, did not succeed; even AI went into an AI winter, but that's a different topic. A lot of these things did not succeed because, to your point, the incentives were not there. I remember when Sarbanes-Oxley had come onto the scene, if a bank did not do Sarbanes-Oxley, they were very happy to pay a million-dollar fine. That was like, you know, pocket change for them instead of doing the right thing. But I think the stakes are much higher now.
With GDPR, the floodgates opened. Now, you know, California has CCPA, but even CCPA is being outdated by CPRA, which is much more GDPR-like. So we are very rapidly entering a space where pretty much every major country in the world is coming up with its own regulatory compliance requirements, and data residency is becoming really important. And I think we are going to reach a stage where it won't be optional anymore, whether we like it or not. And I think the reason data catalogs were not successful in the past is because we did not have the right focus on adoption. We were focused on features, and these features were disconnected, very hard for business to adopt. These were built by IT people for IT departments to take a look at technical metadata, not business metadata. Today the tables have turned. CDOs are driving this initiative, regulatory compliance is bearing down hard, so I think the time might be right. >> Yeah, so guys, we have to move on here. But there's some real meat on the bone here, Sanjeev. I like the fact that you called out Collibra and Alation, so we can look back a year from now and say, okay, he made the call, he stuck it. And then the ratio of BI tools to data catalogs, that's another sort of measurement that we can take, even though with some skepticism there, that's something that we can watch. And I wonder if someday we'll have more metadata than data. But I want to move to Tony Baer. You want to talk about data mesh, and, you know, coming off of governance, I mean, wow, the whole concept of data mesh is decentralized data, and then governance becomes, you know, a nightmare there, but take it away, Tony. >> I'll put it this way: data mesh, you know, the idea, at least as proposed by ThoughtWorks, basically came out at least a couple of years ago, and the press has been almost uniformly uncritical.
A good reason for that is all the problems that Sanjeev and Doug and Brad were just speaking about, which is that we have all this data out there and we don't know what to do about it. Now, that's not a new problem. It was a problem we had in enterprise data warehouses, it was a problem when we had Hadoop data clusters, and it's even more of a problem now that data is out in the cloud, where the data is not only in your data lake, not only in S3, it's all over the place. And it also includes streaming, which I know we'll be talking about later. So the data mesh was a response to that, the idea being: who are the folks that really know best about governance? It's the domain experts. So data mesh was basically an architectural pattern and a process. My prediction for this year is that data mesh is going to hit cold, hard reality. Because if you do a Google search, the published work, the articles on data mesh have been largely, you know, pretty uncritical so far, basically lauding it as a very revolutionary new idea. I don't think it's that revolutionary, because we've talked about ideas like this before. Brad, you and I met years ago when we were talking about SOA and decentralizing all of this, but it was at the application level. Now we're talking about it at the data level. And now we have microservices. So there's this thought: if we're deconstructing apps in cloud native to microservices, why don't we think of data in the same way? My sense this year is that, you know, this has been a very active search term if you look at Google search trends, and now companies, enterprises, are going to look at this seriously. And as they look at it seriously, it's going to attract its first real hard scrutiny, it's going to attract its first backlash. That's not necessarily a bad thing. It means that it's being taken seriously.
The reason why I think you'll start to see the cold, hard light of day shine on data mesh is that it's still a work in progress. You know, this idea is basically a couple of years old and there are still some pretty major gaps. The biggest gap is in the area of federated governance. Now, federated governance itself is not a new issue; we've long been trying to figure out how to strike the balance between consistent enterprise policy, consistent enterprise governance, and the groups that understand the data, you know, how do we basically balance the two? There's a huge gap there in practice and knowledge. Also, to a lesser extent, there's a technology gap, which is basically in the self-service technologies that will help teams essentially govern data through the full life cycle: from selecting the data, to building the pipelines, to determining your access control, looking at quality, looking at whether the data is fresh or whether it's trending off course. So my prediction is that it will receive its first harsh scrutiny this year. You are going to see some organizations and enterprises declare premature victory when they build some federated query implementations. You're going to see vendors start to data-mesh-wash their products: anybody in the data management space, whether it's a pipelining tool, whether it's ELT, whether it's a catalog or a federated query tool, they're all going to start promoting how they support this. Hopefully nobody's going to call themselves a data mesh tool, because data mesh is not a technology. We're going to see one other thing come out of this.
And this harks back to the metadata that Sanjeev was talking about, and the catalog, just as he was talking about. Which is that there's going to be a renewed focus on metadata. And I think that's going to spur interest in data fabrics. Now, data fabrics are pretty vaguely defined, but if we just take the most elemental definition, which is a common metadata backplane, I think that if anybody is going to get serious about data mesh, they need to look at the data fabric, because we all, at the end of the day, need to read from the same sheet of music. >> So thank you, Tony. Dave Menninger, I mean, one of the things that people like about data mesh is it pretty crisply articulates some of the flaws in today's organizational approaches to data. What are your thoughts on this? >> Well, I think we have to start by defining data mesh, right? The term is already getting corrupted, right? Tony said it's going to see the cold, hard light of day. And there's a problem right now in that there are a number of overlapping terms that are similar but not identical. So we've got data virtualization, data fabric, excuse me for a second. (clears throat) Sorry about that. Data virtualization, data fabric, data federation, right? So I think that it's not really clear what each vendor means by these terms. I see data mesh and data fabric becoming quite popular. I've interpreted data mesh as referring primarily to the governance aspects, as originally intended and specified. But that's not the way I see vendors using it. I see vendors using it much more to mean data fabric and data virtualization. So I'm going to comment on the group of those things. I think the group of those things is going to happen. They're going to happen, they're going to become more robust.
Our research suggests that a quarter of organizations are already using virtualized access to their data lakes, and another half, so a total of three quarters, will eventually be accessing their data lakes using some sort of virtualized access. Again, whether you define it as mesh or fabric or virtualization isn't really the point here, but this notion that there are different elements of data, metadata, and governance within an organization that all need to be managed collectively. The interesting thing is when you look at the satisfaction rates of those organizations using virtualization versus those that are not, it's almost double: 79% of organizations that were using virtualized access express satisfaction with their access to the data lake. Only 39% express satisfaction if they weren't using virtualized access. >> Oh, thank you, Dave. Sanjeev, we've just got about a couple of minutes on this topic, but I know you're speaking, or maybe you've already spoken, on a panel with (indistinct), who sort of invented the concept. Governance obviously is a big sticking point, but what are your thoughts on this? You're on mute. (panelists chuckling) >> So my message to (indistinct) and to the community is, as opposed to what they said, let's not keep defining it. We spent a whole year defining it; there are four principles: domain, product, data infrastructure, and governance. Let's take it to the next level. I get a lot of questions on what is the difference between data fabric and data mesh, and I'm like, I can't compare the two, because data mesh is a business concept and data fabric is a data integration pattern. How do you compare the two? You have to bring data mesh a level down. So to Tony's point, I'm on a warpath in 2022 to take it down to what a data product looks like, how we handle shared data across domains, and governance. And I think we are going to see more of that in 2022, this "operationalization" of data mesh.
>> I think we could have a whole hour on this topic, couldn't we? Maybe we should do that. But let's move on. Let's go to Carl. So Carl, you're a database guy, you've been around that block for a while now, you want to talk about graph databases, bring it on. >> Oh yeah, okay, thanks. So I regard graph database as basically the next truly revolutionary database management technology. I'm putting a forecast on the graph database market, which of course we haven't defined yet, so obviously I have a little wiggle room in what I'm about to say. But this market will grow by about 600% over the next 10 years. Now, 10 years is a long time, but over the next five years, we expect to see gradual growth as people start to learn how to use it. The problem is not that it's not useful, it's that people don't know how to use it. So let me explain, before I go any further, what a graph database is, because some of the folks on the call may not know. A graph database organizes data according to a mathematical structure called a graph. The graph has elements called nodes and edges. So a data element drops into a node, the nodes are connected by edges, and the edges connect one node to another. Combinations of edges create structures that you can analyze to determine how things are related. In some cases, the nodes and edges can have properties attached to them, which add additional informative material that makes it richer; that's called a property graph. There are two principal use cases for graph databases. There are semantic graphs, which are used to break down human language text into semantic structures. Then you can search it, organize it, and answer complicated questions. A lot of AI is aimed at semantic graphs. The other kind is the property graph that I just mentioned, which has a dazzling number of use cases. I want to just point out, as I talk about this, people are probably wondering, well, we have relational databases, isn't that good enough?
So a relational database supports what I call definitional relationships. That means you define the relationships in a fixed structure. The data drops into that structure, there's a value, a foreign key value, that relates one table to another, and that value is fixed. You don't change it. If you change it, the database becomes unstable; it's not clear what you're looking at. In a graph database, the system is designed to handle change so that it can reflect the true state of the things that it's being used to track. So let me just give you some examples of use cases for this. They include entity resolution, data lineage, social media analysis, Customer 360, fraud prevention. There's cybersecurity; supply chain is a big one actually. There is explainable AI, and this is going to become important too, because a lot of people are adopting AI, but they want a system after the fact to say, how did the AI system come to that conclusion? How did it make that recommendation? Right now we don't have really good ways of tracking that. Machine learning in general; social networks, I already mentioned that. And then we've got, oh gosh, we've got data governance, data compliance, risk management. We've got recommendation, we've got personalization, anti-money laundering, that's another big one, identity and access management, network and IT operations, which is already becoming a key one, where you actually have mapped out your operation, you know, whatever it is, your data center, and you can track what's going on as things happen there. Root cause analysis; fraud detection is a huge one, and a number of major credit card companies use graph databases for fraud detection. Risk analysis, track and trace, churn analysis, next best action, what-if analysis, impact analysis, entity resolution, and I would add a few other things to this list, like metadata management. So Sanjeev, here you go, this is your engine.
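Carl's node-and-edge model can be sketched in a few lines of Python. This is a toy illustration only; the class names and the tiny fraud-flavored example are invented here, and a real engine such as the ones Carl alludes to stores and indexes this natively rather than scanning lists.

```python
# Minimal property-graph sketch: nodes and edges, each carrying
# arbitrary key/value properties (hence "property graph").

class Node:
    def __init__(self, node_id, label, **props):
        self.id = node_id
        self.label = label        # e.g. "Account", "Customer"
        self.props = props        # informative material attached to the node

class Edge:
    def __init__(self, src, dst, rel, **props):
        self.src, self.dst = src, dst
        self.rel = rel            # relationship type, e.g. "TRANSFERRED_TO"
        self.props = props

class Graph:
    def __init__(self):
        self.nodes, self.edges = {}, []

    def add_node(self, node):
        self.nodes[node.id] = node

    def add_edge(self, edge):
        self.edges.append(edge)

    def neighbors(self, node_id, rel=None):
        """Follow outgoing edges, optionally filtered by relationship type."""
        return [e.dst for e in self.edges
                if e.src == node_id and (rel is None or e.rel == rel)]

# Fraud-detection flavor: which accounts did account A send money to?
g = Graph()
g.add_node(Node("A", "Account", owner="alice"))
g.add_node(Node("B", "Account", owner="bob"))
g.add_edge(Edge("A", "B", "TRANSFERRED_TO", amount=950))
print(g.neighbors("A", "TRANSFERRED_TO"))  # ['B']
```

The point of the sketch is the contrast Carl draws: the relationship is data (an edge object you can add, remove, or annotate at runtime), not a fixed foreign-key column baked into a table definition.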
Because I was in metadata management for quite a while in my past life, and one of the things I found was that none of the data management technologies that were available to us could efficiently handle metadata, because of the kinds of structures that result from it. But graphs can, okay? Graphs can do things like say, this term in this context means this, but in that context it means that, okay? Things like that. And in fact, logistics management, supply chain. And also, because it handles recursive relationships, and by recursive relationships I mean objects that own other objects that are of the same type, you can do things like a bill of materials, you know, a parts explosion. Or you can do an HR analysis, who reports to whom, how many levels up the chain, and that kind of thing. You can do that with relational databases, but it takes a lot of programming. In fact, you can do almost any of these things with relational databases, but the problem is, you have to program it. It's not supported in the database. And whenever you have to program something, that means you can't trace it, you can't define it, you can't publish it in terms of its functionality, and it's really, really hard to maintain over time. >> Carl, thank you. I wonder if we could bring Brad in. I mean, Brad, I'm sitting here wondering, okay, is this incremental to the market? Is it disruptive and a replacement? What are your thoughts on this phase? >> It's already disrupted the market. I mean, like Carl said, go to any bank and ask them, are you using graph databases to get fraud detection under control? And they'll say, absolutely, that's the only way to solve this problem. And it is, frankly. And it's the only way to solve a lot of the problems that Carl mentioned. And that is, I think, its Achilles' heel in some ways. Because, you know, it's like finding the best way to cross the seven bridges of Königsberg.
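Carl's "who reports to whom" example of a recursive relationship can be sketched as a simple edge walk; the employee names and the dictionary-based edge store here are invented for illustration, standing in for REPORTS_TO edges in a graph database (in SQL this would need a recursive common table expression).

```python
# Each entry is a REPORTS_TO edge: employee -> manager.
reports_to = {"carol": "bob", "bob": "alice", "dave": "alice"}

def chain_of_command(employee):
    """Walk up the REPORTS_TO edges until we reach someone with no manager."""
    chain = []
    while employee in reports_to:
        employee = reports_to[employee]
        chain.append(employee)
    return chain

chain_of_command("carol")  # ['bob', 'alice']: carol is two levels down
```

The traversal itself is trivial; Carl's point is that in a graph database this walk is a native, declarative operation, whereas in a relational database you have to program it yourself and then maintain that program.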
You know, it's always going to kind of be tied to those use cases because it's really special and it's really unique, and because it's special and unique, it still unfortunately kind of stands apart from the rest of the community that's building, let's say, AI outcomes, as a great example here. Graph databases and AI, as Carl mentioned, are like chocolate and peanut butter. But technologically, they don't know how to talk to one another; they're completely different. And you know, you can't just stand up SQL and query them. You've got to learn, what is it, Carl? Cypher, that's it. Yeah, thank you, to actually get to the data in there. And if you're going to scale that data, that graph database, especially a property graph, if you're going to do something really complex, like try to understand, you know, all of the metadata in your organization, you might just end up with, you know, a graph database winter, like we had the AI winter, simply because you run out of performance to make the thing happen. So, I think it's already disrupted, but we need to treat it like a first-class citizen in the data analytics and AI community. We need to bring it into the fold. We need to equip it with the tools it needs to do the magic it does, and to do it not just for specialized use cases, but for everything. 'Cause I'm with Carl. I think it's absolutely revolutionary. >> Brad identified the principal Achilles' heel of the technology, which is scaling. When these things get large and complex enough that they spill over what a single server can handle, you start to have difficulties, because the relationships span things that have to be resolved over a network, and then you get network latency and that slows the system down. So that's still a problem to be solved. >> Sanjeev, any quick thoughts on this? I mean, I think metadata on the word cloud is going to be the largest font, but what are your thoughts here?
>> I want to (indistinct) so people don't associate me with only metadata, so I want to talk about something slightly different. db-engines.com has done an amazing job; I think almost everyone knows that they chronicle all the major databases that are in use today. In January of 2022, there are 381 databases on their ranked list. The largest category is RDBMS. The second largest category is actually divided into two: property graphs and RDF graphs. These two together make up the second largest number of databases. So talking about the Achilles' heel, this is a problem. The problem is that there are so many graph databases to choose from. They come in different shapes and forms. And to Brad's point, there are so many query languages. In RDBMS it's SQL, and we know that story, but here we've got Cypher, we've got Gremlin, we've got GQL, and then there are proprietary languages. So I think there's a lot of disparity in this space. >> Well, excellent. All excellent points, Sanjeev, if I must say. And that is a problem: the languages need to be sorted out and standardized. People need to have a roadmap as to what they can do with it. Because, as you say, you can do so many things, and so many of those things are unrelated that you sort of say, well, what do we use this for? And I'm reminded of a saying I learned a bunch of years ago, when somebody said that the digital computer is the only tool man has ever devised that has no particular purpose. (panelists chuckle) >> All right guys, we've got to move on to Dave Menninger. We've heard about streaming. Your prediction is in that realm, so please take it away. >> Sure. So I like to say that historical databases are going to become a thing of the past. By that I don't mean that they're going to go away; that's not my point. I mean, we need historical databases, but streaming data is going to become the default way in which we operate with data.
So in the next, say, three to five years, I would expect that data platforms, and we're using the term data platforms to represent the evolution of databases and data lakes, will incorporate these streaming capabilities. We're going to process data as it streams into an organization, and then it's going to roll off into a historical database. So historical databases don't go away, but they become a thing of the past. They store the data that occurred previously. And as data is occurring, we're going to be processing it, we're going to be analyzing it, we're going to be acting on it. I mean, we only ever ended up with historical databases because we were limited by the technology that was available to us. Data doesn't occur in batches. But we processed it in batches because that was the best we could do. And it wasn't bad, and we've continued to improve and improve. But streaming data today is still the exception. It's not the rule, right? There are projects within organizations that deal with streaming data, but it's not the default way in which we deal with data yet. And so that's my prediction: this is going to change, and we're going to have streaming data be the default way in which we deal with data, however you label it and whatever you call it. You know, maybe these databases and data platforms just evolve to be able to handle it, but we're going to deal with data in a different way. And our research shows that already: about half of the participants in our analytics and data benchmark research are using streaming data. You know, another third are planning to use streaming technologies. So that gets us to about eight out of 10 organizations that need to use this technology. And that doesn't mean they have to use it throughout the whole organization, but it's pretty widespread in its use today, and it has continued to grow.
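The batch-versus-streaming distinction Dave describes can be shown with a trivial running aggregate; this is a pedagogical sketch only (the class and the sensor readings are invented here), not a stand-in for a real streaming engine.

```python
def batch_average(events):
    """Historical style: wait until the whole batch has landed, then compute."""
    return sum(events) / len(events)

class StreamingAverage:
    """Streaming style: update state as each event arrives,
    so an answer is available immediately, not after the batch closes."""
    def __init__(self):
        self.total, self.count = 0.0, 0

    def observe(self, value):
        self.total += value
        self.count += 1
        return self.total / self.count  # current answer, per event

stream = StreamingAverage()
for reading in [10, 20, 30]:          # events arriving one at a time
    current = stream.observe(reading)  # usable after every single event

# Both styles converge on the same final answer (20.0); the difference
# is when the answer becomes available: per event versus per batch.
```

As Carl and Dave go on to discuss, the two styles are complementary rather than exclusive: the streaming state handles the present, while the completed events roll off into the historical store.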
If you think about the consumerization of IT, we've all been conditioned to expect immediate access to information, immediate responsiveness. You know, we want to know if an item is on the shelf at our local retail store, and we can go in and pick it up right now. You know, that's the world we live in, and that's spilling over into the enterprise IT world. We have to provide those same types of capabilities. So that's my prediction: historical databases become a thing of the past, streaming data becomes the default way in which we operate with data. >> All right, thank you, David. Well, so what say you, Carl, the guy who has followed historical databases for a long time? >> Well, one thing actually, every database is historical, because as soon as you put data in it, it's now history. It no longer reflects the present state of things. But even if that history is only a millisecond old, it's still history. But I would say, I mean, I know you're trying to be a little bit provocative in saying this, Dave, 'cause you know as well as I do that people still need to do their taxes, they still need to do accounting, they still need to run general ledger programs and things like that. That all involves historical data. That's not going to go away unless you want to go to jail. So you're going to have to deal with that. But as far as the leading-edge functionality, I'm totally with you on that. And I'm just, you know, kind of wondering if this requires a change in the way that we perceive applications in order to truly be manifested, a rethinking of the way applications work: saying that an application should respond instantly, as soon as the state of things changes. What do you say about that? >> I think that's true. I think we do have to think about things differently. It's not the way we designed systems in the past. We're seeing more and more systems designed that way. But again, it's not the default.
And I agree 100% with you that we do need historical databases, you know, that's clear. And even some of those historical databases will be used in conjunction with the streaming data, right? >> Absolutely. I mean, you know, let's take the data warehouse example, where you're using the data warehouse as the context and the streaming data as the present, and you're saying, here's the sequence of things that's happening right now. Have we seen that sequence before? And where? What does that pattern look like in past situations? And can we learn from that? >> So Tony Baer, I wonder if you could comment? I mean, when you think about, you know, real-time inferencing at the edge, for instance, which is something that a lot of people talk about, and a lot of what we're discussing here in this segment, it looks like it's got great potential. What are your thoughts? >> Yeah, I mean, I think you nailed it. You hit it right on the head there. I'm going to split this one down the middle: I don't see that streaming becomes the default. What I see is that streaming and transaction databases and analytic data, you know, data warehouses, data lakes, whatever, are converging. And what allows us technically to converge is cloud-native architecture, where you can basically distribute things. So you can have a node here that's doing the real-time processing, that's also doing, and this is where it leads in, maybe some of that real-time predictive analytics to take a look at, well, look, we're looking at this customer journey, what's happening with what the customer is doing right now, and this is correlated with what other customers are doing. So the thing is that in the cloud, you can basically partition this, and because of the speed of the infrastructure, you can bring these together and orchestrate them in sort of a loosely coupled manner.
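Carl's warehouse-plus-stream idea, checking the live sequence of events against patterns seen in past situations, can be sketched as a sliding window over a stream; the pattern set and event names here are invented for illustration, standing in for patterns mined offline from the historical store.

```python
# Patterns assumed to have been mined from the historical warehouse offline.
historical_patterns = {("login", "password_reset", "large_transfer")}

def check_stream(events, window=3):
    """Slide a window over live events and flag any known historical pattern."""
    alerts = []
    for i in range(len(events) - window + 1):
        seq = tuple(events[i:i + window])
        if seq in historical_patterns:
            alerts.append((i, seq))   # position in the stream plus the match
    return alerts

live = ["login", "password_reset", "large_transfer", "logout"]
check_stream(live)  # flags the known sequence starting at position 0
```

The division of labor mirrors the panel's point: the historical store answers "have we seen this before?", while the streaming side applies that answer to what is happening right now.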
The other part is that the use cases are demanding it, and this is where it goes back to what Dave is saying. You know, when you look at Customer 360, when you look at, let's say, smart utility products, when you look at any type of operational problem, it has a real-time component and it has a historical component, and a predictive one. And so, you know, my sense here is that technically we can bring this together through the cloud. And I think the use case is that we can apply some real-time sort of predictive analytics on these streams and feed this into the transactions, so that when we make a decision in terms of what to do as a result of a transaction, we have this real-time input. >> Sanjeev, did you have a comment? >> Yeah, I was just going to say that to Dave's point, you know, we have to think of streaming very differently, because with the historical databases, we used to bring the data and store the data, and then we used to run rules on top, aggregations and all. But in the case of streaming, the mindset changes because the rules, the inference, all of that is fixed, but the data is constantly changing. So it's a completely reversed way of thinking and building applications on top of that. >> So Dave Menninger, there seems to be some disagreement about the default. What kind of timeframe are you thinking about? Is it the end of the decade when it becomes the default? Where would you pin it? >> I think around, you know, between five and 10 years, I think this becomes the reality. >> I think it's... >> It'll be more and more common between now and then, but it becomes the default. And I also want, Sanjeev, at some point, maybe in one of our subsequent conversations, we need to talk about governing streaming data. 'Cause that's a whole other set of challenges. 
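The inversion Sanjeev describes, fixed rules over constantly changing data, combined with Carl's idea of checking a live event sequence against historical patterns, can be sketched in a few lines. This is a toy illustration only; the event names and patterns are invented, and it stands in for no particular streaming engine:

```python
from collections import deque

# Historical context: pattern lookups mined from the warehouse.
# (Hard-coded here; in practice this would be a periodically
# refreshed aggregate over past customer journeys.)
historical_patterns = {
    ("view", "view", "remove_from_cart"): "churn_risk",
    ("view", "add_to_cart", "checkout"): "purchase_intent",
}

def make_stream_processor(window_size=3):
    """Fixed rule, flowing data: the rule (pattern lookup) is static,
    while events stream past it -- the reversal Sanjeev describes."""
    window = deque(maxlen=window_size)

    def on_event(event):
        window.append(event)
        if len(window) == window_size:
            # Carl's question: have we seen this sequence before?
            return historical_patterns.get(tuple(window))
        return None

    return on_event

process = make_stream_processor()
signals = [s for e in ["view", "view", "add_to_cart", "checkout"]
           if (s := process(e))]
print(signals)  # -> ['purchase_intent']
```

The real-time input Tony mentions would be the returned signal feeding back into the transaction path, while the pattern table is rebuilt offline from history.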
>> We've also talked about it rather in two dimensions, historical and streaming, and there's lots of low-latency, micro-batch, sub-second processing that's not quite streaming, but in many cases it's fast enough, and we're seeing a lot of adoption of near real-time, not quite real-time, as good enough for many applications. (indistinct cross talk from panelists) >> Because nobody's really taking the hardware dimension (mumbles). >> That'll just happen, Carl. (panelists laughing) >> So near real-time. But maybe before you lose the customer, however we define that, right? Okay, let's move on to Brad. Brad, you want to talk about automation, AI, the pipeline. People feel like, hey, we can just automate everything. What's your prediction? >> Yeah, I'm an AI aficionado, so apologies in advance for that. But, you know, I think that we've been seeing automation play within AI for some time now. And it's helped us do a lot of things, especially for practitioners that are building AI outcomes in the enterprise. It's helped them to fill skills gaps, it's helped them to speed development, and it's helped them to actually make AI better. 'Cause it, you know, in some ways provides some swim lanes, and for example, with technologies like AutoML, can auto-document and create that sort of transparency that we talked about a little bit earlier. But I think there's an interesting kind of convergence happening with this idea of automation. And that is that the automation that started happening for practitioners is trying to move outside of the traditional bounds of things like, I'm just trying to get my features, I'm just trying to pick the right algorithm, I'm just trying to build the right model, and it's expanding across the full life cycle of building an AI outcome, to start at the very beginning with the data and to then continue on to the end, which is this continuous delivery and continuous automation of that outcome to make sure it's right and it hasn't drifted and stuff like that. 
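In caricature, the AutoML-style automation Brad describes, trying candidates, keeping the best, and recording what was chosen, reduces to a small search loop. The "models" below are trivial stand-ins, not a real ML library, and all the names and data are invented:

```python
# Toy training and holdout data: (x, y) pairs following y = 2x.
train = [(1, 2), (2, 4), (3, 6), (4, 8)]
holdout = [(5, 10), (6, 12)]

# Candidate "models" a real AutoML system would search over.
candidates = {
    "double": lambda x: 2 * x,
    "square": lambda x: x * x,
    "identity": lambda x: x,
}

def score(model, data):
    # Mean absolute error: lower is better.
    return sum(abs(model(x) - y) for x, y in data) / len(data)

# Pick the candidate with the best holdout score, and record the
# choice -- the seed of the auto-documentation Brad mentions.
best_name = min(candidates, key=lambda n: score(candidates[n], holdout))
report = {"chosen_model": best_name,
          "holdout_error": score(candidates[best_name], holdout)}
print(report)  # -> {'chosen_model': 'double', 'holdout_error': 0.0}
```

A real system would also automate feature selection, retraining, and drift monitoring across the lifecycle, but the select-and-record shape is the same.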
And because of that, because it's become kind of powerful, we're starting to actually see this weird thing happen where the practitioners are starting to converge with the users. And that is to say that, okay, if I'm in Tableau right now, I can stand up Salesforce Einstein Discovery, and it will automatically create a nice predictive algorithm for me given the data that I pull in. But what's starting to happen, and we're seeing this from the companies that create business software, so Salesforce, Oracle, SAP, and others, is that they're starting to actually use these same ideals and a lot of deep learning (chuckles) to basically stand up these out-of-the-box, flip-a-switch, and you've got an AI outcome at the ready for business users. And I am very much, you know, I think that's the way that it's going to go, and what it means is that AI is slowly disappearing. And I don't think that's a bad thing. I think if anything, what we're going to see in 2022 and maybe into 2023 is this sort of rush to put this idea of disappearing AI into practice and have as many of these solutions in the enterprise as possible. You can see, like for example, SAP is going to roll out this quarter this thing called adaptive recommendation services, which basically is a cold-start AI outcome that can work across a whole bunch of different vertical markets and use cases. It's just a recommendation engine for whatever you need to do in the line of business. So basically, you're an SAP user, you turn on your software one day, you're a sales professional let's say, and suddenly you have a recommendation for customer churn. Boom! And you're going, that's great. Well, I don't know, I think that's terrifying. 
In some ways I think it is the future, that AI is going to disappear like that, but I'm absolutely terrified of it, because I think that what it really does is it calls attention to a lot of the issues that we already see around AI, specific to this idea of what we like to call at Omdia responsible AI. Which is, you know, how do you build an AI outcome that is free of bias, that is inclusive, that is fair, that is safe, that is secure, that is auditable, et cetera, et cetera, et cetera, et cetera. It takes a lot of work to do. And so if you imagine a customer that's just a Salesforce customer, let's say, and they're turning on Einstein Discovery within their sales software, you need some guidance to make sure that when you flip that switch, the outcome you're going to get is correct. And that's going to take some work. And so, I think we're going to see this move to roll this out, and suddenly there's going to be a lot of problems, a lot of pushback that we're going to see. And some of that's going to come from GDPR and the others that Sanjeev was mentioning earlier. A lot of it is going to come from internal CSR requirements within companies that are saying, "Hey, hey, whoa, hold up, we can't do this all at once. "Let's take the slow route, "let's make AI automated in a smart way." And that's going to take time. >> Yeah, so a couple of predictions there that I heard. AI simply disappears, it becomes invisible. Maybe if I can restate that. And then if I understand it correctly, Brad, you're saying there's a backlash in the near term. People will say, oh, slow down. Let's automate what we can. Those attributes that you talked about are non-trivial to achieve, is that why you're a bit of a skeptic? >> Yeah. I think that we don't have any sort of standards that companies can look to and understand. 
And we certainly, within these companies, especially those that haven't already stood up an internal data science team, they don't have the knowledge to understand, when they flip that switch for an automated AI outcome, that it's going to do what they think it's going to do. And so we need some sort of standard methodology and practice, best practices, that every company that's going to consume this invisible AI can make use of. And one of the things, you know, that Google kicked off a few years back, that's picking up some momentum, and the companies I just mentioned are starting to use, is this idea of model cards, where at least you have some transparency about what these things are doing. You know, so like for the SAP example, we know, for example, if it's a convolutional neural network with a long short-term memory model that it's using, we know that it only works on Roman English, and therefore me as a consumer can say, "Oh, well I know that I need to do this internationally. "So I should not just turn this on today." >> Thank you. Carl, could you add anything, any context here? >> Yeah, we've talked about some of the things Brad mentioned here at IDC and our future of intelligence group, regarding in particular the moral and legal implications of having a fully automated, you know, AI-driven system. Because we already know, and we've seen, that AI systems are biased by the data that they get, right? So if they get data that pushes them in a certain direction, I think there was a story last week about an HR system that was recommending promotions for White people over Black people, because in the past, you know, White people were promoted and rated more productive than Black people, but it had no context as to why, which is, you know, that Black people were being historically discriminated against, but the system doesn't know that. So, you know, you have to be aware of that. 
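The model cards Brad mentions can be thought of as a plain structured record that travels with the model. This is a hypothetical sketch; the field names loosely follow the shape of Google's model card proposal, and every value here is invented:

```python
# A minimal model card: a human-readable record of what an automated
# model is, what it was trained on, and where it should not be used.
model_card = {
    "model": "churn-recommender-v1",
    "architecture": "convolutional neural network + LSTM",
    "intended_use": "B2B sales churn recommendations",
    "training_data": "English-language CRM notes, 2018-2021",
    "limitations": [
        "trained on English text only; not validated internationally",
        "no fairness audit performed on protected attributes",
    ],
}

def out_of_scope(card, keyword):
    """Crude gate: does any documented limitation mention this concern?"""
    return any(keyword in lim for lim in card["limitations"])

# A consumer can check before flipping the switch:
print(out_of_scope(model_card, "international"))  # -> True
print(out_of_scope(model_card, "latency"))        # -> False
```

Even a record this simple gives Brad's hypothetical consumer the signal to say "I should not just turn this on today."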
And I think that at the very least, there should be controls when a decision has either a moral or legal implication. When you really need a human judgment, it could lay out the options for you, but a person actually needs to authorize that action. And I also think that we always will have to be vigilant regarding the kind of data we use to train our systems, to make sure that it doesn't introduce unintended biases. To some extent, it always will. So we'll always be chasing after them. But that's (indistinct). >> Absolutely Carl, yeah. I think that what you have to bear in mind as a consumer of AI is that it is a reflection of us, and we are a very flawed species. And so if you look at all of the really fantastic, magical-looking super models we see, like GPT-3 and the fourth one that's coming out, they're xenophobic and hateful, because the data that they're built upon, and the algorithms, and the people that build them, are us. So AI is a reflection of us. We need to keep that in mind. >> Yeah, the AI is biased 'cause humans are biased. All right, great. All right, let's move on. Doug, you mentioned, you know, a lot of people said that data lake, that term, is not going to live on, but here we are, about to have some lakes here. You want to talk about lake house? Bring it on. >> Yes, I do. My prediction is that lake house, and this idea of a combined data warehouse and data lake platform, is going to emerge as the dominant data management offering. I say offering, that doesn't mean it's going to be the dominant thing that organizations have out there, but it's going to be the predominant vendor offering in 2022. Now heading into 2021, we already had Cloudera, Databricks, Microsoft, Snowflake as proponents; in 2021, SAP, Oracle, and several of these fabric/virtualization/mesh vendors joined the bandwagon. The promise is that you have one platform that manages your structured, unstructured and semi-structured information. 
And it addresses both the BI analytics needs and the data science needs. The real promise there is simplicity and lower cost. But I think end users have to answer a few questions. The first is, does your organization really have a center of data gravity, or is the data highly distributed? Multiple data warehouses, multiple data lakes, on premises, cloud. If it's very distributed, and you'd have difficulty consolidating, and that's not really a goal for you, then maybe that single platform is unrealistic and not likely to add value for you. You know, also the fabric and virtualization vendors, the mesh idea, that's where, if you have this highly distributed situation, that might be a better path forward. The second question, if you are looking at one of these lake house offerings, you are looking at consolidating, simplifying, bringing things together onto a single platform: you have to make sure that it meets both the warehouse need and the data lake need. So you have vendors like Databricks and Microsoft with Azure Synapse, really new to the data warehouse space, and they're having to prove that the data warehouse capabilities on their platforms can meet the scaling requirements, can meet the user and query concurrency requirements, meet those tight SLAs. And then on the other hand, you have Oracle, SAP, Snowflake, the data warehouse folks, coming into the data science world, and they have to prove that they can manage the unstructured information and meet the needs of the data scientists. I'm seeing a lot of the lake house offerings from the warehouse crowd managing that unstructured information in columns and rows. And some of these vendors, Snowflake in particular, are really relying on partners for the data science needs. So you've really got to look at a lake house offering and make sure that it meets both the warehouse and the data lake requirement. >> Thank you, Doug. 
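The single-platform promise Doug outlines, one store answering both a BI-style aggregate query and a data-science-style row-level pull, can be illustrated in miniature. SQLite stands in here for the platform (a real lake house would be columnar files on object storage under a shared SQL engine), and the table and values are invented:

```python
import sqlite3

# One shared store.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (user TEXT, kind TEXT, amount REAL)")
con.executemany("INSERT INTO events VALUES (?, ?, ?)", [
    ("a", "purchase", 30.0), ("b", "purchase", 10.0), ("a", "refund", -5.0),
])

# Warehouse-style access: an aggregated SQL query for BI dashboards.
revenue = con.execute(
    "SELECT SUM(amount) FROM events WHERE kind = 'purchase'").fetchone()[0]

# Lake-style access: raw rows handed off to a data-science workflow.
rows = con.execute("SELECT user, kind, amount FROM events").fetchall()
per_user = {}
for user, _, amount in rows:
    per_user[user] = per_user.get(user, 0.0) + amount

print(revenue, per_user)  # -> 40.0 {'a': 25.0, 'b': 10.0}
```

Doug's two proof points map directly onto the two access paths: the warehouse crowd has to make the first one fast and concurrent; the lake crowd has to make the second one rich enough for data scientists.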
Well, Tony, if those two worlds are going to come together, as Doug was saying, the analytics and the data science world, does there need to be some kind of semantic layer in between? I don't know. Where are you on this topic? >> (chuckles) Oh, didn't we talk about data fabrics before? Common metadata layer (chuckles). Actually, I'm almost tempted to say let's declare victory and go home. And this has actually been going on for a while. I actually agree with, you know, much of what Doug is saying there. I mean, I remember as far back as, I think it was like 2014, I was doing a study, I was still at Ovum, (indistinct) Omdia, looking at all these specialized databases that were coming up and seeing that, you know, there's overlap at the edges. But yet, there was still going to be a reason at the time that you would have, let's say, a document database for JSON, you'd have a relational database for transactions and for data warehouse, and you had, basically, something at that time that resembled Hadoop for what we'd consider your data lake. Fast forward, and the thing is, what I was seeing at the time was that they were sort of blending at the edges. That was, say, about five or six years ago. And the lake house is essentially the current manifestation of that idea. There is a dichotomy in terms of, you know, it's the old argument: do we centralize this all, you know, in a single place, or do we virtualize? And I think it's always going to be a union, and there's never going to be a single silver bullet. I do see that there are also going to be questions, and these are points that Doug raised. That, you know, what do you need for your performance characteristics? Do you need, for instance, high concurrency? 
Do you need the ability to do some very sophisticated joins, or is your requirement more to be able to distribute the processing, you know, as far as possible, to essentially do a kind of brute-force approach? All these approaches are valid based on the use case. I just see that essentially the lake house is the culmination of this. It's a relatively new term, introduced by Databricks a couple of years ago, but this is the culmination of basically what's been a long-time trend. And what we see in the cloud is that we start seeing data warehouses offer checkbox items saying, "Hey, we can basically source data in cloud storage, in S3, "Azure Blob Store, you know, whatever, "as long as it's in certain formats, "like, you know, Parquet or CSV or something like that." I see that as becoming kind of a checkbox item. So to that extent, I think that the lake house, depending on how you define it, is already a reality. And in some cases, maybe new terminology, but not a whole heck of a lot new under the sun. >> Yeah. And Dave Menninger, thank you, Tony, but a lot of this is going to come down to, you know, vendor marketing, right? Some people just kind of co-opt the term. We talked about, you know, data mesh washing. What are your thoughts on this? (laughing) >> Yeah, so I used the term data platform earlier. And part of the reason I use that term is that it's more vendor neutral. We've tried to sort of stay out of the vendor terminology patenting world, right? Whether the term lake house is what sticks or not, the concept is certainly going to stick. And we have some data to back it up. About a quarter of organizations that are using data lakes today already incorporate data warehouse functionality into them. 
So they consider their data lake and data warehouse one and the same. About a quarter of organizations, a little less, but about a quarter of organizations, feed the data lake from the data warehouse, and about a quarter of organizations feed the data warehouse from the data lake. So it's pretty obvious that three quarters of organizations need to bring this stuff together, right? The need is there, the need is apparent. The technology is going to continue to converge. I like to talk about it, you know, you've got data lakes over here at one end, and I'm not going to talk about why people thought data lakes were a bad idea, because they thought you just throw stuff on a server and you ignore it, right? That's not what a data lake is. So you've got data lake people over here and you've got database people over here, data warehouse people over here. Database vendors are adding data lake capabilities and data lake vendors are adding data warehouse capabilities. So it's obvious that they're going to meet in the middle. I mean, I think it's like Tony says, I think we should declare victory and go home. >> So just as a follow-up on that, are you saying the specialized lake and the specialized warehouse go away? I mean, Tony, data mesh practitioners, or advocates, would say, well, they could all live. It's just a node on the mesh. But based on what Dave just said, are we going to see those all morph together? >> Well, number one, as I was saying before, there's always going to be this sort of, you know, centrifugal force or this tug of war between do we centralize the data, do we virtualize? And the fact is, I don't think that there's ever going to be any single answer. I think in terms of data mesh, data mesh has nothing to do with how you physically implement the data. You could have a data mesh basically on a data warehouse. 
It's just that, you know, the difference is that while we use the same physical data store, everybody's logically, you know, governing it differently, you know? Data mesh, in essence, is not a technology, it's processes, it's a governance process. So essentially, you know, I basically see that, as I was saying before, this is the culmination of a long-time trend. We're essentially seeing a lot of blurring, but there are going to be cases where, for instance, if I need, let's say, upserts, or I need high concurrency or something like that, there are certain things that I'm not going to be able to efficiently get out of a data lake. And the same, you know, if I'm doing a system where I'm just brute-forcing very fast file scanning and that type of thing. So I think there always will be some delineations, but I would agree with Dave and with Doug that we are seeing basically a confluence of requirements: we need essentially the abilities of, you know, both a data lake and a data warehouse. These need to come together, so I think they will. >> I think what we're likely to see is organizations look for a converged platform that can handle both sides for their center of data gravity. The mesh and the fabric/virtualization vendors, they're all on board with the idea of this converged platform, and they're saying, "Hey, we'll handle all the edge cases "of the stuff that isn't in that center of data gravity "but that is off distributed in a cloud "or at a remote location." So you can have that single platform for the center of your data, and then bring in virtualization, mesh, what have you, for reaching out to the distributed data. >> As Dave basically said, people are happy when they've virtualized data. >> I think we have at this point, but to Dave Menninger's point, they are converging. Snowflake has introduced support for unstructured data, so obviously the lines are blurring here. 
Now what Databricks is saying is, "aha, but it's easier to go from data lake to data warehouse "than it is from data warehouse to data lake." So I think we're getting into semantics, but we're already seeing these two converge. >> So take somebody like AWS, which has got, what, 15 data stores. Are they going to converge those 15 data stores? This is going to be interesting to watch. All right, guys, I'm going to go down the list and do like a one-word each, and you guys, each of the analysts, if you would just add a very brief sort of course correction for me. So Sanjeev, I mean, governance is going to be... Maybe it's the dog that wags the tail now. I mean, it's coming to the fore, all this ransomware stuff, which we really didn't talk much about, security, but what's the one word in your prediction that you would leave us with on governance? >> It's going to be mainstream. >> Mainstream. Okay. Tony Baer, mesh washing is what I wrote down. That's what we're going to see in 2022, a little reality check. You want to add to that? >> Reality check, 'cause I hope that no vendor jumps the shark and claims they're offering a data mesh product. >> Yeah, let's hope that doesn't happen. If they do, we're going to call them out. Carl, I mean, graph databases, thank you for sharing some high-growth metrics. I know it's early days, but magic is what I took away from that, so magic database. >> Yeah, I would actually, I've said this to people too, I kind of look at it as a Swiss Army knife of data, because you can pretty much do anything you want with it. That doesn't mean you should. I mean, there's definitely the case that if you're managing things that are in a fixed schematic relationship, probably a relational database is a better choice. There are times when a document database is a better choice. It can handle those things, but it may not be the best choice for that use case. 
But for a great many, especially with the new emerging use cases I listed, it's the best choice. >> Thank you. And Dave Menninger, thank you, by the way, for bringing the data in. I like how you supported all your comments with some data points. But streaming data becomes the sort of default paradigm, if you will. What would you add? >> Yeah, I would say think fast, right? That's the world we live in, you got to think fast. >> Think fast, love it. And Brad Shimmin, love it. I mean, on the one hand I was saying, okay, great, I'm going to be able to buy instead of build AI. But then again, you know, I've got some real issues. I'm afraid I might get disrupted by one of these internet giants who are AI experts, and there's a potential backlash there. So give us your bumper sticker. >> I would say, going with Dave, think fast and also think slow, to reference the book that everyone talks about. I would say really that this is all about trust, trust in the idea of automation and a transparent and visible AI across the enterprise. And verify, verify before you do anything. >> And then Doug Henschen, I mean, I think the trend is your friend here on this prediction, with lake house really becoming dominant. I liked the way you set up that notion of, you know, the data warehouse folks coming at it from the analytics perspective and then the data science world coming together with it. I still feel as though there's this piece in the middle that we're missing, but for your final thoughts, I'll give you the (indistinct). >> I think the idea of consolidation and simplification always prevails. That's why the appeal of a single platform is going to be there. We've already seen that with, you know, Hadoop platforms, and moving toward cloud, moving toward object storage, and object storage becoming really the common storage point, whether it's a lake or a warehouse. 
And that second point, I think ESG mandates are going to come in alongside GDPR and things like that, to up the ante for good governance. >> Yeah, thank you for calling that out. Okay, folks, hey, that's all the time that we have here. Your experience and depth of understanding on these key issues in data and data management were really on point, and they were on display today. I want to thank you for your contributions. Really appreciate your time. >> Enjoyed it. >> Thank you. >> Thanks for having me. >> In addition to this video, we're going to be making available transcripts of the discussion. We're going to do clips of this as well, and we're going to put them out on social media. I'll write this up and publish the discussion on wikibon.com and siliconangle.com. No doubt, several of the analysts on the panel will take the opportunity to publish written content, social commentary or both. I want to thank the power panelists, and thanks for watching this special CUBE presentation. This is Dave Vellante, be well, and we'll see you next time. (bright music)

Published Date: Jan 7, 2022



Red Hat AnsibleFest Panel 2021


 

(smooth upbeat music) >> Hello, everybody, John Walls here. Welcome to "theCUBE," in our continuing coverage of Ansible Fest 2021. We now welcome onto "theCUBE" three representatives from Red Hat. Joining us is Ashesh Badani, who's the Senior Vice President of Products at Red Hat. Ashesh, thank you for joining us today. >> Thanks for having me, John. >> You bet. Also with us Stefanie Chiras, who is the Senior Vice President of the Platforms Business Group, also at Red Hat. And Stefanie, how are you doing? >> Good, thanks, it's great to be here with you, John. >> Excellent, thanks for joining us. And last, but certainly not least, Joe Fitzgerald, who is the Vice President and General Manager of the Ansible Business Unit at Red Hat. Joe, good to see you today, thanks for being with us. >> Good to see you again, John, thanks for having us. >> It's like the big three at Red Hat. I'm looking forward to this. Stefanie, let's just jump in with you and talk about what's going on in terms of automation in the hybrid cloud environment these days. A lot of people are making that push, making their way in that direction, everybody trying to drive more value out of the hybrid cloud environment. How is automation making that happen? How's it making it work? >> We have been focused at Red Hat for a number of years now on the value of open hybrid cloud. We really believe in the value of being able to give your applications flexibility, to use the best technology where you want it, how you need it, and pulling all of that together. But core to that value proposition is making sure that it is consistent, it is secure, and it is able to scale. And that's really where automation has become a core space. 
So as we continue to work our portfolio and our ecosystems and our partnerships to make sure that that open hybrid cloud has accessibility to everything that's new and relevant in this changing market we're in, the automation space that Ansible drives is really about making sure that it can be done in a way that is predictable. And that is really essential as you start to move your workloads around and start to leverage the diversity that an open hybrid cloud can deliver. >> When you're bringing this to a client, and Joe, perhaps you can weigh in on this as well, I would assume that as you're talking about automation, there's probably a lot of successful head-nodding this way, but also some shaking this way too. There's a little bit of fear, right? And maybe just, they have these legacy systems, there's maybe a little distrust, I don't want to give away control, all these things. So how do you all answer those kinds of concerns when you're talking to the client about this great value that you can drive? You've got to get them there, right? You have to bring them along a bit. >> It's a great question, John, and look, everybody wants to get to the hybrid cloud, as Stefanie mentioned. That journey is a little complicated. And if you had silos and challenges before you went to a hybrid cloud, you're going to have more when you get there. We work with a lot of customers, and what we see is this sort of shift from, I would call it, low-level task automation to much more of a strategic focus on automation, but there's also the psychology of automation. One of the analysts recently did some research on that. Imagine just getting in your car and letting the car drive you down the street to work. People are still not quite comfortable with that level of automation; they sort of want to be able to trust, but verify, and maybe have their hands near the wheel. You couldn't take the wheel away from them. We see the same thing with automation. 
They need automation, and a lot of automation, but they need to be able to verify what it is doing and what it's going to do. And once they build that confidence, then they tend to do it at scale. And we're working with a lot of customers in that area. >> Joe, you're talking about a self-driving car, that'll never work, right? (laughs) You do bring up an interesting point though. Again, I get that kind of surrendering control a little bit, and Ashesh, I would assume in the product development world, that's very much your focus, right? You're looking for products that people not only can use, but they're also comfortable with. That they can accept and they can integrate, and there's buy-in, not only on the engineering level, but also on the executive level. So maybe walk us through that product development, staging or phases, however you want to put it, that you go through in terms of developing products that you think people not only need, but they'll also accept. >> I think that's absolutely right. You know, I think both Stefanie and Joe led us off here. Stefanie talked about hybrid cloud, and Joe started talking about moving automation forward and getting people comfortable. I think a lot of this is meeting customers where they are and then helping them get on the journey, right? So we're seeing that today, right? So traditional configuration management on premise, but at the same time, starting to think about, how do we take them out into the cloud, bringing greater automation to bear there. But so that's true for us across our existing customer base, as well as the new customers that we see out there. So doing that in a way that Joe talked about, right? Ensuring the trust, but verify is in play, is critical. And then there's another area which I'm sure we'll talk a little bit more about, right? Is ensuring that security implications are taken into account as we go through it.
>> Well, let's just jump into security, that's one of the many considerations these days. About ensuring that you have the secure operation, you're doing some very complex tasks here, right? And you're blending multi-vendor environments and multi-domain environments. I mean you've got a lot, you're juggling a lot. So I guess to that extent, how much of a consideration is security and those multiple factors today, for you? And again, I don't know which one of the three of you might want to jump on this, but I would assume this is a high priority, if not the highest priority, because of the headlines that security and those challenges are garnering these days. >> Well, there's the general security question and answer, right? So this is the whole shift-left, DevSecOps sort of security concern, but I think specific to this audience, perhaps I can turn it over to Joe to talk a little bit about how Ansible has been playing in the security domain. >> That's a great way to start, Ashesh. People are trying to shift left, which means move security, sort of, earlier on in the process, where people are thinking about it in the development process, right? So we've worked with a lot of customers who were trying to do DevSecOps, right? And to provide security automation capabilities during application build and deployment. Then on the operational side, you have this ongoing issue of some vulnerability gets identified, how fast can I secure my environment, right?
There's a whole new area of security orchestration, automation, and remediation that's involved, and the challenge people have is, just like with networking or other areas, they've got dozens, in some cases hundreds, of different systems across their enterprise that they have to integrate with in order to be able to close a vulnerability, whether it's deploying a patch, or closing a port, or changing a firewall configuration. This is really complicated, and they're being measured by, okay, there's this vulnerability, how fast can we get secure? And that comes down to automation, it has to. >> Now, Joe, you mentioned customers, if you would maybe elaborate a little bit about the customers that we've been hearing from on the stage, the virtual stage, if you will, at Ansible Fest this year and maybe summarize for our audience what you're hearing from those customers, and some of those stories when we're talking about the actual use of the platform. >> Yeah, so Ansible Fest is our annual automation event, right? For Ansible users. And I think it's really important to hear from the customers. We're vendors, we can tell you anything you want and try and get you to believe it. Customers, they're actually doing stuff, right? And so, at Ansible Fest, we've got a great mix of customers that are really pushing the envelope. I'll give you one example, JP Morgan Chase. They're talking about how in their environment, with focus over the past couple of years, they've now gotten to a level of maturity with automation where they have over 50,000 people that are using Ansible automation. They've got a community of practice where they've got people in over twenty-two countries, right? That are sharing over 10,000 playbooks, right? I mean, they've taken automation strategically and embraced it and scaled it out at a level that most other organizations are envious of, right?
Another one, and I'm not going to go through the list, but another one I'll mention is Discover, which sort of stepped back and looked at automation strategically and said, we need to elevate this to a strategic area for the company. And they started looking across all different areas, not just IT automation, but business process automation and their other practices internally. And they're doing a presentation on how to basically analyze where you are today and how to take your automation initiatives forward in a strategic way. Those are hugely important to other organizations that maybe aren't as far along or aren't at that scale of automation. >> Yeah, so Stefanie, I see you nodding your head, and you're talking about, when Joe was just talking about assessment, right? You have to kind of see, where are we, how mature are we on our journey right now? So maybe if you could elaborate on that a little bit, and some of the key considerations that you're seeing from businesses, from clients and potential clients, in terms of the kind of thought process they're going through on their journey, on their evolution. >> I think there's a lot of, sort of, values that customers are looking for when they're on their automation journey. I think efficiency is clearly one. I think one that ties back to the security discussion that we talked about. And I use the term consistency, but it's really about predictability. And I think I have a lot of conversations with customers that, if they know that it's consistently deployed, particularly as we move out and are working with customers at the edge, how do they know that it's done the same way every time and that it's predictable? There's a ton of security and confidence built into that. And I think coming back to Joe's point, it is a journey. Providing transparency and visibility is step one, then taking action on that is step two.
And I think as we look at the customers who are on this automation journey, it's them understanding, what's the value they're looking for? Are they looking for consistency in the deployments? Are they looking for efficiency across their deployments? Are they looking for ways to quickly migrate between areas in the open hybrid cloud? What is the value they're looking for? And then they look at how they start to build in confidence in how they deliver that. And I think it starts with transparency. The next step is starting to move into taking action, and this is a space where Joe and the whole team, along with the community, have really focused on pulling together things like collections, right? Playbooks that folks can count on and deploy. We've looked within the portfolio, we're leveraging the capabilities of this type of automation in our products themselves. With Red Hat Enterprise Linux, we've introduced system roles. And we're seeing that by pulling in that Ansible capability directly into the product, it provides consistency in how it gets deployed, and that delivers a ton of confidence to customers. >> So, Ashesh, I mean, Stefanie was talking about the customers and obviously developing, I guess, cultural acceptance and political acceptance within the ranks there. Where are we headed here, past what we know now, in terms of the traditional applications and traditional automations and whatever? Kind of, where is this going? If you would, give me your crystal ball a bit about automation and what's going to happen here in the next 12-18 months. >> So what I'm going to do, John, is try to marry two ideas. So we talked about hybrid cloud, right? Stefanie started talking about the journey to hybrid cloud. I'm going to marry automation with containers, right? On this journey of hybrid cloud, right? And give you two examples, both some successful progress we've been making on that front, right? Number one, especially for the group here, right?
Check out the Ansible collection for Kubernetes. It's been updated for Python 3, of course, with the end-of-life for Python 2, but more important, right? It's the focus on improving performance for large automation tasks, right? A huge area where Ansible shines, then, is taking advantage of turbo mode, where instead of the default being a single connection to the Kubernetes API for every request that's out there, with turbo mode turned on, the API connection gets reused, significantly and obviously improving performance. A huge other set of enhancements as well, right? So I think that's an interesting area for the Ansible community to leverage and obviously to grow. And the second one that I wanted to call out was just kind of the, again, back to this sort of, your notion of the marriage of automation with containers, right? It's the work that's going on on the front of the integration, the tight integration, between Ansible as well as Red Hat's Advanced Cluster Management, right? Which is helping to manage Kubernetes clusters at scale. So now Red Hat's ACM technology can help automatically trigger Ansible playbooks upon key lifecycle actions that have happened. And so taking advantage of technologies like operators, again, a core Kubernetes construct for the hybrid cloud environment, this integration between Advanced Cluster Management and Ansible allows for much more efficient execution of tasks, right? So I think that's really powerful. So wrapping that up, right? This world of hybrid cloud really can be brought together by just a tighter integration between Ansible as well as the work that's going on on the container front. >> Great, well, thank you. Ashesh, Stefanie, Joe, thank you all for sharing the time here. Part of our Ansible Fest coverage here, enjoyed the conversation, and continued success at Red Hat. Thank you for the time today. >> Thank you so much John. >> Thank you. >> You bet.
I'm joined here by three executives at Red Hat, talking about our Ansible Fest 2021 coverage. I'm John Walls, and you're watching "theCUBE." (bright music)

Published Date : Sep 16 2021


Aileen Black, Collibra and Marco Temaner, U.S. Army | AWS PS Partner Awards 2021


 

>> Mhm. >> Hello and welcome to today's session of the 2021 AWS Global Public Sector Partner Awards. I am pleased to introduce our very next guests. Their names are Aileen Black, SVP of Public Sector at Collibra, and Marco Temaner, Chief Enterprise Architect at the HQDA Office of Business Transformation at the U.S. Army. I'm your host, Natalie Erlich. We're going to be discussing the award for best partner transformation, best data-led migration. Thank you both for joining the program. >> Thank you for having us. >> Thank you. Glad to be here. >> Well, Aileen, why is it important to have a data-driven migration? >> You know, migrations to the cloud that are simply just a lift and shift take advantage of the elasticity of the cloud, but not really of how to innovate and leverage what the AWS cloud truly has to offer. Um, so a data-led migration allows agencies to truly innovate and really kind of almost reimagine how they meet their mission objectives and how they leverage the cloud. You know, the government has, let's face it, mountains of data, right? I mean, every single day there's more and more data, and you can't pick up a trade magazine that doesn't talk about how data is the new currency or data is the new oil. Um, so you know, data, to have value, has to be usable, right? So to turn your data into knowledge, you really need to have a robust data intelligence platform which allows agencies to find, understand, and trust their data. A data intelligence platform like Collibra is the system of record for their data, no matter where it may reside. Um, no strategy is complete without a strong data governance platform and security and privacy baked in from the very start. Data has to be accessible to the average data citizen. People need to be able to better collaborate to make data-driven decisions. Organizations need to be united by data. This is how a technology and platform like Collibra really allows agencies to leverage their data as a strategic asset.
Terrific. Well, why is it more important than ever to do this? >> Well, you know, for the innovation of technology like AI and ML to be truly leveraged, um, you know, they need to be able to trust the data that they're using. If the model is trained with only a small set of data, um, it's not going to really produce the trusted results they want. ML models deliver faster results at scale, but the results can only be precise when the data feeding them is of high quality. And Gartner just came out with a study that said data quality is the number one obstacle for adoption of AI. Um, when good data and good models find a unified, scalable platform with superior collaboration capabilities, your AI and ML opportunities can truly be leveraged, and you can truly leverage data as a strategic asset. >> Terrific. Well, Marco, what does the future look like for the Army and data? >> So let me play off of what Aileen said. So in terms of the future, um, obviously, as you mentioned, the data volumes are growing enormously. So part of the future has to do with dealing with those data volumes just from a straight technological perspective. But as the data volumes grow, and as we have to react to things that we need to react to in the military, we're not just trying to understand the quantity of data but what it is, and not just the quality but the nature of it. So understanding authoritativeness, being able to identify what data we need to solve certain problems or answer certain questions. I mean, a major theme in terms of what we're doing with data governance and having a data governance platform and a data catalog is having immediate knowledge of what data is where, what quality and confidence we have in the data. Sometimes it's more important to have data that's approximately correct than truly correct as quickly as possible, you know.
So not all data needs to be of perfect quality at all times; you need to understand what's authoritative, what the quality is, how current the information is. So as the data volumes grow and grow and grow, keeping up with that, not just from the standpoint of can we scale (we know how to scale pretty well in terms of containing data volume), but keeping up with what it is, the knowledge of the data itself, understanding authoritativeness, quality, provenance, etcetera, uh, that's a whole enterprise to keep up with, and that's what we're doing right now with this, with this project. >> Yeah. And I'd like to also follow up with that. How has leveraging Collibra's data intelligence platform enabled the Army to accelerate its overall mission? >> So there's, uh, there's sort of interplay between, you know, just having a technology that does something doesn't mean you're going to use it to do that something, but often having a place to do the work of governance, the work of knowledge management, can be the precipitating function or the stimulus to do so. So it's not "if you build it, they will come," but if you don't have a place to play ball, you're not going to play ball, to kind of run with that metaphor. So having technology that can do these things is a precursor to being able to. But then of course we, as an organization, have to do it. So the interplay between making a selection of technology and doing the implementation from a technical perspective, that plays off of an urgency: we've made the decision to use a technology, so then that helped accelerate getting roles and responsibilities of our CDO, of our mission area data officers, of data stewards, the folks that have to be doing the work. Um, when you educate system owners in cataloging and give them a central environment where the information is needed, if you say here's a place to put it, then it's very tangible, especially in the military where work is done in a very, uh, concrete, task-based way.
If you have a place to do things, then it's easier to tell people to do things. So the technology is great and works for us. But the choice to move with the technology has then been a productive interplay with the doing of the things that need to be done to take advantage of the technology, if that makes sense. >> Well, yeah, that's really great to hear. I mean, speaking of taking advantage of the technology, Aileen, can Collibra help your other public sector customers take advantage of AI and machine learning? >> Well, people need to be able to collaborate and take advantage of their most strategic asset, data, to make those data-driven decisions. It gives them the agility to be able to act. 2020 was a great lesson around the importance of having your data house in order. Let's face it, during the pandemic, we watched organizations that, you know, had a strong data governance framework, who had looked at and understood where their data were, and they were able to very quickly assess the situation and react, and others were not in such a good situation. So, you know, being able to have that data governance framework, being able to have that data quality, being able to have the right information and being able to trust it, allows people to be effective and to react quickly to situations. >> Fascinating. Um, do you have any insight on that, Marco? Would you like to weigh in? >> Well, definitely concur. Um, I think our strategy, like I said, has been to, um, use the technology to highlight the need to put governance into place and to focus on increasing the data quality of the data sources. And I would say this has also helped us, uh, I mean, with things that we weren't doing before that have to do with just educating the populace, you know, all the way from the folks operating systems to the most senior executives, being conversant in the principles that we're talking about. This whole discipline is a bit arcane and kind of back office and kind of IT, but it's actually not.
If you don't have the data to make, if you don't know where to get the data to make a decision, then you're going to make a decision based on incorrect data, and, and you know, that's pretty important in the military to not get wrong. So definitely concur, and we're taking that approach as well. >> I'd like to take it one step further. If you're speaking the same language, so if you have an understanding of what the data governance framework is, you can understand what the data is and where it is. Sometimes there's duplicate data, and there's duplicate data for a reason, but understanding where it came from and what the lineage associated with it is really gives you the power of being able to shop for data and get the right information at the right time and give it the right perspective. And I think that's the power of what has laid the foundation for the work that the Army and Marco have done to really set the stage for what they can do in the future. >> Terrific. And Marco, if you could comment a little bit about data stewardship and how it can positively drive future outcomes. >> Yeah. So, um, data stewardship for us, um, has a lot to do with the functional, so the people that we're assigning as our senior data stewards are the senior functionals in the respective organizations: logistics, financial management, training, readiness, etcetera. So the idea is the folks who know really everything about those functional domains, um, looking at things from the perspective of the data that's needed to support those functions, logistics, human resources, etcetera. Um, and being, you know, call it, the most authoritative subject matter experts. So the governance that we're doing is coming much more from a functional perspective than a technical perspective, so that when a, when a system is being built, if we're talking about data migration, if we're talking about somebody driving analytics, the knowledge that's associated with the data comes from the functional.
So our data stewardship is less about the technical side and more about making sure that the understanding, from a functional perspective, of what the data is for, what the provenance is, not from a technical perspective, but what it means in terms of sources of information, sources of personnel, sources of munitions, et cetera, um, is available to the folks using it. So they basically know what it is. So the emphasis is on that functional infusion of knowledge into the metadata, so that then people who are trying to use that data have a way of understanding what it really is and what the meaning is. And that's really what data stewardship means. We're actually very good at stewarding data from a technical perspective. We know how to run systems very well. We know how to scale, we're good at that, but making sure that people know what it is and why and when to use it, um, that's where it's, maybe we have some catching up to do, which is what this effort's about. >> Terrific. Well, fantastic insights from you both. I really appreciate you taking the time, uh, to tell all our viewers about this. That was Aileen Black and Marco Temaner, and that, of course, was our session for the AWS Global Public Sector Partner Awards. Thanks for watching. I'm your host, Natalie Erlich. Thank you.

Published Date : Jun 22 2021


Ankit Goel, Aravind Jagannathan, & Atif Malik


 

>> From around the globe, it's theCUBE, covering Data Citizens '21, brought to you by Collibra. >> Welcome to theCUBE's coverage of Collibra Data Citizens '21. I'm Lisa Martin. I have three guests with me here today from Collibra customer Freddie Mac. Please welcome Jag, Chief Data Officer and Vice President of Single-Family Data and Decisions. Jag, welcome to theCUBE. >> Thank you, Lisa. Look forward to being here. >> Excellent. Ankit Goel is with us as well, Vice President of Data Transformation and Analytics Solutions. Ankit, good to have you on the program. >> Thank you, Lisa. Great to be here. >> And Atif Malik, Senior Director from the single-family division at Freddie Mac, is here as well. Atif, welcome. So we have big congratulations in order. Uh, Freddie Mac was just announced at Data Citizens as the winner of the Collibra Excellence Award for Data Program of the Year. Congratulations on that. We're going to unpack that, talk about what that means, but I'd love to get familiar with the three of you. Jag, start with you. Talk to me a little bit about your background, your current role as Chief Data Officer. >> Appreciate it, Lisa. Thank you for the opportunity to share our story. Uh, my name is Aravind, but everyone calls me Jag.
And as you said, I'm the single-family Chief Data Officer at Freddie Mac. For those that don't know, Freddie Mac is a government-sponsored entity that supports the U.S. housing finance system, and single-family deals with the residential side of the marketplace. As CDO, I'm responsible for our managed content, data lineage, data governance, and business architecture, in which Collibra plays an integral role, uh, in that function, as well as, uh, supporting our shared assets across the enterprise and our data monetization efforts, data product execution, decision modeling, as well as our business intelligence capabilities, including AI and ML for various use cases. As a background, I started my career in New York and then moved to Boston, and I've spent the last 20 years living in the Northern Virginia, DC area, fortunate to have been responsible for business operations, as well as having led and, um, executed large transformation efforts. That background has reinforced the power of data and how, how it's so critical to meeting our business objectives. Look forward to our dialogue today, Lisa, once again. >> Excellent. You have a great background, and clearly not a dull moment in your job with Freddie Mac. And tell me a little bit about your background, your role, what you're doing at Freddie
And I would say, I think over the last six years with data found that perfect spot where business and technology actually come together to solve real problems and, and really lead, um, you know, businesses to the next stage of, so thank you Lisa for the opportunity today. Excellent. >>And we're going to unpack you call yourself the bridge between business and it that's always such an important bridge. We're going to talk about that in just a minute, but I want to get your background, tell our audience about you. >>Uh, I'm Alec Malek, I'm senior director of business, data architecture, data transformation, and Freddie Mac. Uh, I'm responsible for the overall business data architecture and transformation of the existing data onto the cloud data lake. Uh, my team is responsible for the Kleberg platform and the business analysts that are using and maintaining the data in Libra and also driving the data architecture in close collaboration with our engineering teams. My background is I'm a engineer at heart. I still do a lot of development. This is my first time as of crossing over onto the bridge onto business side of maintaining data and working with data teams. >>Jan, let's talk about digital transformation. Freddie Mac is a 50 year old and growing company. I always love talking with established businesses about digital transformation. It's pretty challenging. Talk to me about your initial plan and what some of the main challenges were that you were looking to solve. >>Uh, great question, Lisa, and, uh, it's definitely pertinent as you say, in our digital world or figuring out how we need to accomplish it. 
If I look at our data, modernization is it is a major program and, uh, effort, uh, in, in our, in our division, what started as a reducing cost or looking at an infrastructure play, moving from physical data assets to the cloud, as well as enhancing our resiliency as quickly morphed into meeting business demand and objectives, whether it be for sourcing, servicing or securitization of our loan products. So where are we as we think about creating this digital data marketplace, we are, we are basically forming, empowering a new data ecosystem, which Columbia is definitely playing a major role. It's more than just a cloud native data lake, but it's bringing in some of our current assets and capabilities into this new data landscape. >>So as we think about creating an information hub, part of the challenges, as you say, 50 years of having millions of loans and millions of data across multiple assets, it's frigging out that you still have to care and feed legacy while you're building the new highway and figuring out how you best have to transform and translate and move data and assets to this new platform. What we've been striving for is looking at what is the business demand or what is the business use case, and what's the value to help prioritize that transformation. Exciting part is, as you think about new uses of acquiring and distribution of data, as well as news new use cases for prescriptive and predictive analytics, the power of what we're building in our daily, this new data ecosystem, we're feeling comfortable, we'll meet the business demand, but as any CTO will tell you demand is always, uh, outpaces our capacity. And that's why we want to be very diligent in terms of our execution plan. So we're very excited as to what we've accomplished so far this year and looking forward as we offered a remainder year. And as you go into 2022. Excellent, >>Thanks JAG. Uh, two books go to you. 
As I mentioned in the intro, Freddie Mac has won the Collibra Excellence Award for data program of the year. Again, congratulations on that, but I'd love to understand the Collibra center of excellence that you're building at Freddie Mac. First of all, define what a center of excellence is to Freddie Mac, and then what you're specifically building. >>Yeah, sure. So the Collibra center of excellence provides us the overall framework, from a people and process standpoint, for focusing our use of Collibra and adopting best practices. We can have teams that are focused just on developing best practices, implementing workflows and lineage within Collibra, and adopting a number of different aspects of Collibra. It provides a central hub of people who are domain experts on the tool and who can then be leveraged by different groups within the organization to maintain it. >>Another follow-on question for you. How does Freddie Mac define data citizens? Is it anybody in finance or sales or marketing or operations? What is that definition of a data citizen? >>It's really everyone within the organization. They all consume data in different ways, and we provide a way of governing data and helping them get a better understanding of data from Collibra itself. So it's really everyone within the organization. >>Excellent. Okay, let's go over to you. A big topic at Data Citizens '21 is collaboration. That's probably a word we've used a ton in the last 15-plus months or so, as every business really pivoted quickly to figure out how to best collaborate.
But something that you talked about in your intro is being the bridge between business and IT. I want to understand, from your perspective, how can data teams help drive improved collaboration between business and IT? >>The collaboration between business and technology has been a key focus area for us over the last few years. We actually started an agile transformation journey two years ago that we called modern delivery, and that was about moving away from project teams to persistent product teams that brought business and technology together. We've really been able to pioneer that in the data space within Freddie Mac, where we now have teams with product owners coming from the data team and full-stack IT developers alongside them, creating these combined teams to meet the business needs. We found that bringing these teams together really removed the barriers that were there in the interaction, and employee satisfaction has been high. And like you said, over the last 16 months with the pandemic, we've actually seen productivity stay the same or even go up, because the teams were all working together, they work as a unit, and they all have a sense of ownership, versus working on a project that has a finite end date. So we've, you know, been really lucky to have started this two years ago. >>That's great. And congratulations on either maintaining productivity or having it go up during the last 16 months, which have been incredibly challenging. Jag, I want to ask you: what does winning this award from Collibra mean to you and your team, and does it signify that you're really establishing a data-first culture? >>Great question again, Lisa. I think winning the award, just from a team standpoint, is a great honor. Collibra has been a fantastic partner.
And when I think about the journey of going from spreadsheets, right, that all of us had in the past, to now having all our business glossary terms and lineage, we're really at the forefront of our data modernization. As we think about moving to the cloud, Collibra has been in step with us as an integral part of that holistic delivery model. Ultimately, as a CDO, it's really the team's honor and effort, because this has been a multi-year journey to get here. And it's great that Collibra as a partner has helped us achieve some of these goals, but has also recognized where we are in terms of looking at data as a product, being at the leading forefront, and using that holistic delivery to meet our business objectives. So overall, we're really jazzed that we won the data program of the year here at Collibra, and very honored to win this award. >>That's where we've got to bring back "I'm jazzed." I liked that. Jag, sticking with you, let's unpack a little bit some of those positive results, those business outcomes that you've seen so far from the data program. What are those? >>Yeah. So again, if you think about a traditional CDO model, what were the terms that would have been used a few years ago? It was around governance, and it may have been viewed as oversight. There was maybe less talk of monetization, of the business value you needed to accomplish collectively. It's really those three building blocks: managing content, trusting the source, and ultimately empowering the business. So the best success I could point to at Freddie, as we move to this digital world, is really empowering the business to figure out the new capabilities, demand, and objectives that we're meeting. We're not going to be able to transform the mortgage industry.
Or any industry, for that matter, if we're still stuck in old-world thinking; ultimately, data is going to be the lifeblood that enables those capabilities. >>So if you ask me about the business's best success: we're no longer talking, okay, I've got my data governance, what do we have to do? It's all embedded together. And as I alluded to, with that partnership between business and IT informing that data is a product, you're now delivering capabilities holistically from program teams all across data. It's no longer an afterthought. As I said a few minutes ago, you're able to meet the demand that's current and also think about how we want to go forward. So it's no longer buzzwords like "digital data marketplace"; what is the value of that? And that's the success, I think, of our group collectively working across the organization. It's not just one team, it's across the organization. We have our partners, our operations, everyone from business owners, all swimming in the same direction, and, I would say, with critical management support. At the top of the house, our head of business, my boss, the COO, is fully supportive of how we're trying to execute, and that's critical, because when there are potential trade-offs, we're all looking at them collectively as an organization. >>Right. And that's the best viewpoint to have, that sort of centralized, unified vision, and, as you say, Jag, the support from up top. Atif, I want to ask you: you've established the Collibra center of excellence. What are you focused on now? >>So we're really focused on allowing our users to consume data and understand data, really democratizing data so they can get a better understanding of it. That's a lot of our focus, along with engaging with Collibra and getting users to start to define things in the Collibra platform. That's a lot of the focus right now. >>Excellent.
I want to stay with you for one more question, and it's one I'm going to ask all of you: with all the success you've talked about in transforming a legacy institution, what are you most excited about, and what are the next steps for the data program? What are your thoughts? >>Yeah, so really modernizing onto a cloud data lake and allowing all of the users at Freddie Mac to consume data, with the level of governance that we need around it, is an exciting proposition for me. >>What would you say is most exciting to you? >>I'm really looking forward to the opportunities that artificial intelligence has to offer, not just in the augmented analytics space, but in the overall data management life cycle. There are still a lot of things that are manual in the data management space, and I personally believe artificial intelligence has a huge role to play there. >>And Jag, question to you: it seems like you have a really strong collaborative team, and a very collaborative relationship with management and with Collibra. What are you excited about? What's coming down the pipe? >>So Lisa, if I look at it, you know, we sit back here in June 2021; where were we a year ago? When you think about a lot of the capabilities and some of the advancements we made in just a year, sitting virtually, using that word "jazzed," or enthused, or feeling really great, we made a lot of accomplishments. I'm excited about what we're going to be doing over the next year. There are other use cases, the AI/ML we talked about, and, you know, our new ecosystem. Seeing those use cases come to fruition, so that we are contributing value from a business standpoint to the organization, is what keeps me up at night and gets me up in the morning, and I'm really feeling enthused for the entire division. >>Excellent. Well, thank you. I want to thank all three of you for joining me today.
Talking about the successes Freddie Mac has had transforming in partnership with Collibra. Again, congratulations on the Collibra Excellence Award for the data program. It's been a pleasure talking to all three of you. I'm Lisa Martin; you're watching theCUBE's coverage of Collibra Data Citizens '21.

Published Date : Jun 17 2021



Jacklyn Osborne, Bank Of America | Collibra Data Citizens'21


 

>>From around the globe, it's theCUBE, covering Data Citizens '21, brought to you by Collibra. Well, hello everybody, John Walls here as we continue our coverage here on theCUBE of Data Citizens '21. It is a pleasure of ours to welcome in an award winner here at Data Citizens '21: we're with Jacklyn Osborne, who is Managing Director and Risk and Finance Technology Executive at Bank of America, and she is also the Data Citizen of the Year, one of the Collibra Excellence Award winners. Jacklyn, congratulations on the honor. Well deserved, I'm sure. >>Thank you so much. It is a true honor, and I am so happy to be here and looking forward to our conversation today. >>Yeah, what is it all about, just the concept of being a data citizen, in your mind? What are those pillars, in terms of being a good data citizen? And that gets to the point that you are the Data Citizen of the Year. >>I think that's such a good question, and actually it's something I don't even know if I fully know, because it's constantly evolving. Being a data citizen yesterday is not what it is today, and it's not what it means tomorrow, because this field is evolving. But with that said, to me, being a data citizen is having that awareness that data matters. It's driving to that data-as-an-asset mindset and really trying to lay the foundation to ensure it's the right data in the right place at the right time. >>Yeah, let's talk about that high-wire act, because it's becoming increasingly complex. You've been in this realm, if you will, for what, 15 years now? I believe it has evolved dramatically, right, in terms of capabilities, but also complexity. So let's talk about that, about finding the relevance of data and delivering it on time to the right people within your organization. How much more challenging is that now than it was maybe five or, you know, 10 years ago? >>I mean, it's kind of crazy.
There are some areas that make it so much easier and then, to your question, some areas that make it so much harder. But if I can, let's start with the easier, because I think this is something that really is important. I've been in data my entire professional career, and I've been a chief data officer since 2013. When I started, I used to joke that I was a used-car salesman: I was selling something, this idea of data quality and data governance, that nobody wanted. But now, the shift, to your question, is the good: I am now a luxury-car salesman, selling a product that everybody wants. The shift to the bad: nobody wants to pay for it. So the complexity of it, as data becomes bigger, as we talk about big data and unstructured data and social media and Facebook feeds, that is hard. It is complex. And the ability to truly manage and govern data to that degree of perfection is really hard. So the more data we get, the more complexity and the more challenge, and the more there is a need to really prioritize, align with business strategy, and ensure that you are embedding this into the culture and the DNA of the corporation, not doing it in silos. >>You know, delivering that data in a secure environment is obviously critically important for any enterprise, but even more so, to put a finer point on it, in financial services, in terms of your work in that regard. So let's add that layer into this: not only the internal communication and collaboration you have to do, but you have these external stakeholders too, right? You have me, you know, a BofA client, if you will, that you've got to be aware of and communicate with. So let's talk about that kind of merger, if you will, of not only having to work internally but also externally, and making sure that, with all the data you've got now, it all works. >>Indeed.
And you're kind of moving towards one of the newer dimensions, which is privacy. I mean, GDPR was the first regulation, in Europe, but now you have the CCPA in California, and more is coming. And that right to be forgotten, or, more importantly, as you said, as a customer of a financial institution, that right to understand where your data is, is very important, because customers do want to know that their information is understood, trusted, protected, and going to be taken care of. So that ability to really demonstrate that you have a solid basis, and that you are taking the measures and necessary steps to ensure that data is, air quotes, "governed," is so important. And again, it's that shift from used-car salesman to luxury-car salesman; your question is another example of how that shift is happening. It's no longer a should-do or a could-do; data governance is really becoming a must-do, and that's why you are seeing so many more chief data officers, chief analytics officers, and data management professionals. The profession is growing, I mean, incrementally, every single day. >>What about the balancing act that you do? Let's just deal with the internal audiences that you have to contend with; I shouldn't say contend, that has a pejorative tone; that you deal with, that you collaborate with. You know, governance is also critically important, because you want to make data available to the right people at the right time, but only the right people. So what kind of practices or procedures are you putting in place at BofA to make sure that data is delivered to the right folks, but only to the right people, and, I guess, to educate people within your organization as to the need for these strict governance processes? >>Sure. I tend to refer to them as the foundational pillars, and if I take a step back, I'll say what they are and how we use them. So the first one is metadata management, and it is really around that:
What data do you have? It's understanding the information. I still refer to it this way: when we used to go to the library, you had to look at the card catalog. Metadata management is very similar to the card catalog for books. It tells you all the information: what's the genre, who's the author, what section it's in, where it is in the library. And that is a core piece: if you don't understand your data, you can't govern it. So that's pillar one, metadata management. Pillar two is what's often referred to as data lineage, though I do think the new buzzword is data provenance. It's really that data flow: understanding where data comes from, its movement along the journey, and where it's going. If you don't understand that, horizontally, front to back, you can't govern the information as well, because it can be changing hands and it can be altered. So it's that end-to-end look at things. Pillar three is data quality, and that's really the measurement of: is it the right data? It is made up of a series of data quality dimensions: accuracy, completeness, validity, timeliness, conformity, reasonableness, etcetera. It's really that fitness for use: is the data that I have the right data, as I said earlier? And then, last but not least, is issue management. At the end of the day, there will be problems; there is too much data, and it is in too many hands. So we're not trying to remove all data issues, but having a process where you can actually log, prioritize, and ultimately remediate is that last and final pillar of what I would call the data management circle, because it all has to come back together, and it's rinse and repeat. >>Yeah. And so you raise a great point: things are going to go wrong. Eventually something happens. We know nothing is foolproof, nothing is bulletproof.
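As a purely illustrative aside, the data quality dimensions Osborne lists above (completeness, validity, timeliness, and so on) are the kind of checks that are easy to sketch in code. The record fields, thresholds, and helper names below are hypothetical, a minimal sketch rather than anything from Bank of America's actual tooling:

```python
# Illustrative checks for three of the quality dimensions mentioned above:
# completeness, validity, and timeliness. Field names and thresholds are
# hypothetical examples.
from datetime import date

def completeness(record, required_fields):
    """Fraction of required fields that are present and non-empty."""
    filled = sum(1 for f in required_fields if record.get(f) not in (None, ""))
    return filled / len(required_fields)

def is_valid_amount(value):
    """Validity: the amount must be a non-negative number."""
    return isinstance(value, (int, float)) and value >= 0

def is_timely(as_of, today, max_age_days=30):
    """Timeliness: the record must be no older than max_age_days."""
    return (today - as_of).days <= max_age_days

record = {"loan_id": "L-001", "amount": 250_000, "as_of": date(2021, 6, 1)}
required = ["loan_id", "amount", "as_of"]

print(completeness(record, required))                 # 1.0
print(is_valid_amount(record["amount"]))              # True
print(is_timely(record["as_of"], date(2021, 6, 17)))  # True
```

The fourth pillar, issue management, would then be the process that logs and prioritizes any record failing one of these checks, rather than trying to eliminate every issue up front.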
And we're certainly seeing that in terms of security now, right, with breaches pretty well publicized: intrusions, ransomware, you name it, all kinds of flavors of that, unfortunately. So from your perspective, in terms of being that data guardian, if you will, how much have your concerns been amplified in terms of security and privacy, and that kind of internal communication, or, I guess, buy-in, you have to have to understand the need to make this data ultra-secure and ultra-private, especially in this environment where the bad actors, you know, are prolific? So talk about that struggle, or maybe that challenge, that you have in this environment here in 2021. >>Yeah, I think the way I would put it is that the struggle is, again, that need or desire to protect everything, and at the end of the day, that's hard. And so the struggle we face right now is the prioritization. How do we differentiate what we call the critical few? Some call them CDEs, critical data elements; some call them KDEs, key data elements; there's a term for it. But really, as that need and that demand grow, whether it's for security or privacy or even data democratization, which hopefully we do talk about at some point, all these things are reliant on the right data, because, like statistics: garbage in, garbage out. So whether it's because you need the right information for your analytics and your models, or, as you talked about, for prevention and defensive security reasons, that defensive and offensive side isn't going away. So the real struggle is not the driver but the prioritization: how do you focus to ensure you're spending your time on the right areas and, more importantly, in alignment with the business priorities?
Because one of the things that's critically important for me is ensuring that it's not metadata or data governance or data quality for the sake of it; it is in alignment with that business priority. >>And a big part of that is strategy for the future, right, strategy going forward: you know, where you're going to go in the next 18 to 24 months. And so, without, you know, revealing state secrets here, how do you see this playing out in terms of this continual digital transformation, if you will, from the BofA side of the fence? What do you see as being important, or what would you like to accomplish over the next year and a half, two years? >>For me, I think it's, and I'm glad you asked that question, because I wanted to mention it, that data democratization. And if we unpack that, what do I mean by democratization? It's real-time access, but not real-time access to the wrong information or by the wrong people, as we talked about. It really is ensuring, almost like an Amazon model, that I can simply search for the information I need, put it in my shopping cart, and check out, and I am able, that's the data-driven piece, to use that information knowing it's the right data in the right hands for the right reasons. And that's really my future mind, where I'm getting to: how do I enable that? How do I democratize it, so data truly does become that enterprise asset that everybody and anybody can access, but in a way that has all of those defensive controls in place, going back to that right data, right place, right time? Because with the shiny toys of AI and machine learning, all those things, if you're building models off of the wrong data, from the wrong place, or in the wrong hands, it's going to bite you, whether it's today, tomorrow, or in the future. >>Well, exactly. I love that analogy, and on that, I'm going to thank you for the time. So I'm going to call you a luxury data salesperson, not a car salesman. It certainly has paid off, and we certainly congratulate you as well on the award that you won here from Collibra. >>Thank you so much, and thank you for the time. Hopefully you've enjoyed our conversation as much as I have. >>I certainly have. Thank you very much, Jacklyn Osborne of Bank of America, the Data Citizen of the Year, here at Data Citizens 2021. I'm John Walls, and you've been watching theCUBE.

Published Date : Jun 17 2021



Stijn "Stan" Christiaens | Collibra Data Citizens'21


 

>>From around the globe, it's theCUBE, covering Data Citizens '21, brought to you by Collibra. Hello everyone, John Walls here as we continue our Cube conversations as part of Data Citizens '21, the conference ongoing. Collibra is at the heart of that, really at the heart of data these days, helping companies and corporations make sense of all of that data chaos they're dealing with, providing new insights and new analyses, and being a lot more efficient and effective with their data. That's what Collibra is all about, and their founder and Chief Data Citizen, if you will, Stan Christiaens, joins us today. And Stan, I love that title, Chief Data Citizen. What is that all about? What does that mean? >>Hey John, thanks for having me over, and hopefully we'll get to the point where the Chief Data Citizen title is clear to you. Thanks, by the way, for giving us the opportunity to speak a little bit about what we're doing with our Chief Data Citizen. We started the company about 13 years ago, in 2008, and over those years, as a founder, I've worn many different hats, from product to presales to partnerships and a bunch of other things. But ultimately a company reaches a certain point, a certain size, where systems and processes become absolutely necessary if you want to scale further. For us, this was the moment in time when we said, okay, we probably need a data office ourselves, something we've seen with many of our customers. So we said, okay, let me figure out how to lead our own data office and how we can get value out of data using our own software at Collibra itself. And that's where that Chief Data Citizen role comes in. We like to call that drinking our own champagne, or, you know, eating our own dog food. But essentially, this is what we help our customers do: build out their data offices. So we're doing this ourselves now, and we're very hands-on.
So there are a lot of things we're learning, again, just like our customers do. And for me at Collibra, this means that I'm responsible, as Chief Data Citizen, for our overall data strategy, which is a lot about data products, as well as for our data infrastructure, which is needed to power those data products. Now, because we're doing this in the company, and also doing it in a way that is helpful to our customers, we're also figuring out how to translate the learnings we have ourselves and give them back to our customers, to our partners, and to the broader ecosystem as a whole. And that's why, if you summarize the strategy, I sometimes like to refer to it as Data Office 2025: it's 2025, what does the data office look like by then? And we recommend our customers have that forward-looking view just as well. So if I summarize the answer a little bit, it's very similar to a chief data officer role, but because it has the external evangelization component, helping other data leaders, we like to refer to it as the Chief Data Citizen. >>You talk about evangelizing, and obviously with that you're talking about certain kinds of responsibilities and obligations. When I think of citizenship in general, I think about privileges and rights, about national citizenship. You're talking about data citizenship, so I assume with that you're talking about appropriate, well-defined behaviors, kind of keeping it between the lanes, basically. Is that how you look at being a data citizen? And if not, how would you describe being a data citizen to a client? >>It's a very good point: as a citizen you have rights and responsibilities, and the same is exactly true for a data citizen. For us, starting with what it is: the data citizen is somebody who uses data to do their job.
And we've purposely made that definition very broad, because today we believe that everyone in some way uses data to do their job. You know, data is universal; it's critical to business processes, and its importance is only increasing. And we want all data citizens to have appropriate access to data and the ability to do things with data, but also to do that in the right way. And if you think about it, this is not just something that applies to you and your job; it extends beyond the workplace, because as a data citizen, you're also a human being, of course. So the way you handle data at home, with your friends and family, all of this becomes important as well. We like to think about it as informed, privacy-conscious data citizens who think about trust in data all the time, because ultimately everybody's talking today about data as an asset, and data is the new gold and the new oil and the new soil. And there is a ton of value in data, but it's not just organizations themselves that see this; it's also the bad actors out there. We're reading a lot more about data breaches, for example. So ultimately, there is no value without risk. As a data citizen, you can achieve value, but you also have to think about how to avoid these risks. And as an organization, if you manage to combine both of those, that's when you can get the maximum value out of data in a trusted manner. >>Yeah, I think it's a pretty interesting approach that you've taken here, because obviously there are processes with regard to data, right? That's pretty clear. But there's a culture you're talking about here: not only are we going to have an operational plan for how we do a certain activity, how we're going to analyze here, input here, perform an action on that, whatever, but we're going to have a mindset, an approach, that we want our company to embrace.
So if you would, walk me through that process a little bit, in terms of creating that kind of culture, which is very different from the X's and O's and the technical side of things. >> Yeah, that's, I think, where organizations face the biggest challenge, because, you know, maybe they're hiring the best, most unique data scientists in the world, but it's not about what that individual can do, right? It's about what the combination of data citizens across the organization can do. And I think there it starts, first, by thinking as an individual about the universal Golden Rule: treat others as you would want to be treated yourself. The way you would ethically use data at your job, think about that; there are other people and other companies who you would want to do the same thing. Now, from our experience in our own data office at Collibra, as well as what we see with our customers, a lot of that personal responsibility, which is where culture starts, starts with data literacy. And, you know, we talked a little bit earlier about the small statues in Brussels, Belgium, where I'm from. But essentially, here in Belgium we speak a couple of languages, and for organizations and for individuals, data literacy is very similar. You know, you're able to read and write, which are pretty essential for any job today, and so we want all data citizens to also be able to speak and read and write data fluently, if I can express it this way. And one of the key ways of getting that done and establishing that culture around data lies with the one who leads data in the organization, the Chief Data Officer, or however the role is called. They play a very important role in this. The comparison, maybe, that I always make there is: think about other assets in your organization. You know, you're organized for the money asset with finance, for the talent asset with HR, and a bunch of other assets. So let's talk about the money asset for a little bit, right?
You have a finance department, you have a Chief Financial Officer, and obviously their responsibility is around managing that money asset, but it's also around making others in the organization think about that money asset. And they do that through established processes and responsibilities, like budgeting and planning, but also, ultimately, down to the individual, where, through the expense sheets that we all fill out so much, they make you think about money. So if the CFO makes everyone in the company think about money, the data officer, or the data lead, has to make everyone in the company think about data as an asset just as well. And those rights, those responsibilities, and that culture, they also change, right? Today they're set this way because of privacy and policy X and Y and Z, but tomorrow, for example, with the European Union's new regulation around AI, there's a bunch of new responsibilities you have to think about. >> Mhm. You know, you mentioned security, and value and risk, which are certainly part and parcel, right? If I have something important, I've got to protect it, because somebody else might want to create some damage, some harm, and steal my value, basically. Well, that's what's happening, as you point out, in the data world these days. So what kind of work are you doing in that regard, in terms of reinforcing the importance of security culture, privacy culture, this kind of protective culture within an organization, so that everybody fully understands the risks, but also the huge upsides if you do enforce this responsibility and these good behaviors, which, obviously, the company can gain from and then provide value to their client base? How do you reinforce that within your clients, to spread that culture, if you will, within their organizations? >> Um, spreading a culture is not always an easy thing.
Especially since a lot of organizations think about the value around data but, to your point, not always about the risks that come associated with it, sometimes just because they don't know about it yet. Right? There are new architectures that come into play, like the cloud, and that comes with a whole bunch of new risk. That's why one of the things we always recommend to our customers, and to the data officers in our customers' organizations, is that, next to establishing that data literacy, for example, and working on data products, they also partner strongly with other leaders in their organization. On the one hand, for example, the legal folks, where typically you find the aspects around privacy, and on the other hand, the information security folks. Because if you're building up a sort of map of your data, look at it like a castle, right, that you're trying to protect. If you don't have a map of your castle, with the strong points and weak points, and where people can dig a hole under your wall, or what have you, then it's very hard to defend. So you have to be able to get a map of your data, a data map, if you will: know what data is out there, what it's being used for, by whom, and why and how. And then you want to prioritize that data: which is the most important, what are the most important uses, and put the appropriate protections and controls in place. And it's fundamental that you do that together with your legal and information security partners, because as a data leader you may have the data and data modeling expertise, but there's a bunch of other things that come into play when you're trying to protect not just the data, but really your company and its data as a whole. >> You know, you were talking about 2025 a little bit ago, and I think, good for you; that's quite a crystal ball that you have, looking with the headlights that far down the road.
But I know you have to; that kind of progressive thinking is very important. What do you see in the long term for, number one, your position as a Chief Data Citizen, if you will, and then for the role of the Chief Data Officer, which you think is kind of migrating toward that citizenship? So maybe put on those long-term vision goggles of yours again and tell me: what do you see as far as these evolving roles and these new responsibilities for people who are CDOs these days? >> Um, well, 2025 is closer than we think, right? And obviously my crystal ball is as fuzzy as everyone else's, but there are a few trends that you can easily identify, and that we've seen by doing this for so long at Collibra. One is the push around data. I think last year, the year 2020, COVID sort of became the executive director of digitalization; it forced everyone to think more about digital, and I expect that to continue. Right? So that's an important aspect. The second important aspect that I expect to continue for the next couple of years, easily to 2025, is the whole movement to the cloud. So those cloud-native architectures become important, as well as preparing your data around it, preparing your policies around it, et cetera. I also expect that privacy regulations will continue to increase, as well as the need to protect your data assets. And I expect that a lot of chief data officers will also be very busy building out those data products. So if you follow that trend, then, okay, data products are getting more important for chief data officers, and then data quality is something that's increasingly important to get right today; otherwise it becomes a garbage-in, garbage-out kind of situation, where your data products are being fed bad food, and ultimately their outcomes are very tricky. So, for the chief data officers: I think there was about one of them in 2002.
And then in 2019-ish, let's say, there were around 10,000. So there's plenty of upside to go for the chief data officers; there are plenty of roles like that needed across the world. And they've also evolved in responsibility, and I expect that their position, which really is a C-level position today in most organizations, will also continue to grow in that trend. But ultimately, those chief data officers have to think about the business, right? Not just the defensive and offensive positions around data, like policies and regulations, but also the support for businesses, which are shifting very fast today, and will continue to shift, to digital. So those CDOs will be seen as heroes, especially when they can build out a factory of data products that really supports the business. But at the same time, they have to figure out how to reach out and extend a branch to their technical counterparts, because you cannot build that factory of data products, in my mind at least, without the proper infrastructure, and that's where your technical teams come in. And then, obviously, the partnerships with your legal and information security folks, of course. >> Well, heroes. Everybody wants to be the hero. And I know that you've painted a pretty clear path right now, as far as the Chief Data Officer is concerned, and their importance and value to companies down the road. Stan, we thank you very much for the time today and for the insight, and wish you continued success at the conference. Thank you very much. >> Thank you very much. Have a nice day. Stay healthy. >> Thank you very much to Stan Christiaens, joining us and talking about chief data citizenship, if you will, as part of Data Citizens '21, the conference being put on by Collibra. I'm John Walls; thanks for joining us here on theCUBE. >> Mhm.

Published Date : Jun 17 2021


Jim Cushman Product strategy vision | Data Citizens'21


 

>> Hi everyone, and welcome to Data Citizens. Thank you for making the time to join me and the over 5,000 data citizens like you that are looking to become united by data. My name is Jim Cushman. I serve as the Chief Product Officer at Collibra, and I have the benefit of sharing with you the product vision and strategy of Collibra. There are several sections to this presentation, and I can't wait to share them with you. The first is a story of how we're taking a business user and making it possible for him or her to find data, use data, and gain a benefit and insight from that data, without relying on anyone in the organization to write code or do the work for them. Next, I'll share with you how Collibra will make it possible to manage metadata at scales into the billions of assets and, again, load this into our software without writing any code. Third, I will demonstrate to you the integration we have already achieved with our newest product release: data quality that's powered by machine learning. Finally, you're going to hear about how Collibra has become the most universally available solution in the market. Now, we all know that data is a critical asset that can make or break an organization, yet organizations struggle to capture the power of their data, and many remain afraid of how their data could be misused and/or abused. We also observe that the understanding of, and access to, data remains in the hands of just a small few: three out of every four companies continue to struggle to use data to drive meaningful insights. All forward-looking companies are looking for an advantage, a differentiator that will set them apart from their peers and competitors. What if you could improve your organization's productivity by just 5%? Even a modest 5% productivity improvement, compounded over a five-year period, will make your organization 28% more productive. This will leave you with an overwhelming advantage over your competition, and uniting your data-literate employees with data is the key to your success. And, dare I say, to unlock this potential for increased productivity and huge competitive advantage, organizations need to enable self-service access to data for the everyday data-literate knowledge worker. Our ultimate goal at Collibra has always been to enable this self-service for our customers: to empower every knowledge worker to access the data they need, when they need it, but with the peace of mind that their data is governed and secure. Just imagine if you had a single integrated solution that could deliver a seamless, governed, no-code user experience of delivering the right data to the right person at the right time, just as simply as ordering a pair of shoes online. That would be quite a magic trick, and one that would place you and your organization on the fast track for success. Let me introduce you to our character here. >> Cliff. Cliff is that business analyst. He doesn't write code; he doesn't know Julia or R or SQL, but he is data literate. When cliff is presented with data of high quality, and can actually find that data of high quality, cliff knows what to do with it. Well, we're going to expose cliff to our software and see how he can find the best data to solve his problem of the day, which is customer churn. Cliff is going to go out and find this information, bring it back, and analyze it in his favorite BI reporting tool, Tableau. Of course, that could be Looker, it could be Power BI, or any other of your favorites, but let's go ahead and get started and see how cliff can do this without any help from anyone in the organization. So cliff is going to log into Collibra as a business user.
Now, when he brings back a churn rate, it shows him the definition of churn rate and various other things that have been attributed to it such as data domains like product and customer in order. Now, cliff says, okay, customer is really important. So let me click on that and see what makes up customer definition. Cliff will scroll through a customer and find out the various data concepts attributes that make up the definition of customer and cliff knows that customer identifier is a really important aspect to this. It helps link all the data together. And so cliff is going to want to make sure that whatever source he brings actually has customer identifier in it. And that it's of high quality cliff is also interested in things such as email address and credit activity and credit card. >>But he's now going to say, okay, what data sets actually have customer as a data domain in, and by the way, why I'm doing it, what else has product and order information? That's again, relevant to the concept of customer churn. Now, as he goes on, he can actually filter down because there's a lot of different results that could potentially come back. And again, customer identifier was very important to cliff. So cliff, further filters on customer identifier any further does it on customer churn rate as well. This results in two different datasets that are available to cliff for selection, which one to use? Well, he's first presented with some data quality information you can see for customer analytics. It has a data quality score of 76. You can see for sales data enrichment dataset. It has a data quality score of 68. Something that he can see right at the front of the box of things that he's looking for, but let's dig in deeper because the contents really matter. >>So we see again the score of 76, but we actually have the chance to find out that this is something that's actually certified. And this is something that has a check mark. 
And so he knows someone he trusts has actually certified this dataset. You'll see that there are 91 columns that make up this dataset, and rather than sifting through all of that information, cliff is going to say, well, okay, customer identifier is very important to me; let me search through and see if I can find its data quality score. Very quickly he finds it, using a fuzzy search, and sees, wow, that's a really high data quality score of 98. Well, what's the alternative? Well, the other dataset only has a 68, but how about its customer identifier? And quickly he discovers that the data quality for that is only 70. >> So, all things being equal, customer analytics is the better dataset for what cliff needs to achieve. But now he wants to look and say, other people have used this; what have they had to say about it? And you can see there are various reviews from peers of his in the organization that have given it five stars, so this encourages cliff's confidence that this is a great dataset to use. Now, cliff wants to look in a little more detail before he finally commits to using this dataset. Cliff has the opportunity to look at it in the broader setting: what other things can I learn about customer analytics, such as what else is it related to, who else uses it, where did it come from, where does it go, and what actually happens to it? And so, within our graph of information, we're able to show you a diagram. >> You can see that customer analytics actually comes from the CRM cloud system, and from there you can inherit some wonderful information. We know exactly what the CRM cloud is about as an overall system; it's related to other logical models, and here you're actually seeing that it's related to a policy, a policy about PII, or personally identifiable information.
This gives cliff almost immediate knowledge that there's going to be some customer information, this PII information, that he's not going to be able to see, given his user role in the organization. But cliff says, hey, that's okay. I actually don't need to see somebody's name and social security number to do my work; I can work with other information in the data file that'll actually help me understand why our customers are churning and what I can do about it. If we dig in deeper, we can see which personally identifiable information could actually cause issues. >> And as we scroll down, we take a little bit of a focus on what you'll see here as customer phone, because we'll show that to you a little bit later; these show the various pieces of information that, once cliff actually has the dataset fulfilled and delivered to him, will be masked and/or redacted from his use. Now, cliff might drive in deeper and see more information, and he says, you know what, another piece that's important to me in my analysis is something called "is churned." This is basically a flag suggesting that a customer has actually churned. It's an important flag, of course, because that's the analysis that he's performing. Cliff sees that the score is a mere 65. That's not exactly a great data quality score, but cliff is kind of in a hurry; his boss has come back and said, we need to have this information so we can take action.
That there's an opportunity for improvement to this dataset that is highly reviewed, but it may be, it has room for improvement as cliff is actually typing in his explanation that he'll pass along. We can also see that the data quality is made up of multiple components, such as integrity, duplication, accuracy, consistency, and conformity. Um, we see that we can submit this, uh, issue and pass it through. And this will go to somebody else who can actually work on this. >>And we'll show that to you a little bit later, but back to cliff, cliff says, okay, I'd like to, I'd like to work with this dataset. So he adds it to his data basket. And just like if he's shopping online, cliff wants that kind of ability to just say, I want to just click once and be done with it. Now it is data and there's some sensitivity about it. And again, there's an owner of this data who you need to get permission from. So cliff is going to provide information to the owner to say, here's why I need this data. And how long do I need this data for starting on a certain date and ending on a certain date and ultimately, what purpose am I going to have with this data? Now, there are other things that cliff can choose to run. This one is how do you want this day to deliver to you? >>Now, you'll see down below, there are three options. One is borrow the other's lease and others by what does that mean? Well, borrow is this idea of, I don't want to have the data that's currently in this CRM, uh, cloud database moved somewhere. I don't want it to be persistent anywhere else. I just want to borrow it very short term to use in my Tablo report and then poof be gone. Cause I don't want to create any problems in my organization. Now you also see lease. Lease is a situation where you actually do need to take possession of the data, but only for a time box period of time, you don't need it for an indefinite amount of time. 
And ultimately buy is your ability to take possession of the data and have it in perpetuity. So we're going to go forward with our bar use case and cliff is going to submit this and all the fun starts there. >>So cliff has actually submitted the order and the owner, Joanna is actually going to receive the request for the order. Joanna, uh, opens up her task, UCS there's work to perform. It says, oh, okay, here's this there's work for me to perform. Now, Joanna has the ability to automate this using incorporated workflow that we have in Colibra. But for this situation, she's going to manually review that. Cliff wants to borrow a specific data set for a certain period of time. And he actually wants to be using in a Tablo context. So she reviews. It makes an approval and submits it this in turn, flips it back to cliff who says, okay, what obligations did I just take on in order to work for this data? And he reviews each of these data sharing agreements that you, as an organization would set up and say, what am I, uh, what are my restrictions for using this data site? >>As cliff accepts his notices, he now has triggered the process of what we would call fulfillment or a service broker. And in this situation we're doing a virtualization, uh, access, uh, for the borrow use case. Cliff suggests Tablo is his preferred BI and reporting tool. And you can see the various options that are available from power BI Looker size on ThoughtSpot. There are others that can be added over time. And from there, cliff now will be alerted the minute this data is available to them. So now we're running out and doing a distributed query to get the information and you see it returns back for raw view. Now what's really interesting is you'll see, the customer phone has a bunch of X's in it. If you remember that's PII. So it's actually being massed. So cliff can't actually see the raw data. 
Now cliff also wants to look at it in a Tableau report, and he can see the visualization layer, but you also see an incorporation of something we call Collibra On The Go. >> Not only do we bring the data to the report, but then we tell you, the reader, how to interpret the report. It could be that there's someone else who wants to use the very same report that cliff helped create, but they don't understand exactly all the things that cliff went through. So now they have the ability to get a full interpretation of what data was used, where it came from, and how to actually interpret some of the fields they see on this report. Really a clever combination of bringing the data to you and showing you how to use it. Cliff can also see this as a registered asset within Collibra, so the next shopper who comes through might, instead of shopping for the dataset, actually shop for the report itself, and the report is connected with the dataset he used. >> So now they have a full bill of materials to run a customer churn report and schedule it any time they want. So now we've turned cliff into a creator of data assets, and this is where intelligence begets more intelligence, and that's really what we call data intelligence. So let's go back through that magic trick that we just did with cliff. Cliff went into the software not knowing if the source of data that he was looking for, for customer product sales, was even available to him. He went in, very quickly searched and found his dataset, used facts and facets to filter down to exactly what was available, compared and contrasted the options that were there, made an observation that there wasn't enough data quality around a certain thing that was important to him, created an idea, basically a suggestion, for somebody to follow up on, and was able to put the dataset into his shopping basket, check out, and have it delivered to his front door.
So, uh, cliff was successful in finding data that he wanted and having it, deliver it to him. And then in his preferred model, he was able to look at it into Tableau. All right. So let's talk about how we're going to make this vision a reality. So our first section here is about performance and scale, but it's also about codeless database registration. How did we get all that stuff into the data catalog and available for, uh, cliff to find? So allow us to introduce you to what we call the asset life cycle and some of the largest organizations in the world. They might have upwards of a billion data assets. These are columns and tables, reports, API, APIs, algorithms, et cetera. These are very high volume and quite technical and far more information than a business user like cliff might want to be engaged with those very same really large organizations may have upwards of say, 20 to 25 million that are critical data sources and data assets, things that they do need to highly curate and make available. >>But through that as a bit of a distillation, a lifecycle of different things you might want to do along that. And so we're going to share with you how you can actually automatically register these sources, deal with these very large volumes at speed and at scale, and actually make it available with just a level of information you need to govern and protect, but also make it available for opportunistic use cases, such as the one we presented with cliff. So as you recall, when cliff was actually trying to look for his dataset, he identified that the is churned, uh, data at your was of low quality. So he passed this over to Eliza, who's a data steward and she actually receives this work queue in a collaborative fashion. And she has to review, what is the request? If you recall, this was the request to improve the data quality for his churn. >>Now she needs to familiarize herself with what cliff was observing when he was doing his shopping experience. 
So she digs in and wants to look at the quality that he was observing and sure enough, as she goes down and it looks at his churn, she sees that it was a low 65% and now understands exactly what cliff was referring to. She says, aha, okay. I need to get help. I need to decide whether I have a data quality project to fix the data, or should I see if there's another data set in the organization that has better, uh, data for this. And so she creates a queue that can go over to one of her colleagues who really focuses on data quality. She submits this request and it goes over to, uh, her colleague, John who's really familiar with data quality. So John actually receives the request from Eliza and you'll see a task showing up in his queue. >>He opens up the request and finds out that Eliza's asking if there's another source out there that actually has good is churned, uh, data available. Now he actually knows quite a bit about the quality of information sturdiness. So he goes into the data quality console and does a quick look for a dataset that he's familiar with called customer product sales. He quickly scrolls down and finds out the one that's actually been published. That's the one he was looking for and he opens it up to find out more information. What data sets are, what columns are actually in there. And he goes down to find his churned is in fact, one of the attributes in there. It actually does have active rules that are associated with it to manage the quality. And so he says, well, let's look in more detail and find out what is the quality of this dataset? >>Oh, it's 86. This is a dramatic improvement over what we've seen before. So we can see again, it's trended quite nicely over time each day, it hasn't actually degraded in performance. So we actually responds back to realize and say, this data set, uh, is actually the data set that you want to bring in. It really will improve. 
And you'll see that he refers to the refined database within the CRM cloud solution. Once he actually submits this, it goes back to Eliza and she's able to continue her work. Now when Eliza actually brings this back open, she's able to very quickly go into the database registration process for her. She very quickly goes into the CRM cloud, selects the community, to which she wants to register this, uh, data set into the schemas community. And the CRM cloud is the system that she wants to load it in. >>And the refined is the database that John told her that she should bring in. After a quick description, she's able to click register. And this triggers that automatic codeless process of going out to the dataset and bringing back its metadata. Now metadata is great, but it's not the end all be all. There's a lot of other values that she really cares about as she's actually registering this dataset and synchronizing the metadata she's also then asked, would you like to bring in quality information? And so she'll go out and say, yes, of course, I want to enable the quality information from CRM refined. I also want to bring back lineage information to associate with this metadata. And I also want to select profiling and classification information. Now when she actually selects it, she can also say, how often do you want to synchronize this? This is a daily, weekly, monthly kind of update. >>That's part of the change data capture process. Again, all automated without the require of actually writing code. So she's actually run this process. Now, after this loads in, she can then open up this new registered, uh, dataset and actually look and see if it actually has achieved the problem that cliff set her out on, which was improved data quality. So looking into the data quality for the is churn capability shows her that she has fantastic quality. It's at a hundred, it's exactly what she was looking for. 
So she can, with confidence, suggest that it's done. But she did notice something she wants to tell John, which is that there are a couple of data quality checks that seem to be missing from this dataset. So again, in a collaborative fashion, she can pass that information along for validity and completeness, to say, you know what, check for nulls and empties, and send that back.

So she submits this on to John to work on, and John now has a task in his work queue. But remember, she's been working in this task workflow, and because she has actually added a much better source for the churn information, she's going to update the task that was sent to her, to notify Cliff that the work has been done and that there is a really good data set in there. In fact, if you recall, it was 100% in terms of its data quality. So this will really make life a lot easier for Cliff once he receives that data and processes the churn report analysis next time. So let's talk about these audacious performance goals that we have in mind. Now, today we actually have really strong performance and amazing usability. Our customers continue to tell us how great our usability is, but they keep asking for more. Well, we've decided to present to you something you can start to bank on.

This is the performance you can expect from us on the highly curated assets that are available for business users, as well as the technical and lineage assets that are more oriented toward developer uses and things that are more warehouse-based. You'll see that in Q1 or Q2 of this year, we're making available 5 million curated assets. Now, you might be out there saying, hey, I'm already using the software and I've got over 20 million already. That's fair. We do.
We have customers that are actually well over 20 million in terms of assets they're managing, but we wanted to present this to you with zero conditions: no limitations we wouldn't talk about, no "well, it depends," et cetera. This is without any conditions. That's what we can offer you without fail. And yes, it can go higher and higher. We're also talking about the speed with which you can ingest the data. Right now, we're ingesting at a rate of somewhere around 50,000 to 100,000 records, and of course, yes, you've probably seen it go quite a bit faster, but we are assuring you that that's the case. What's really impressive is that right now we can also help you manage 250 million technical assets, and we can load them at a speed of 25 million per hour. And you can see how, over the next 18 months, about every two quarters, we show you dramatic improvements, more than a doubling of these.

For most of them, leading up to the end of 2022, we're actually handling over a billion technical lineage assets, and we're loading at a hundred million per hour. That sets the mark for the industry. Earlier this year, we announced a recent acquisition, OwlDQ. OwlDQ brought us machine learning based data quality. We're now able to introduce to you Collibra Data Quality, the first integrated approach to OwlDQ and Collibra. We've got a demo to follow; I'm really excited to share it with you. Let's get started. So Eliza submitted a task for John to work on; remember, to add checks for null and for empty. So John picks up this task very quickly and looks to see what the request is, and from there says, ah, yes, we do have a quality check issue when we look at these churns. So he jumps over to the data quality console and says, I need to create a new data quality test.

>>So John is able to go into the solution and set up quick rules, automated rules.
He could inherit rules from other things, but it starts with first identifying the data source he needs to connect to in order to perform this. And so he chooses the CRM refined data set that was most recently registered by Eliza. You'll see the same score of 86 as the quality score for the dataset, and you'll also see there are four rules associated underneath it. Now, there are various checks that John can establish on this, but remember, this is a fairly easy request that he received from Eliza. So he's going to go in and choose the actual field, "is churned," and from there identify quick rules: an empty check, and that quickly sets up the rules for him.

And also the null check, equally fast. This one is established and analyzes all the data in there, and this sets up the baseline of data quality. Now this data, once it's captured, is periodically brought back to the catalog, so it's available not only to Eliza, but also to Cliff the next time he were to shop in the environment. As we look through the rules that were created through that very simple user experience, you can see the ones for "is empty" and "is null" that were set up. Now, these are various styles that can be set up either manually, or through machine learning, or you can inherit them. But the key is to track this rule creation and the metrics that are generated from these rules, so that they can be brought back to the catalog and then used in meaningful context by someone who's shopping; the confidence that this dataset has neither empty nor null fields, or at least that most of them don't, gives you confidence as you go forward.

And as you can see, those checks have now been entered in, and you can see that it's a hundred percent quality score for the null check. So with confidence now, John can respond back to Eliza and say, I've actually inserted them; they're up and running.
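The two checks John sets up, is-null and is-empty on the "is churned" column, amount to simple per-row predicates whose pass rate becomes the rule's score. The following is a minimal sketch of that idea, my own illustration rather than how Collibra Data Quality actually computes its scores:

```python
# Minimal sketch of null/empty data quality rules: each rule is a predicate
# a value must pass, and the rule's score is the percentage of rows passing.
# Illustrative only; not Collibra Data Quality's actual scoring engine.
def not_null(value):
    return value is not None

def not_empty(value):
    return value != ""

def rule_score(column, rule):
    """Percentage of values in the column that pass the rule."""
    if not column:
        return 100.0
    passed = sum(1 for v in column if rule(v))
    return round(100.0 * passed / len(column), 1)

is_churned = ["yes", "no", "no", "yes", "no"]  # hypothetical clean column
print(rule_score(is_churned, not_null))   # 100.0
print(rule_score(is_churned, not_empty))  # 100.0

dirty = ["yes", None, "", "no"]
print(rule_score(dirty, not_null))        # 75.0
```

A 100% score on both rules corresponds to the "hundred percent quality score for the null check" John reports back; the per-rule metrics are what a catalog would periodically pull back in, as the demo describes.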
And you're in good status. So that was pretty amazing integration, right? Just four months after our acquisition, we've already brought that level of integration between Collibra Data Intelligence Cloud and Data Quality. Now, it doesn't stop there. We have really impressive sights set for early next year. We're going to introduce a fully immersive experience where customers can work within Collibra and bring the data quality information all the way in, as well as start to manipulate the rules and generate the machine learning rules on top of it. All of that will be a deeply immersive experience.

We also have something really clever coming, which we call continuous data profiling, where we bring the power of data quality all the way into the database, so it's continuously running and always making that data available for you. Now, I'd also like to share with you one of the reasons why we are the most universally available software solution in data intelligence. We've already announced that we're available on AWS and Google Cloud, but today we can announce that in Q3, we're going to be available on Microsoft Azure as well. And it's not just these three cloud providers that we're available on; we've also become available on each of their marketplaces. So if you are buying our software, you can make that same purchase from their marketplace and achieve your financial objectives as well. We're very excited about this. These are very important partners for us.

Now, I'd also like to introduce you to our system integrators. Without them, there's no way we could achieve our objectives of growing so rapidly and dealing with the demand that you, our customers, have had. Accenture, Deloitte, Infosys, and others have been instrumental in making sure that we can serve your needs when you need them. They've been a big part of our growth and will be a continued part of our growth as well.
And finally, I'd like to introduce you to our product showcases, where we can go into absolute detail on many of the topics I talked about today, such as data governance with Arco, data privacy with Sergio, data quality with Brian, and finally the catalog with Peter. Again, I'd like to thank you all for joining us, and we really look forward to hearing your feedback. Thank you.

Published Date : Jun 17 2021


Data Citizens '21 Preview with Felix Van de Maele, CEO, Collibra


 

>>At the beginning of the last decade, the technology industry was abuzz because we were on the cusp of a new era of data. The promise of so-called big data was that it would enable data-driven organizations to tap a new form of competitive advantage, namely insights from data at a much lower cost. The problem was that data became plentiful, but insights remained scarce. A rash of technical complexity, combined with a lack of trust due to conflicting data sources and inconsistent definitions, led to the same story we've heard for decades: we spent a ton of time and money to create a single version of the truth, and we're further away than we've ever been before. Maybe as an industry we should be approaching this problem differently. Perhaps it should start with the idea that we have to change the way we serve business users, i.e., those who understand data context. And with me to discuss the evolving data space, his company, and the upcoming Data Citizens conference is Felix Van de Maele, the CEO and founder of Collibra. Felix, welcome. Great to see you. >>Great to see you. Great to be here. >>So tell us a little bit about Collibra and the problem that you're solving. Maybe you could double-click on my upfront narrative. >>Yeah, I think you said it really well. We've seen so much innovation over the last couple of years in data, with the exploding volume and complexity of data. We've seen a lot of innovation in how to store and process that volume of data more effectively, or more cost-effectively, but fundamentally, being able to really derive insights from that data, whether it's for an AI model or for reporting, is still as difficult as it was, let's say, 10 years ago. If anything, it's only become more difficult. And so what we fundamentally believe is that, next to that innovation on the infrastructure side of data, you really need to look at the people and process side of data.
There are so many more people today who consume and produce data to do their job.

That's why we talk about data citizens. We have to make it easier for them to find the right data, in a way they can trust, so there's confidence in that data, to be able to make decisions and to be able to trust the output of that model. And that's really what Collibra focused on initially, around governance: how do you make sure companies actually know what data they have, and make sure they can trust it and use it in a compliant way? And now we've extended that into the only data intelligence platform in the industry today, where we just make it easier for organizations to truly unite around the data across the whole organization, wherever that data is stored, on premise or in the cloud, and whoever is actually using or consuming it. That's why we talk about data citizens. >>I think you're right. It is more complex. There's just more of it, and there's more pressure on individuals to get advantage from it. But I have to ask you what sets Collibra apart, because I'd like you to explain why you're not just another data company chasing a problem with what's going to be an incremental solution that's really not going to change anything. What sets Collibra apart? >>Yeah, that's a really good question. And I think what fundamentally sets us apart, what makes us unique, is that we look at data, or the problem around data, as truly a business problem and a business function. We fundamentally believe that if you believe data is an asset, you really have to run it as a strategic business function, just like your HR or people function, your IT function, your marketing function. You have a system to run each of those functions: you have Salesforce to run sales and marketing. You have ServiceNow to run your IT function.
You have Workday to run your people function, but you need the same kind of system to run your data function. And that's really how we think about Collibra. So we're not another kind of faster, better database, nor another data management tool that makes the life of a single individual easier; we're really a business application that focuses on how we bring people together effectively so that they can collaborate around the data. It creates efficiency, so you don't have to do things ad hoc; you can easily find the right information and collaborate effectively. And it creates the confidence to actually be able to do something with the outcomes, the results, of all of that work. So fundamentally, looking at the problem as a business function that needs a business system, what we call the system of record, or system of engagement, for the data function, is absolutely critical and really unique in our approach. >>So,

Data Citizens, your big user conference, Data Citizens '21, is coming up June 16th and 17th. TheCUBE is stoked, because we love talking about data. This is the first time we're bringing theCUBE to that event, so we're really gearing up for it. And I wonder if you could tell us a little bit about the history and the evolution of the Data Citizens conference. >>Absolutely. I think the first one was six years ago, when we had a small event at a hotel in downtown New York, with most of our customers at the time, a lot of the banks, which were the main customers then, about 60 people. So a very small event, and it's exploded ever since. This year, we expect over 5,000 people. So it's really expanded beyond just a user conference to become almost a community conference, an industry conference. So we're really excited. A big part of what we do, and why we care so much about the conference, is that it's an opportunity to build that data citizens community.
That's what we hear from our customers, from all the attendees who come to the conference. Bringing together people who all care about the same topic and are passionate about doing more with data, being able to connect people, is a big part of that. So we're always looking forward to the event from that perspective.

>>There's competition, of course, for virtual events these days. With that in mind, what's in it for me, who should attend, and what can attendees expect from Data Citizens '21? >>Yeah, absolutely. The good thing about a virtual event is that everybody can attend. It's free, and it's open from across the world, of course. What we want people to take away as attendees is that you learn something pragmatic, so that the next day on the job you can do something: you've learned something very specific. We've also looked at what is possible from an innovation perspective, and that's how we look at the event. We bring a lot of customers and organizations that are going to share their best practices, very specifically how they are handling data governance, how they're doing data cataloging, how they're doing data privacy. So very specific best practices and tips on how to be successful, but then also industry experts who can paint the picture of where we're going as an industry: what are the best practices?

What do we need to think about today to be ready for what's going to come tomorrow? So that's a big focus. We're, of course, also going to talk about our product: what we have in store from a product roadmap and innovation perspective, and how we're helping these organizations go faster. And we're bringing in a lot of partners as well, so that's a big part of that broader ecosystem, which is really interesting. And finally, like I said, it's really around the community, right?
And that's what we hear continuously from the attendees: just being able to make these connections, meet new people, learn what they're doing and how they've solved certain challenges. We hear that's a really big part of the value proposition. So as an attendee, the good thing is you can join from anywhere, and all of the content is going to be available on demand, so it will be there for you to look at later as well. Plus, you're going to become part of that data citizens community, which is a really thriving and growing community where you're going to find a lot of like-minded people with the same passion and the same interests, whom you can learn from. >>Well, I rather like the term data citizen. I consider myself a data citizen, and it has implications in terms of putting data in the hands of business users. So it's sort of central to this event, obviously. What is a data citizen to Collibra? >>Yeah, it's a really core part of our mission and our vision. We believe that today everyone needs data to do their job. Everyone, in that sense, has become a data citizen, in the sense that they need to be able to easily access trustworthy data. We have to make it easy for people to find the right data, data they can trust, that they can understand, and that they can do something with to make their job easier. On the other hand, like a citizen, you have rights and you have responsibilities. As a data citizen, you also have the responsibility to treat that data in the right way, to make sure that from a privacy and security perspective, that data is, again, like I said, treated in the right way. And so that combination of making it easy, making it accessible, democratizing it, but also making sure we treat data in the right way, is really important. And it's a core part of what we believe: that everyone is going to become a data citizen.
And so that's a big part of our mission. >>I like that we're to enter into a contract: I'll do my part, and you'll give me access to that data. I think that's a great philosophy. So the call to action here: June 16th and 17th, go register at citizens.collibra.com, because it's not just the normal mumbo jumbo; you're going to get some really interesting data. Felix, I'll give you the last word. >>Like you said, go register. It's a great event and a great community to be part of. June 16th and 17th, you can block it in your calendar. So go to citizens.collibra.com. It's going to be a great event. >>Thanks for helping us preview it. This is going to be a great event, and we're really excited about it, Felix. Great to see you, and we'll see you on June 16th and 17th. >>Absolutely. >>All right, thanks for watching, everybody. This is Dave Vellante for theCUBE. We'll see you next time.

Published Date : May 12 2021
