Jon Loyens, data.world | Snowflake Summit 2022
>>Good morning, everyone. Welcome back to theCUBE's coverage of Snowflake Summit 22, live from Caesars Forum in Las Vegas. Lisa Martin here with Dave Vellante. This is day three of our coverage. We've had an amazing time: great conversations with Snowflake executives, partners, and customers. We're going to be digging into data mesh with data.world. Please welcome Jon Loyens, the chief product officer. Great to have you on the program, Jon. >>Thank you so much for having me. The summit, like you said, has been incredible: so many great people, such a good time. It's really nice to be back in person with folks. >>It is fabulous to be back in person. The fact that we're on day four for them, and this solution showcase is as packed as it is at 10 or 11 in the morning, is saying something. >>Yeah. People are chomping at the bit to hear what they're doing and how they're innovating. >>Absolutely. Usually on those last days of conferences everybody starts getting a little tired, but we're not seeing that at all here, especially in Vegas. >>This is impressive. Talk to the audience a little bit about data.world, what you guys do, and talk about the Snowflake relationship. >>Absolutely. data.world is the only true cloud-native enterprise data catalog. We've been an incredible Snowflake partner, and Snowflake's been an incredible partner to us, really since 2018, when we became the first data catalog in the Snowflake Partner Connect experience. Snowflake and the Data Cloud make it all possible.
And it's changed so much in terms of being able to very easily transition data into the cloud, to break down those silos, and to have a platform that enables folks to be incredibly agile with data from an engineering and infrastructure standpoint. data.world is able to provide a layer of discovery and governance that matches that agility, plus the ability for a lot of different stakeholders to really participate in the process of data management and data governance. >>So, data mesh. Zhamak Dehghani first lays out the failure modes of existing data and big data initiatives, and she boils it down to the fact that it's this monolithic architecture with hyper-specialized teams that you have to go through, and it just slows everything down and it doesn't scale. They don't have domain context. So she came up with four principles, if I may. Domain ownership: push it out to the businesses; they have the context, they should own the data. The second is data as product; we're certainly hearing a lot about that this week. That sounds good, push out the data, great, but it creates two problems, which brings us to the third: self-serve infrastructure. Her premise is that infrastructure should be an operational detail. And then the fourth is computational governance. So, you talked about data catalogs. Where do you fit in those four principles? >>You know, honestly, we're able to help teams realize the data mesh architecture. Data mesh is really both a process and a culture change, but when you want to enact a process and culture change like this, you also need to select tools that match the culture you're trying to build, the process, and the architecture you're trying to build. And the data.world data catalog can really help along all four of those axes. Let's take the first one: data as a product.
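The four principles Vellante lists can be made concrete as a minimal data-product descriptor in code. This is purely an illustrative sketch of the data mesh idea, not a data.world or Snowflake construct; every field name and value below is an invented assumption.

```python
from dataclasses import dataclass, field

@dataclass
class DataProduct:
    """Minimal sketch of a data-mesh 'data as a product' descriptor.

    Illustrative only: these fields are assumptions, not a real API.
    """
    name: str         # product identity
    domain: str       # principle 1: domain ownership (owned by the business)
    owner: str        # accountable team inside that domain
    output_port: str  # principle 3: the self-serve infrastructure detail
    policies: list = field(default_factory=list)  # principle 4: computational governance

    def is_governed(self) -> bool:
        # A product is publishable only if at least one governance policy applies.
        return len(self.policies) > 0

orders = DataProduct(
    name="orders_daily",
    domain="sales",
    owner="sales-analytics-team",
    output_port="snowflake://analytics/sales/orders_daily",  # hypothetical location
    policies=["pii-masking", "30-day-freshness"],
)
print(orders.domain, orders.is_governed())  # sales True
```

The point of the sketch: ownership, product metadata, an infrastructure pointer, and governance policies all travel together with the asset, rather than living in a central team's backlog.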
It's even very meta of us, as a metadata management platform at the end of the day. When you talk about data as a product, we track adoption and usage of all the data assets within your organization, and we provide program teams and offices of the CDO with incredibly detailed, event-level analytics that give them the right audit trail. That enables them to direct very scarce data engineering and data architecture resources to make sure that their data assets are getting adopted and used properly.

On the domain-driven side, we are entirely knowledge-graph and open-standards based, which enables those different domains. We have incredible joint Snowflake customers like Prologis, and we chatted a lot about this in our session here yesterday. Because of our knowledge graph underpinnings and the flexibility of our metadata model, those domains can model their assets uniquely, group to group, without having to relaunch or run different environments. You can do that all within one data catalog platform, without separate environments for each of those domains. On federated governance: the amount of data exhaust that we create really enables ambient governance and participatory governance as well. We call it agile data governance, the adoption of agile and open principles applied to governance to make it more inclusive and transparent. And we provide that in a way that can federate across those domains and make it consistent. >>Okay, so you facilitate across that whole spectrum of principles. In the early examples of data mesh that I've studied and collaborated with, like JPMC, who I don't think is using your data catalog, and HelloFresh, who may or may not be, there are numbers, and I want to get to that.
But what they've done is enable the domains to spin up their own data lakes, data warehouses, data hubs; at least in concept (most of them are data lakes on AWS). Still, in concept, they want to be inclusive, so they've created a master data catalog, and then each domain has its subcatalog, which feeds into the master, and that's how they get consistency, governance, and everything else. Is that the right way to think about it, or do you have a different spin? >>Yeah, I have a slightly different spin on it. I think organizationally it's the right way to think about it, and in the absence of a catalog that can truly have multiple federated metadata models, multiple graphs in one platform, that really is the only way to do it. With data.world, you don't have to do that. You can have one platform, one environment, one instance of data.world that spans all of your domains, enables them to operate independently, and then federates across them. >>You just answered my question as to why I should use data.world versus Amazon Glue. >>Oh, absolutely. >>And that's awesome. Now, how have you done that? What's your secret sauce? >>The secret sauce here really is all credit to our CTO, one of my closest friends, who is a true student of knowledge graph practices and principles, and who really felt that the right way to manage metadata and knowledge about the data and analytics ecosystem that companies were building was through federated linked data. So we use standards, and we've built an open and extensible metadata model that takes the best parts of the existing open standards in the semantics space, things like schema.org, DCAT, and Dublin Core, brings them together, and models out the most typical enterprise data assets, providing you with an ontology that's ready to go.
And because of the graph nature of what we do, it's instantly accessible without having to rebuild environments or do a lot of management against it. It's really quite something, and it's something all of our customers are very impressed with and getting a lot of leverage out of. >>And we have a lot of time today, so we're not going to shortchange this topic. One last question, then I'll shut up and let you jump in. This is an open standard; it's not open source? >>It's built on open standards, and we also fundamentally believe in extensibility and openness. We do not want to vertically lock you into our platform, so everything we have is API-driven and API-available. Your metadata belongs to you; if you need to export your graph, it's instantly available in open, machine-readable formats. We come from the open data community; that was a lot of the founding of data.world, and we fundamentally believe in it. That's enabled a lot of our customers to take data.world beyond a data catalog application to a true metadata management platform, and extend it even further into their enterprise: to catalog all of their assets, but also to build incredible integrations with things like corporate search. Having data assets show up in corporate wiki search, along with all the descriptive metadata people need, has been incredibly powerful, and an incredible extension of our platform that I'm so happy to see our customers make. >>So, Lisa, it's not exclusive to Snowflake. It's not exclusive to AWS. You can bring it anywhere: Azure, GCP. >>Anytime. Yeah, and look, we love Snowflake; we're at the Snowflake Summit.
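Loyens' point about open semantic standards can be sketched concretely: a knowledge graph is just a set of triples, and describing a catalog asset with DCAT and Dublin Core terms means any standards-aware tool can read the metadata and any consumer can export it. The vocabulary IRIs below are the real W3C/DCMI ones; the dataset itself is invented, and this plain-Python sketch stands in for a real RDF library.

```python
# A knowledge graph is just a set of (subject, predicate, object) triples.
# DCAT and Dublin Core are real open vocabularies; the dataset is made up.
DCAT = "http://www.w3.org/ns/dcat#"
DCTERMS = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

ds = "https://example.org/catalog/orders_daily"  # hypothetical asset IRI

graph = {
    (ds, RDF_TYPE, DCAT + "Dataset"),
    (ds, DCTERMS + "title", "Daily orders"),
    (ds, DCTERMS + "publisher", "sales domain"),
    (ds, DCAT + "keyword", "orders"),
}

def serialize(g):
    """Export the graph in N-Triples-style lines: open formats make export trivial."""
    lines = []
    for s, p, o in sorted(g):
        obj = f"<{o}>" if o.startswith("http") else f'"{o}"'
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)

print(serialize(graph))
```

Because the metadata lives in open vocabularies rather than a proprietary schema, "your metadata belongs to you" is literal: export is a serialization call, not a migration project.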
And we've always had a great relationship with Snowflake and really leaned in there, because we believe Snowflake's principles, particularly around being cloud native and the operating advantages that affords companies, are really aligned with what we do. Snowflake was the first of the cloud data warehouses that we integrated with, and seeing them build out the Data Cloud has been awesome. >>Talk about how data.world and Snowflake enable companies like Prologis to be data companies. These days every company has to be a data company, but they have to be able to do it quickly to be competitive and to win. How do you help them, if we up-level the conversation to really impacting the overall business? >>That's a great question, especially right now. Prologis is a great example: they're a logistics and supply chain company at the end of the day, and we all know how important logistics and supply chains are nowadays. For them, and for a lot of our customers, one of the advantages of having a data catalog is the ability to build trust, transparency, and inclusivity into their data and analytics practice. By adopting agile principles, by adopting a data mesh, you're able to extend your data and analytics practice to a much broader set of stakeholders and involve them in the process while the work is getting done. One of the greatest things about agile software development, when it became a thing in the early two thousands, was how inclusive it was, and that inclusivity led to a much faster ROI on software projects. We see the same thing happening in data and analytics. We have amazing data scientists and data analysts coming up with insights that could be business-changing, that could make their companies significantly more resilient, especially in the face of economic uncertainty.
But if you have to sit there and argue with your business stakeholders about the validity of the data and the techniques that were used to do the analysis, and it takes you three months to get people to trust what you've done, that opportunity has passed. So how do we shorten those cycles? How do we bring them closer? That's a huge benefit that Prologis has realized: tightening that cycle time, building trust, building inclusion. Ultimately, humans learn by doing, and if you can be inclusive, it even increases things we all want to help with, because Lord knows the world needs it, like data literacy. >>So data.world can inform me as to where on the spectrum of data quality my data set lives. I can say, okay, this is usable, shareable, gold standard, versus "fix this." >>Yep. >>And you can do that with one data catalog, not a bunch of them. >>Yeah. And trust is really a multifaceted, multi-angle idea. It's not just data quality or data observability, and we have incredible partnerships in that space, like our partnership with Monte Carlo, where we can ingest all their amazing observability information and display it in a really consumable way in our data catalog. But it also includes things like lineage: who touched it, who was involved in the process, can I get a question answered quickly about this data, what has it been used for previously? It's so multifaceted that you have to be able to model and present it in a way that's unique to any given organization, even unique to domains within a single organization. >>That's not to suggest you're a data quality supplier. You partner with them, and then you become the master catalog. >>Absolutely. >>That's brilliant.
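The "multifaceted trust" Loyens describes can be sketched as a simple roll-up: several trust signals (quality, freshness, lineage, usage) collapse into one badge a catalog might surface next to an asset. The facet names, scores, and threshold below are all invented for illustration; a real catalog would weight and source these very differently.

```python
# Hypothetical trust facets for one catalog asset, each scored 0..1.
facets = {
    "quality": 0.9,             # e.g. from a data quality partner
    "freshness": 0.6,           # e.g. from observability tooling
    "lineage_documented": 1.0,  # who touched it, and how it was derived
    "usage": 0.5,               # adoption signals from the catalog itself
}

def trust_badge(f, threshold=0.7):
    """Collapse multiple trust facets into one badge, as a catalog UI might."""
    score = sum(f.values()) / len(f)
    return "certified" if score >= threshold else "needs review"

print(trust_badge(facets))  # certified
```

The design point is the same one made in the interview: no single facet (quality alone, observability alone) decides trust; the catalog is where the facets meet.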
I love it. Exactly. >>And you just raised your $50 million Series C. >>We did, yeah. We're really lucky to have incredible investors like Goldman Sachs, who led our Series C. I think it communicates the trust they have in our vision, in what we're doing, and in the impact we can have on an organization's ability to be agile and resilient around data and analytics. >>Enabling customers to have that single source of truth is so critical. You talked about trust; that is absolutely no joke. >>Absolutely. >>That is critical, and there's a tremendous amount of positive business impact that can come from it. What are some of the things that are next for data.world that we're going to see? >>Oh, I love this. We have such an incredibly innovative team that's so dedicated to this space and to the mission of what we're doing. We're out there trying to fundamentally change how people get data and analytics work done together. One of the big reasons I founded the company is that I truly believe data and analytics needs to be a team sport; it needs to go from single-player mode to team mode, and everything we've worked on in the last six years has leaned into that. Our architecture is cloud native; we've done over a thousand releases a year that nobody has to manage, so you don't have to worry about upgrading your environment. It's a lot of the same story that's made Snowflake so great. We were really excited to announce in March, at our own summit, a new package of features we're rolling out over the course of the year called data.world Eureka: a suite of automations and knowledge-driven functionality that helps you leverage a knowledge graph to make decisions faster and to operationalize your data, in a data-ops way, with significantly less effort. >>Big, big impact there.
Jon, thank you so much for joining Dave and me and unpacking what data.world is doing: the data mesh, and the opportunities you're giving to customers in every industry. We appreciate your time, and congratulations on the news and the funding. >>Thank you. It's been a true pleasure. Thank you for having me on, and I hope you guys enjoy the rest of the day and your other guests. >>We will. All right. For our guest and Dave Vellante, I'm Lisa Martin. You're watching theCUBE's third day of coverage of Snowflake Summit 22, live from Vegas. Dave and I will be right back with our next guest, so stick around.
Analyst Predictions 2022: The Future of Data Management
In the 2010s, organizations became keenly aware that data would become the key ingredient in driving competitive advantage, differentiation, and growth. But to this day, putting data to work remains a difficult challenge for many, if not most, organizations. Now, as the cloud matures, it has become a game changer for data practitioners by making cheap storage and massive processing power readily accessible. We've also seen better tooling in the form of data workflows, streaming, machine intelligence, AI, developer tools, security, observability, automation, new databases, and the like. These innovations accelerate data proficiency, but at the same time they add complexity for practitioners. Data lakes, data hubs, data warehouses, data marts, data fabrics, data meshes, data catalogs, data oceans: they're forming, evolving, and exploding onto the scene. So, in an effort to bring perspective to the sea of optionality, we've brought together the brightest minds in the data analyst community to discuss how data management is morphing and what practitioners should expect in 2022 and beyond.

Hello, everyone. My name is Dave Vellante with theCUBE, and I'd like to welcome you to a special Cube presentation: Analyst Predictions 2022, the Future of Data Management. We've gathered six of the best analysts in data and data management, who are going to present and discuss their top predictions and trends for 2022 and the first half of this decade. Let me introduce our six power panelists. Sanjeev Mohan is a former Gartner analyst and principal at SanjMo. Tony Baer is principal at dbInsight. Carl Olofson is a well-known research vice president with IDC. Dave Menninger is senior vice president and research director at Ventana Research. Brad Shimmin is chief analyst for AI platforms, analytics, and data management at Omdia. And Doug Henschen is vice president and principal analyst at Constellation Research. Gentlemen, welcome to the program, and thanks for coming on theCUBE today. Great to be here. Thank you. All right, here's the
format we're going to use: I, as moderator, will call on each analyst separately, who will then deliver his prediction or megatrend, and then, in the interest of time management and pace, two analysts will have the opportunity to comment. If we have more time, we'll elongate it, but let's get started right away. Sanjeev Mohan, please kick it off. You want to talk about governance; go ahead, sir.

Thank you, Dave. I believe that data governance, which we've been talking about for many years, is now not only going to be mainstream, it's going to be table stakes. With all the things you mentioned, data oceans, data lakes, lakehouses, data fabrics, meshes, the common glue is metadata. If we don't understand what data we have, and we aren't governing it, there is no way we can manage it. We saw Informatica go public last year after a hiatus of six years. I'm predicting that this year we see some more companies go public: my bet is on Collibra most likely, and maybe Alation; we'll see them go public this year. I'm also predicting that the scope of data governance is going to expand beyond just data. It's not just data and reports; we're going to see more transformations, like Spark jobs, Python, even Airflow. We're going to see more streaming data, from Kafka schema registries, for example. We will see AI models become part of this whole governance suite. The governance suite is going to be very comprehensive and very detailed: lineage, impact analysis, and then even expanding into data quality. We've already seen that happen with some of the tools, where vendors are buying smaller companies and bringing in data quality monitoring, integrating it with metadata management and data catalogs, and also data access governance. So what we're going to see is that, once data governance platforms become the key entry point into these modern architectures, I'm predicting that the number of users of a data catalog is going to exceed that of a BI tool. That will take time, and we
have already seen that trajectory. Right now, if you look at BI tools, I would say there are 100 users of a BI tool to one of a data catalog, and I see that evening out over a period of time. At some point, data catalogs will really become the main way for us to access data. The data catalog will help us visualize data, but if we want to do more in-depth analysis, it'll be the jumping-off point into the BI tool or the data science tool. That is the journey I see for the data governance products.

Excellent, thank you. Some comments? Maybe Doug: a lot of things to weigh in on there; maybe you could comment.

Yeah, Sanjeev, I think you're spot on with a lot of the trends. The one disagreement: I think it's really still far from mainstream. As you say, we've been talking about this for years; it's like God, motherhood, apple pie. Everyone agrees it's important, but too few organizations are really practicing good governance, because it's hard and because the incentives have been lacking. One thing that deserves mention in this context is ESG mandates and guidelines, environmental, social, and governance regs and guidelines. We've seen the environmental regs and guidelines imposed in industries, particularly the carbon-intensive ones. We've seen the social mandates, particularly diversity, imposed on suppliers by companies that are leading on this topic. We've seen governance guidelines now being imposed by banks and investors. So these ESGs are presenting new carrots and sticks, and they're going to demand more solid data, more detailed reporting, and tighter governance. But we're still far from mainstream adoption. We have a lot of best-of-breed niche players in the space. I think the signs that it's going to be more mainstream are starting with things like Azure Purview and Google Dataplex; the big cloud platform players seem to be upping the ante and starting to address governance.
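The "lineage, impact analysis" capability Mohan expects governance suites to standardize on reduces, at its core, to a traversal of the lineage graph: a change to an upstream asset impacts everything reachable downstream. A minimal sketch, with invented asset names:

```python
from collections import deque

# Lineage edges: upstream asset -> assets derived from it (hypothetical names).
lineage = {
    "raw.orders": ["staging.orders_clean"],
    "staging.orders_clean": ["marts.daily_revenue", "ml.churn_features"],
    "ml.churn_features": ["ml.churn_model"],
}

def impacted(asset, edges):
    """Return every downstream asset affected by a change to `asset` (BFS)."""
    seen, queue = set(), deque([asset])
    while queue:
        for child in edges.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

print(sorted(impacted("raw.orders", lineage)))
# ['marts.daily_revenue', 'ml.churn_features', 'ml.churn_model', 'staging.orders_clean']
```

Real governance tools add much more (column-level edges, transformations, AI models as nodes), but impact analysis over any of them is this same reachability question at scale.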
Excellent, thank you, Doug. Brad, I wonder if you could chime in as well.

Yeah, I would love to be a believer in data catalogs, but, to Doug's point, I think it's going to take some more pressure for that to happen. I recall metadata being something every enterprise thought they were going to get under control when we were working on service-oriented architecture back in the 90s, and that didn't happen quite the way we anticipated. And, to Sanjeev's point, it's because it is really complex and really difficult to do. My hope is that we won't fade out into this nebulous nebula of domain catalogs that are specific to individual use cases, like Purview for getting data quality right, or catalogs just for data governance and cybersecurity, and that instead we have tooling that can actually be adaptive: that gathers metadata to create something I know is important to you, Sanjeev, this idea of observability. If you can get enough metadata, without moving your data around, to understand the entirety of a system that's running on that data, you can do a lot to help with the governance Doug is talking about.

I just want to add that data governance, like many other initiatives, did not succeed in the past; even AI went into an AI winter, but that's a different topic. A lot of these things did not succeed because, to your point, the incentives were not there. I remember when Sarbanes-Oxley came onto the scene: if a bank did not comply, they were very happy to pay a million-dollar fine, which was pocket change for them, instead of doing the right thing. But I think the stakes are much higher now. With GDPR, the floodgates opened. California has CCPA, and even CCPA is being outdated by CPRA, which is much more GDPR-like. So we are very rapidly entering a space where pretty much every major country in the world is coming up with its own compliance and regulatory requirements.
Data residency is becoming really important, and I think we're going to reach a stage where this won't be optional anymore, whether we like it or not. I think the reason data catalogs were not successful in the past is that we did not have the right focus on adoption. We were focused on features, and those features were disconnected and very hard for the business to use: built by IT people, for IT departments, to look at technical metadata, not business metadata. Today the tables have turned: CDOs are driving this initiative, and regulatory compliance is bearing down hard, so I think the time might be right.

Yeah. So, guys, we have to move on here, but there's some real meat on the bone. Sanjeev, I like the fact that you called out Collibra and Alation, so we can look back a year from now and say, okay, he made the call and he stuck it. And the ratio of BI tools to data catalogs is another measurement we can take, even with some skepticism there; that's something we can watch. And I wonder if someday we'll have more metadata than data. But I want to move to Tony Baer. You want to talk about data mesh, and, coming off of governance, wow: the whole concept of data mesh is decentralized data, and then governance becomes a nightmare there. But take it away, Tony.

We'll put it this way. Data mesh, the idea at least as proposed by Thoughtworks, was basically unleashed a couple of years ago, and the press has been almost uniformly uncritical. A good reason for that is all the problems that Sanjeev and Doug and Brad were just speaking about: we have all this data out there and we don't know what to do about it. Now, that's not a new problem. It was a problem when we had enterprise data warehouses, it was a problem when we had our Hadoop data clusters, and it's even more of a problem now that the data's
out in the cloud, where the data is not only in S3 but all over the place, and it also includes streaming, which I know we'll be talking about later. So data mesh was a response to that: the idea that the folks who really know best about governance are the domain experts. Data mesh was basically an architectural pattern and a process. My prediction for this year is that data mesh is going to hit cold, hard reality, because if you do a Google search, the published work, the articles and posts, has been largely uncritical so far, basically treating it as a very revolutionary new idea. I don't think it's that revolutionary, because we've talked about ideas like this before. Brad, you and I met years ago when we were talking about SOA and decentralizing all of this at the application level; now we're talking about it at the data level, and now we have microservices. So there's this thought: if we manage apps cloud natively through microservices, why don't we think of data in the same way? My sense this year, and this has been a very active search term if you look at Google search trends, is that enterprises are going to look at this seriously, and as they do, it's going to attract its first real hard scrutiny and its first backlash. That's not necessarily a bad thing; it means it's being taken seriously. The reason I think the cold, hard light of day will shine on data mesh is that it's still a work in progress. The idea is basically a couple of years old, and there are still some pretty major gaps. The biggest gap is in the area of federated governance. Now, federated governance itself is not a new issue. With federated governance, we're trying to figure out
how we can strike the balance between consistent enterprise policy and consistent enterprise governance on one hand, and, on the other, the groups that actually understand the data. How do we balance the two? There's a huge gap there in practice and knowledge. Also, to a lesser extent, there's a technology gap in the self-service technologies that will help teams govern data through the full life cycle: from selecting the data, to building the pipelines, to determining your access controls, to looking at quality, at whether the data is fresh, how it's trending, and so forth. So my prediction is that data mesh will really receive its first harsh scrutiny this year. You're going to see some enterprises declare premature victory when they've built some federated query implementations. You're going to see vendors start to data-mesh-wash their products: anybody in the data management space, whether it's a pipelining tool, an ELT tool, a catalog, or a federated query tool, they're all going to be promoting how they support this. Hopefully nobody is going to call themselves a "data mesh tool," because data mesh is not a technology. We're going to see one other thing come out of this, and this harks back to the metadata Sanjeev was talking about, and the catalogs he was talking about: there's going to be a renewed focus on metadata, and I think that's going to spur interest in data fabrics. Now, data fabrics are pretty vaguely defined, but if we just take the most elemental definition, which is a common metadata backplane, I think that if anybody is going to get serious about data mesh, they need to look at a data fabric,
because at the end of the day we all need to read from the same sheet of music.

>> Thank you, Tony. Dave Menninger, one of the things people like about data mesh is that it pretty crisply articulates some of the flaws in today's organizational approaches to data. What are your thoughts on this?

>> Well, I think we have to start by defining data mesh, right? The term is already getting corrupted. Tony said it's going to see the cold, hard light of day, and there's a problem right now in that there are a number of overlapping terms that are similar but not identical. We've got data virtualization, data fabric, data federation, and I don't think it's really clear what each vendor means by these terms. I see data mesh and data fabric becoming quite popular. I've interpreted data mesh as referring primarily to the governance aspects, as originally intended and specified, but that's not the way I see vendors using it; I see vendors using it much more to mean data fabric and data virtualization. So I'm going to comment on the group of those things together. I think the group of those things is going to happen; they're going to become more robust. Our research suggests that a quarter of organizations are already using virtualized access to their data lakes, and another half (so a total of three-quarters) will eventually be accessing their data lakes using some sort of virtualized access. Again, whether you define it as mesh or fabric or virtualization isn't really the point here; the point is this notion that there are different elements of data, metadata, and governance within an organization that all need to be managed collectively. The interesting thing is, when you look at the satisfaction rates of organizations using virtualization versus those that are not, it's almost double: 79% of organizations that were using
virtualized access expressed satisfaction with their access to the data lake; only 39% expressed satisfaction if they weren't using virtualized access.

>> Thank you, Dave. Sanjeev, we've just got a couple of minutes on this topic, but I know you're speaking, or maybe you've already spoken, on a panel with Zhamak Dehghani, who invented the concept. Governance obviously is a big sticking point, but what are your thoughts on this? You're on mute.

>> So my message to Zhamak and to the community is, as opposed to what Dave said, let's not define it. We spent the whole year defining it. There are four principles: domain ownership, data as a product, self-serve data infrastructure, and governance. Let's take it to the next level. I get a lot of questions on the difference between data fabric and data mesh, and I say I can't compare the two, because data mesh is a business concept and data fabric is a data integration pattern. How do you compare the two? You have to bring data mesh down a level. So, to Tony's point, I'm on a warpath in 2022 to take it down to: what does a data product look like? How do we handle shared data across domains and govern it? I think what we're going to see more of in 2022 is the operationalization of data mesh.

>> I think we could have a whole hour on this topic, couldn't we? Maybe we should do that. But let's move on to Carl. Carl, you're a database guy, you've been around that block for a while now. You want to talk about graph databases? Bring it on.

>> Oh yeah, okay, thanks. So I regard graph databases as basically the next truly revolutionary database management technology. I'm looking for the graph database market (which, of course, we haven't defined yet, so obviously I have a little wiggle room in what I'm about to say) to grow by about 600% over the next 10 years. Now, 10 years is a long time, but over the next five years we expect to see gradual growth as people start to learn how to use it. The
problem is not that it's not useful; it's that people don't know how to use it. So let me explain, before I go any further, what a graph database is, because some of the folks on the call may not know. A graph database organizes data according to a mathematical structure called a graph. A graph has elements called nodes and edges. A data element drops into a node, and the nodes are connected by edges; an edge connects one node to another node. Combinations of edges create structures that you can analyze to determine how things are related. In some cases the nodes and edges can have properties attached to them, which add additional informative material that makes it richer; that's called a property graph. Okay, there are two principal kinds, with two principal use cases. There are semantic graphs, which are used to break down human-language text into semantic structures; then you can search it, organize it, and answer complicated questions, and a lot of AI is aimed at semantic graphs. The other kind is the property graph I just mentioned, which has a dazzling number of use cases. As I talk about this, people are probably wondering: well, we have relational databases, isn't that good enough? Okay, so a relational database supports what I call definitional relationships. That means you define the relationships in a fixed structure, and the data drops into that structure: there's a foreign key value that relates one table to another, and that value is fixed. You don't change it; if you change it, the database becomes unstable and it's not clear what you're looking at. In a graph database, the system is designed to handle change, so that it can reflect the true state of the things it's being used to track. So let me just give you some examples of use cases for this. They include entity resolution, data lineage, social media analysis, customer 360, fraud prevention, cybersecurity, and supply
chain, which is a big one, actually. There's explainable AI, and this is going to become important too, because a lot of people are adopting AI but want a system that can say, after the fact, how the AI came to that conclusion, how it made that recommendation; right now we don't have really good ways of tracking that. Machine learning in general; social network analysis, which I already mentioned; and then we've got data governance, data compliance, risk management, recommendation, personalization, anti-money laundering (another big one), identity and access management, and network and IT operations, which is already becoming a key one, where you've actually mapped out your operation, your data center, and you can track what's going on as things happen there. Root-cause analysis. Fraud detection is a huge one; a number of major credit card companies use graph databases for fraud detection. Risk analysis, track and trace, churn analysis, next best action, what-if analysis, impact analysis, entity resolution. And I would add a few other things to this list. Metadata management: so, Sanjeev, here you go, this is your engine. I was in metadata management for quite a while in my past life, and one of the things I found was that none of the data management technologies available to us could efficiently handle metadata, because of the kinds of structures that result from it. But graphs can. Graphs can do things like say: this term in this context means this, but in that context it means that. And in fact logistics management and supply chain benefit also because a graph handles recursive relationships. By recursive relationships I mean objects that own other objects of the same type. You can do things like bill of materials, you know, parts explosion. You can do HR analysis: who reports to whom, how many levels up the chain, that kind of thing. You
can do that with relational databases, but it takes a lot of programming. In fact, you can do almost any of these things with relational databases, but the problem is you have to program it; it's not supported in the database. And whenever you have to program something, that means you can't trace it, you can't define it, you can't publish it in terms of its functionality, and it's really, really hard to maintain over time.

>> So, Carl, thank you. I wonder if we could bring Brad in. Brad, I'm sitting here wondering: okay, is this incremental to the market, or is it disruptive, a replacement? What are your thoughts on this space?

>> It's already disrupted the market. I mean, like Carl said, go to any bank and ask them, "Are you using graph databases to get fraud detection under control?" and they'll say, "Absolutely, that's the only way to solve this problem." And it is, frankly. And it's the only way to solve a lot of the problems Carl mentioned. And that, I think, is its Achilles' heel in some ways, because, you know, it's like finding the best way to cross the seven bridges of Königsberg: it's always going to be kind of tied to those use cases, because it's really special and it's really unique. And because it's special and unique, it still, unfortunately, kind of stands apart from the rest of the community that's building, let's say, AI outcomes, to take the great example here. Graph databases and AI, as Carl mentioned, are like chocolate and peanut butter, but technologically they don't know how to talk to one another; they're completely different. And you can't just stand up SQL and query them; you've got to learn, what is it, Cypher, or SPARQL, to actually get to the data in there. And if you're going to scale that graph database, especially a property graph, if you're going to do something really complex, like trying to understand all of the metadata in your organization, you might just end
up with, you know, a graph database winter, like we had the AI winter, simply because you run out of performance to make the thing happen. So I think it's already disrupted, but we need to treat it like a first-class citizen in the data, analytics, and AI community. We need to bring it into the fold, we need to equip it with the tools it needs to do the magic it does, and to do it not just for specialized use cases but for everything. Because I'm with Carl: I think it's absolutely revolutionary.

>> So I had also identified the principal Achilles' heel of the technology, which is scaling. When these things get large and complex enough that they spill over what a single server can handle, you start to have difficulties, because the relationships span things that have to be resolved over a network, and then you get network latency, and that slows the system down. So that's still a problem to be solved.

>> Sanjeev, any quick thoughts on this? I mean, I think metadata on the word cloud is going to be the largest font, but what are your thoughts here?

>> I want to step away from that, so people don't associate me with only metadata; I want to talk about something slightly different. DB-Engines.com has done an amazing job; I think almost everyone knows that they chronicle all the major databases in use today. In January of 2022 there are 381 databases on its ranked list. The largest category is RDBMS. The second-largest category is actually divided into two: property graphs and RDF graphs. These two together make up the second-largest number of databases. So, talking about Achilles' heels here, this is the problem: there are so many graph databases to choose from, and they come in different shapes and forms. And, to Brad's point, there are so many query languages. In RDBMS, it's SQL, end of story. Here we've got Cypher, we've got Gremlin, we've got GQL, and then your proprietary languages. So I think there's a
lot of disparity in this space.

>> Excellent points all around, Sanjeev, I must say. And that is a problem. The languages need to be sorted out and standardized, and people need a road map for what they can do with it, because, as you say, you can do so many things, and so many of those things are unrelated, that you sort of say: well, what do we use this for? I'm reminded of a saying I learned years ago, when somebody said that the digital computer is the only tool man has ever devised that has no particular purpose.

>> All right, guys, we've got to move on to Dave Menninger. We've heard about streaming; your prediction is in that realm, so please take it away.

>> Sure. So I like to say that historical databases are going to become a thing of the past, but I don't mean that they're going to go away; that's not my point. I mean we need historical databases, but streaming data is going to become the default way in which we operate with data. So in the next, say, three to five years, I would expect that data platforms (and we're using the term "data platforms" to represent the evolution of databases and data lakes) will incorporate these streaming capabilities. We're going to process data as it streams into an organization, and then it's going to roll off into historical databases. So historical databases don't go away, but they become a thing of the past: they store the data that occurred previously, and as data is occurring, we're going to be processing it, analyzing it, acting on it. We only ever ended up with historical databases because we were limited by the technology that was available to us. Data doesn't occur in batches; we processed it in batches because that was the best we could do. And it wasn't bad, and we've continued to improve and improve. But streaming data today is still the exception, not the rule. There are projects within organizations that deal
with streaming data, but it's not the default way in which we deal with data yet. And so that's my prediction: this is going to change. Streaming data will become the default way in which we deal with data. How you label it, what you call it, you know, maybe these databases and data platforms just evolve to be able to handle it, but we're going to deal with data in a different way. And our research shows that already about half of the participants in our analytics and data benchmark research are using streaming data, and another third are planning to use streaming technologies. So that gets us to about eight out of ten organizations needing to use this technology. That doesn't mean they have to use it throughout the whole organization, but it's pretty widespread in its use today, and it has continued to grow. If you think about the consumerization of IT, we've all been conditioned to expect immediate access to information, immediate responsiveness. You know, we want to know if an item is on the shelf at our local retail store so we can go in and pick it up right now. That's the world we live in, and that's spilling over into the enterprise IT world, where we have to provide those same types of capabilities. So that's my prediction: historical databases become a thing of the past, and streaming data becomes the default way in which we operate with data.

>> All right, thank you, David. Well, what say you, Carl, a guy who's followed historical databases for a long time?

>> Well, one thing, actually: every database is historical, because as soon as you put data in it, it's history; it no longer reflects the present state of things. Even if that history is only a millisecond old, it's still history. But I would say, and I know you're trying to be a little bit provocative in saying this, Dave, because you know as well as I do that people still need to do their taxes, they still need to do accounting, they still need to run
general ledger programs and things like that, and that all involves historical data. That's not going to go away, unless you want to go to jail, so you're going to have to deal with it. But as far as the leading-edge functionality goes, I'm totally with you on that. And I'm just kind of wondering if this requires a change in the way we perceive applications in order to truly be manifested, a rethinking of the way applications work: saying that an application should respond instantly, as soon as the state of things changes. What do you say about that?

>> I think that's true. I think we do have to think about things differently. That's not the way we designed systems in the past. We're seeing more and more systems designed that way, but again, it's not the default. And I agree 100% with you that we do need historical databases; that's clear. And even some of those historical databases will be used in conjunction with the streaming data, right? Absolutely. I mean, take the data warehouse example, where you're using the data warehouse as context and the streaming data as the present. You're saying: here's a sequence of things that's happening right now; have we seen that sequence before? What does that pattern look like in past situations, and can we learn from that?

>> So, Tony Baer, I wonder if you could comment. When you think about, say, real-time inferencing at the edge, which is something a lot of people talk about, a lot of what we're discussing in this segment looks like it's got great potential. What are your thoughts?

>> Yeah, well, I think you nailed it; you hit it right on the head there. The key thing I'm seeing, and basically I'm going to split this one down the middle, is that I don't see streaming becoming the default. What I see is streaming and transaction databases and
analytics data stores (data warehouses, data lakes, whatever) converging, and what allows us to converge technically is cloud-native architecture, where you can distribute things. So you can have a node here that's doing the real-time processing, and maybe doing some of that real-time predictive analytics, to take a look at this customer journey: what's happening with what the customer is doing right now, and how that's correlated with what other customers are doing. So the thing is that in the cloud you can partition this, and because of the speed of the infrastructure you can bring these together and orchestrate them in a loosely coupled manner. The other part is that the use cases are demanding it, and this is the part that goes back to what Dave was saying: when you look at customer 360, when you look at, let's say, smart utility grids, when you look at any type of operational problem, it has a real-time component and it has a historical component, and it has predictives. So my sense here is that technically we can bring this together through the cloud, and the use case is that we can apply some real-time predictive analytics on these streams and feed that into the transactions, so that when we make a decision about what to do as a result of a transaction, we have this real-time input.

>> Sanjeev, did you have a comment?

>> Yeah, I was just going to say that, to this point, we have to think of streaming very differently, because with historical databases we used to bring the data in, store the data, and then run rules on top, aggregations and all. But in the case of streaming, the mindset changes, because the rules, the inference, all of that is
fixed, but the data is constantly changing. So it's a completely reverse way of thinking about, and building, applications on top of that.

>> So, Dave Menninger, there seemed to be some disagreement about the default. What kind of time frame are you thinking about? Is it end of decade that it becomes the default? Where would you pin it?

>> I think somewhere between five and ten years this becomes the reality. It'll be more and more common between now and then, but that's when it becomes the default. And, Sanjeev, at some point, maybe in one of our subsequent conversations, we need to talk about governing streaming data, because that's a whole other set of challenges.

>> We've also talked about it in two dimensions, historical and streaming, and there's a lot of low-latency, micro-batch, sub-second processing that's not quite streaming, but in many cases it's fast enough; we're seeing a lot of adoption of near real time, not quite real time, as good enough for many applications.

>> And nobody's really taking the hardware dimension of this into account, like how do we...

>> That'll just happen, Carl. So, near real time: maybe before you lose the customer, however you define that, right? Okay, let's move on to Brad. Brad, you want to talk about automation, AI, the pipeline. People feel like, hey, we can just automate everything. What's your prediction?

>> Yeah, I'm an AI aficionado, so apologies in advance for that. But, you know, I think we've been seeing automation at play within AI for some time now, and it's helped us do a lot of things, especially for practitioners that are building AI outcomes in the enterprise. It's helped them fill skills gaps, it's helped them speed development, and it's helped them actually make AI better, because in some ways it provides some swim lanes, and, for example, with technologies like AutoML, it can auto-document and create that sort of transparency that
we talked about a little bit earlier. But I think there's an interesting kind of convergence happening with this idea of automation, which is that the automation that started happening for practitioners is trying to move outside the traditional bounds of things like "I'm just trying to get my features," "I'm just trying to pick the right algorithm," "I'm just trying to build the right model." It's expanding across the full life cycle of building an AI outcome, starting at the very beginning with the data and continuing on to the end, which is continuous delivery and continuous automation of that outcome, to make sure it's right and it hasn't drifted, and so on. And because it's become kind of powerful, we're starting to see this weird thing happen where the practitioners are starting to converge with the users. That is to say: okay, if I'm in Tableau right now, I can stand up Salesforce Einstein Discovery and it will automatically create a nice predictive algorithm for me, given the data that I pull in. But what's starting to happen, and we're seeing this from the companies that create business software (Salesforce, Oracle, SAP, and others), is that they're starting to actually use these same ideals, and a lot of deep learning, to stand up these out-of-the-box, flip-a-switch AI outcomes at the ready for business users. And I very much think that's the way it's going to go, and what it means is that AI is slowly disappearing. I don't think that's a bad thing. I think, if anything, what we're going to see in 2022, and maybe into 2023, is a sort of rush to put this idea of disappearing AI into practice and have as many of these solutions in the enterprise as possible. You can see, for example, that SAP is going to roll out this quarter a thing called Adaptive Recommendation Services, which
basically is a cold-start AI outcome that can work across a whole bunch of different vertical markets and use cases. It's just a recommendation engine for whatever you need it to do in the line of business. So basically you're an SAP user, you log in to your software one day, and, let's say you're a sales professional, suddenly you have a recommendation for customer churn. That's great. Well, I don't know; I think that's terrifying in some ways. I think it is the future, that AI is going to disappear like that, but I am absolutely terrified of it, because I think what it really does is call attention to a lot of the issues we already see around AI, specific to this idea of what we at Omdia like to call responsible AI: how do you build an AI outcome that is free of bias, that is inclusive, that is fair, that is safe, that is secure, that is auditable, etc., etc.? That takes a lot of work to do. So if you imagine a customer that's just a Salesforce customer, let's say, and they're turning on Einstein Discovery within their sales software, you need some guidance to make sure that when you flip that switch, the outcome you're going to get is correct. And that's going to take some work. So I think we're going to see this rush to roll it out, and suddenly there are going to be a lot of problems, a lot of pushback. Some of that's going to come from GDPR and the other regulations Sanjeev was mentioning earlier, and a lot of it's going to come from internal CSR requirements within companies saying, hey, whoa, hold up, we can't do this all at once; let's take the slow route, let's make AI automated in a smart way, and that's going to take time.

>> Yeah. So a couple of predictions there that I heard: AI essentially disappears, it becomes invisible, if I can restate that. And then, if I understand it correctly, Brad, you're saying there's a backlash in
the near term, where people say, oh, slow down, let's automate what we can. Those attributes you talked about are non-trivial to achieve. Is that why you're a bit of a skeptic?

>> Yeah. I think we don't have any sort of standards that companies can look to and understand, and certainly within these companies, especially those that haven't already stood up an internal data science team, they don't have the knowledge to understand whether, when they flip that switch for an automated AI outcome, it's going to do what they think it's going to do. So we need some sort of standard methodology and practice, best practices, that every company that's going to consume this invisible AI can make use of. One of the things that Google kicked off a few years back, that's picking up some momentum and that the companies I just mentioned are starting to use, is this idea of model cards, where at least you have some transparency about what these things are doing. So, for the SAP example, we know that it's using a convolutional neural network with a long short-term memory model; we know that it only works on Roman-script English; and therefore I, as a consumer, can say, oh, well, I know that I need to do this internationally, so I should not just turn this on today.

>> Great, thank you. Carl, can you add anything, any context here?

>> Yeah. We've talked about some of the things Brad mentioned here at IDC, in our Future of Intelligence group, regarding in particular the moral and legal implications of having a fully automated, AI-driven system, because we already know, and we've seen, that AI systems are biased by the data they get, right? So if they get data that pushes them in a certain direction, well, I think there was a story last week about an HR system that was recommending promotions for white people over black people, because in the past white people were promoted and were judged more productive than
black people. But it had no context as to why, which is that black people were being historically discriminated against; the system doesn't know that. So you have to be aware of that. And I think that, at the very least, there should be controls: when a decision has either a moral or a legal implication, when you really need a human judgment, the system could lay out the options for you, but a person actually needs to authorize the action. And I also think that we will always have to be vigilant regarding the kind of data we use to train our systems, to make sure it doesn't introduce unintended biases, and to some extent it always will, so we'll always be chasing after them.

>> That's absolutely right, Carl. I think what you have to bear in mind as a consumer of AI is that it is a reflection of us, and we are a very flawed species. If you look at all the really fantastic, magical-looking super models we see, like GPT-3 and the GPT-4 that's coming out, they're xenophobic and hateful, because the data they're built upon, and the algorithms, and the people that build them, are us. So AI is a reflection of us. We need to keep that in mind.

>> Yeah, the AI is biased by us, because humans are biased. All right, great. Okay, let's move on to Doug Henschen. You know, a lot of people said that "data lake," the term, is not going to live on, but it appears to have some legs here. You want to talk about lakehouse? Bring it on.

>> Yes, I do. My prediction is that lakehouse, this idea of a combined data warehouse and data lake platform, is going to emerge as the dominant data management offering. I say offering: that doesn't mean it's going to be the dominant thing that organizations have out there, but it's going to be the predominant vendor offering in 2022.
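The core of Doug's lakehouse pitch (one engine and one query language spanning warehouse-style tables and semi-structured lake data) can be illustrated with a minimal sketch. This is a toy, not any vendor's actual product or API; the schema and the names (`orders`, `events`) are invented for the example, and Python's built-in `sqlite3` simply stands in for the single query engine:

```python
import json
import sqlite3

# Hypothetical mini "lakehouse": one engine over a structured
# warehouse table and semi-structured (JSON) lake records.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, "acme", 120.0), (2, "globex", 75.5)])

# Semi-structured events, as they might land in a data lake.
raw_events = [
    '{"customer": "acme", "event": "click", "ts": 1}',
    '{"customer": "acme", "event": "purchase", "ts": 2}',
    '{"customer": "globex", "event": "click", "ts": 3}',
]
# Load the JSON into the same engine so one SQL query spans both worlds.
conn.execute("CREATE TABLE events (customer TEXT, event TEXT, ts INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(e["customer"], e["event"], e["ts"])
     for e in map(json.loads, raw_events)])

# One query joining warehouse rows with lake-derived events.
rows = conn.execute("""
    SELECT o.customer, o.amount, COUNT(e.ts) AS events
    FROM orders o JOIN events e ON e.customer = o.customer
    GROUP BY o.customer ORDER BY o.customer
""").fetchall()
print(rows)  # [('acme', 120.0, 2), ('globex', 75.5, 1)]
```

Real lakehouse platforms do this at scale over cloud object storage with open table formats; the sketch only shows the promise Doug describes: a single platform answering one query across structured and semi-structured data.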
Now, heading into 2021, we already had Cloudera, Databricks, Microsoft, and Snowflake as proponents; in 2021, SAP, Oracle, and several of these fabric, virtualization, and mesh vendors joined the bandwagon. The promise is that you have one platform that manages your structured, unstructured, and semi-structured information, and that it addresses both the BI and analytics needs and the data science needs. The real promise there is simplicity and lower cost. But I think end users have to answer a few questions. The first is: does your organization really have a center of data gravity, or is the data highly distributed, with multiple data warehouses, multiple data lakes, on premises and in the cloud? If it's very distributed, and you have difficulty consolidating, and consolidation isn't really a goal for you, then maybe that single platform is unrealistic and not likely to add value. The fabric and virtualization vendors, the mesh idea, might be a better path forward if you have that highly distributed situation. The second question, if you are looking at one of these lakehouse offerings and you are looking at consolidating, simplifying, bringing things together on a single platform: you have to make sure that it meets both the warehouse need and the data lake need. You have vendors like Databricks, and Microsoft with Azure Synapse, relatively new to the data warehouse space, and they're having to prove that the data warehouse capabilities on their platforms can meet the scaling requirements, meet the user and query concurrency requirements, and meet those tight SLAs. Then, on the other hand, you have Oracle, SAP, and Snowflake, the data warehouse folks, coming into the data science world, and they have to prove that they can manage the unstructured information and meet the needs of data scientists. I'm seeing a lot of the lakehouse offerings from the warehouse crowd managing that unstructured information in columns and rows, and one of these vendors, Snowflake in particular, is
really relying on partners for the data science needs. So you've really got to look at a lakehouse offering and make sure that it meets both the warehouse and the data lake requirement. >> Well, thank you, Doug. Well, Tony, if those two worlds are going to come together, as Doug was saying, the analytics and the data science worlds, does there need to be some kind of semantic layer in between? I don't know, weigh in on this topic if you would. >> Oh, didn't we talk about data fabrics before, a common metadata layer? Actually, I'm almost tempted to say let's declare victory and go home, in that this has actually been going on for a while. I actually agree with much of what Doug is saying there. I mean, I remember as far back as, I think it was 2014, I was doing a study, you know, it was still at Ovum, the predecessor of Omdia, looking at all these specialized databases that were coming up and seeing that, you know, there's overlap at the edges, but yet there was still going to be a reason at the time that you would have, let's say, a document database for JSON, you'd have a relational database for transactions and for data warehouse, and you had basically something at that time that resembled Hadoop for what we're considering a data lake. Fast forward, and the thing is, what I was saying at the time is that you're seeing basically a blurring, a sort of blending at the edges; that's what I was saying like about five or six years ago. And the lakehouse is essentially, you know, the current manifestation of that idea. There is a dichotomy in terms of, you know, it's the old argument: do we centralize this all, you know, in a single place, or do we virtualize? And I think it's always going to be a yin and yang; there's never going to be a single silver bullet. I do see that there are also going to be questions, and these are points that Doug raised.
What do you need for your performance characteristics? Do you need, for instance, high concurrency? Do you need the ability to do some very sophisticated joins? Or is your requirement more to be able to distribute your processing, you know, as far as possible, to essentially do a kind of brute-force approach? All these approaches are valid based on the use case. I just see that essentially the lakehouse, and it's a relatively new term, introduced by Databricks a couple of years ago, is the culmination of basically what's been a long-time trend. And what we see in the cloud is that we start seeing data warehouses as a checkbox item: say, hey, we can basically source data in cloud storage, S3, Azure Blob Storage, you know, whatever, as long as it's in certain formats, like, you know, Parquet or CSV or something like that. I see that as becoming kind of a checkbox item. So to that extent, I think that the lakehouse, depending on how you define it, is already reality, and in some cases maybe new terminology, but not a whole heck of a lot new under the sun. >> Yeah, and Dave Menninger, thank you, Tony, but I mean, a lot of this is going to come down to, you know, vendor marketing, right? Some people try to co-opt the term; we talked about data mesh washing. What are your thoughts on this? >> Yeah, so I used the term data platform earlier, and part of the reason I use that term is that it's more vendor-neutral. We've tried to sort of stay out of the vendor terminology patenting world, right? Whether the term lakehouse is what sticks or not, the concept is certainly going to stick, and we have some data to back it up. About a quarter of organizations that are using data lakes today already incorporate data warehouse functionality into it, so they
consider their data lakehouse and data warehouse one and the same. About a quarter of organizations, a little less, but about a quarter of organizations feed the data lake from the data warehouse, and about a quarter of organizations feed the data warehouse from the data lake. So it's pretty obvious that three quarters of organizations need to bring this stuff together, right? The need is there, the need is apparent. The technology is going to continue to converge. I like to talk about, you know, you've got data lakes over here at one end, and I'm not going to talk about why people thought data lakes were a bad idea, because they thought you just throw stuff in a server and you ignore it, right? That's not what a data lake is. So you've got data lake people over here, and you've got database people, data warehouse people, over here. Database vendors are adding data lake capabilities, and data lake vendors are adding data warehouse capabilities. So it's obvious that they're going to meet in the middle. I mean, like Tony says, I think we should declare victory and go home. >> And so, just a follow-up on that: are you saying the specialized lake and the specialized warehouse, do they go away? I mean, Tony, data mesh practitioners, or advocates, would say, well, they could all live as just a node on the mesh. But based on what Dave just said, are we going to see those all morph together? >> Well, number one, as I was saying before, there's always going to be this sort of centrifugal force, this tug of war, between do we centralize the data or do we virtualize it? And the fact is, I don't think there's ever going to be any single answer. I think in terms of data mesh, data mesh has nothing to do with how you physically implement the data. You could have a data mesh basically on a data warehouse. It's just that, you know, the difference being that if we use the same physical data
store, but everybody's logically, basically, governing it differently, you know. A data mesh is basically, it's not a technology, it's a process, it's a governance process. So essentially, you know, I basically see that, as I was saying before, this is basically the culmination of a long-time trend. We're essentially seeing a lot of blurring, but there are going to be cases where, for instance, if I need, let's say, high concurrency or something like that, there are certain things that I'm not going to be able to efficiently get out of a data lake, where, you know, I'm basically doing a system where I'm just doing really brute-force, very fast file scanning and that type of thing. So I think there always will be some delineations, but I would agree with Dave and with Doug that we are seeing basically a confluence of requirements: we need to essentially have the abilities of a data lake and a data warehouse, and these need to come together. >> So I think what we're likely to see is organizations look for a converged platform that can handle both sides for their center of data gravity. The mesh and the fabric vendors, the fabric and virtualization vendors, they're all on board with the idea of this converged platform, and they're saying, hey, we'll handle all the edge cases, the stuff that isn't in that center of data gravity, that is off distributed in a cloud or at a remote location. So you can have that single platform for the center of your data, and then bring in virtualization, mesh, what have you, for reaching out to the distributed data. >> Bingo. As they basically said, people are happy when they virtualize data. I think yes, at this point. But to Dave Menninger's point, you know, they are converging. Snowflake has introduced support for unstructured data, so now we are literally splitting hairs here. Now, what Databricks is saying is that, aha, but it's
easier to go from data lake to data warehouse than it is from data warehouse to data lake. So I think we're getting into semantics, but we've already seen these two converge. >> So then take something like AWS, who's got, what, 15 data stores? Are they going to have 15 converged data stores? That's going to be interesting to watch. All right, guys, I'm going to go down the list and do like one word each, and you guys, each of the analysts, if you wouldn't mind, just add a very brief sort of course correction for me. So Sanjeev, I mean, governance, maybe it's the dog that wags the tail now. I mean, it's coming to the fore with all this ransomware stuff; we really didn't talk much about security. But what's the one word in your prediction that you would leave us with on governance? >> It's going to be mainstream. >> Mainstream, okay. Tony Baer? >> Mesh washing is what I wrote down. That's what we're going to see in 2022, a little reality check. >> You want to add to that? >> The reality check is, I hope that no vendor, you know, jumps the shark and calls their offering a data mesh product. >> Yeah, let's hope that doesn't happen. If they do, we're going to call them out. Carl, I mean, graph databases, thank you for sharing some, you know, high-growth metrics. I know it's early days, but magic is what I took away from that, the magic database. >> Yeah, I've said this to people too: I kind of look at it as a Swiss Army knife of data, because you can pretty much do anything you want with it. That doesn't mean you should. I mean, it's definitely the case that if you're managing things that are in a fixed schematic relationship, probably a relational database is a better choice. There are, you know, times when a document database is a better choice. It can handle those things, but it may not be the best choice for that use case. But for a great many, especially the new emerging use cases I listed, it's the best choice. >> Thank you.
And Dave Menninger, thank you, by the way, for bringing the data in; I liked how you supported all your comments with some data points. But streaming data becomes the sort of default paradigm, if you will. What would you add? >> Yeah, I would say think fast, right? That's the world we live in, you've got to think fast. >> Fast, love it. And Brad Shimmin? >> I love it. I mean, on the one hand, I was saying, okay, great, I'm afraid I might get disrupted by one of these internet giants who are AI experts, so I'm going to be able to buy instead of build AI. But then again, you know, I've got some real issues, there's a potential backlash there. >> So give us your bumper sticker. >> Yeah, I would say, going with Dave, think fast and also think slow, to talk about the book that everyone talks about. I would say really that this is all about trust: trust in the idea of automation and of a transparent, invisible AI across the enterprise, but verify. Verify before you do anything. >> And then Doug Henschen. I mean, look, I think the trend is your friend here on this prediction, with lakehouse really becoming dominant. I liked the way you set up that notion of, you know, the data warehouse folks coming at it from the analytics perspective, but then you've got the data science worlds coming together. I still feel as though there's this piece in the middle that we're missing, but we'll give you the last word. >> Well, I think the idea of consolidation and simplification always prevails. That's why the appeal of a single platform is going to be there. We've already seen that with, you know, Hadoop platforms moving toward cloud, moving toward object storage, and object storage becoming really the common storage point, whether it's a lake or a warehouse. And that second point: I think ESG mandates are going to come in alongside GDPR and things like that to up the ante for good governance. >> Yeah, thank you for calling that out. Okay, folks, hey, that's all the
time that we have here. Your experience and depth of understanding on these key issues in data and data management were really on point, and they were on display today. I want to thank you for your contributions, really appreciate your time. >> Enjoyed it, thank you. >> Now, in addition to this video, we're going to be making available transcripts of the discussion, and we're going to do clips of this as well; we're going to put them out on social media. I'll write this up and publish the discussion on wikibon.com and siliconangle.com. No doubt several of the analysts on the panel will take the opportunity to publish written content, social commentary, or both. I want to thank the power panelists, and thanks for watching this special Cube presentation. This is Dave Vellante, be well, and we'll see you next time. (bright music)
Predictions 2022: Top Analysts See the Future of Data
(bright music) >> In the 2010s, organizations became keenly aware that data would become the key ingredient to driving competitive advantage, differentiation, and growth. But to this day, putting data to work remains a difficult challenge for many, if not most organizations. Now, as the cloud matures, it has become a game changer for data practitioners by making cheap storage and massive processing power readily accessible. We've also seen better tooling in the form of data workflows, streaming, machine intelligence, AI, developer tools, security, observability, automation, new databases and the like. These innovations accelerate data proficiency, but at the same time they add complexity for practitioners. Data lakes, data hubs, data warehouses, data marts, data fabrics, data meshes, data catalogs, data oceans are forming, evolving and exploding onto the scene. So in an effort to bring perspective to the sea of optionality, we've brought together the brightest minds in the data analyst community to discuss how data management is morphing and what practitioners should expect in 2022 and beyond. Hello everyone, my name is Dave Vellante with theCUBE, and I'd like to welcome you to a special Cube presentation, Analysts' Predictions 2022: The Future of Data Management. We've gathered six of the best analysts in data and data management who are going to present and discuss their top predictions and trends for 2022 and the first half of this decade. Let me introduce our six power panelists. Sanjeev Mohan is a former Gartner Analyst and Principal at SanjMo. Tony Baer is principal at dbInsight. Carl Olofson is a well-known Research Vice President with IDC. Dave Menninger is Senior Vice President and Research Director at Ventana Research. Brad Shimmin is Chief Analyst, AI Platforms, Analytics and Data Management at Omdia. And Doug Henschen is Vice President and Principal Analyst at Constellation Research.
Gentlemen, welcome to the program and thanks for coming on theCUBE today. >> Great to be here. >> Thank you. >> All right, here's the format we're going to use. As moderator, I'm going to call on each analyst separately, who then will deliver their prediction or mega trend, and then, in the interest of time management and pace, two analysts will have the opportunity to comment. If we have more time, we'll elongate it, but let's get started right away. Sanjeev Mohan, please kick it off. You want to talk about governance, go ahead sir. >> Thank you Dave. I believe that data governance, which we've been talking about for many years, is now not only going to be mainstream, it's going to be table stakes. And in all the things that you mentioned, you know, the data oceans, data lakes, lakehouses, data fabrics, meshes, the common glue is metadata. If we don't understand what data we have and how we are governing it, there is no way we can manage it. So we saw Informatica go public last year after a hiatus of six years. I'm predicting that this year we see some more companies go public. My bet is on Collibra, most likely, and maybe Alation; we'll see them go public this year. I'm also predicting that the scope of data governance is going to expand beyond just data. It's not just data and reports. We are going to see more transformations, like Spark jobs, Python, even Airflow. We're going to see more streaming data, so, from Kafka Schema Registry, for example. We will see AI models become part of this whole governance suite. So the governance suite is going to be very comprehensive: very detailed lineage, impact analysis, and then even expand into data quality. We've already seen that happen with some of the tools, where they are buying these smaller companies and bringing in data quality monitoring and integrating it with metadata management, data catalogs, also data access governance.
So what we are going to see is that once the data governance platforms become the key entry point into these modern architectures, I'm predicting that the usage, the number of users, of a data catalog is going to exceed that of a BI tool. That will take time, and we've already seen that trajectory. Right now if you look at BI tools, I would say there are a hundred users of a BI tool to one of a data catalog. And I see that evening out over a period of time, and at some point data catalogs will really become the main way for us to access data. The data catalog will help us visualize data, but if we want to do more in-depth analysis, it'll be the jumping-off point into the BI tool, the data science tool, and that is the journey I see for the data governance products. >> Excellent, thank you. Some comments. Maybe Doug, a lot of things to weigh in on there, maybe you can comment. >> Yeah, Sanjeev, I think you're spot on on a lot of the trends. The one disagreement: I think it's really still far from mainstream. As you say, we've been talking about this for years; it's like God, motherhood, apple pie, everyone agrees it's important, but too few organizations are really practicing good governance, because it's hard and because the incentives have been lacking. I think one thing that deserves mention in this context is ESG mandates and guidelines; these are environmental, social and governance regs and guidelines. We've seen the environmental regs and guidelines imposed in industries, particularly the carbon-intensive industries. We've seen the social mandates, particularly diversity, imposed on suppliers by companies that are leading on this topic. We've seen governance guidelines now being imposed by banks on investors. So these ESGs are presenting new carrots and sticks, and it's going to demand more solid data. It's going to demand more detailed reporting and solid reporting, tighter governance. But we're still far from mainstream adoption.
We have a lot of, you know, best-of-breed niche players in the space. I think the signs that it's going to be more mainstream are starting with things like Azure Purview and Google Dataplex; the big cloud platform players seem to be upping the ante and starting to address governance. >> Excellent, thank you Doug. Brad, I wonder if you could chime in as well. >> Yeah, I would love to be a believer in data catalogs. But to Doug's point, I think that it's going to take some more pressure for that to happen. I recall metadata being something every enterprise thought they were going to get under control when we were working on service-oriented architecture back in the nineties, and that didn't happen quite the way we anticipated. And so, to Sanjeev's point, it's because it is really complex and really difficult to do. My hope is that, you know, we won't sort of, how do I put this? Fade out into this nebula of domain catalogs that are specific to individual use cases, like Purview for getting data quality right, or for data governance and cybersecurity. And instead we have some tooling that can actually be adaptive, to gather metadata to create something. And I know it's important to you, Sanjeev, and that is this idea of observability. If you can get enough metadata without moving your data around, but understanding the entirety of a system that's running on this data, you can do a lot. So that helps with the governance that Doug is talking about. >> So I just want to add that data governance, like many other initiatives, did not succeed; even AI went into an AI winter, but that's a different topic. A lot of these things did not succeed because, to your point, the incentives were not there. I remember when Sarbanes-Oxley had come onto the scene: if a bank did not do Sarbanes-Oxley, they were very happy to pay a million-dollar fine. That was, you know, pocket change for them, instead of doing the right thing. But I think the stakes are much higher now.
With GDPR, the floodgates opened. Now, you know, California has CCPA, but even CCPA is being outdated by CPRA, which is much more GDPR-like. So we are very rapidly entering a space where pretty much every major country in the world is coming up with its own regulatory compliance requirements, and data residency is becoming really important. And I think we are going to reach a stage where it won't be optional anymore, so whether we like it or not. And I think the reason data catalogs were not successful in the past is because we did not have the right focus on adoption. We were focused on features, and these features were disconnected, very hard for business to adopt. These were built by IT people for IT departments to take a look at technical metadata, not business metadata. Today the tables have turned. CDOs are driving this initiative, regulatory compliances are beating down hard, so I think the time might be right. >> Yeah, so guys, we have to move on here. But there's some real meat on the bone here, Sanjeev. I like the fact that you called out Collibra and Alation, so we can look back a year from now and say, okay, he made the call, he stuck it. And then the ratio of BI tools to data catalogs, that's another sort of measurement that we can take, even though with some skepticism there; that's something that we can watch. And I wonder if someday we'll have more metadata than data. But I want to move to Tony Baer. You want to talk about data mesh, and speaking, you know, coming off of governance, I mean, wow, the whole concept of data mesh is decentralized data, and then governance becomes, you know, a nightmare there. But take it away, Tony. >> Well, put it this way: data mesh, you know, the idea, at least as proposed by ThoughtWorks, has basically been around at least a couple of years now, and the press has been almost uniformly uncritical.
A good reason for that is all the problems that Sanjeev and Doug and Brad were just speaking about, which is that we have all this data out there and we don't know what to do about it. Now, that's not a new problem. That was a problem we had in enterprise data warehouses, it was a problem when we had Hadoop data clusters, and it's even more of a problem now that data is out in the cloud, where the data is not only in your data lake, it's not only in S3, it's all over the place. And it's also including streaming, which I know we'll be talking about later. So the data mesh was a response to that, the idea being, you know, who are the folks that really know best about governance? It's the domain experts. So data mesh was basically an architectural pattern and a process. My prediction for this year is that data mesh is going to hit cold, hard reality. Because if you do a Google search, basically the published work, the articles on data mesh have been largely, you know, pretty uncritical so far, basically lauding it as being a very revolutionary new idea. I don't think it's that revolutionary, because we've talked about ideas like this before. Brad, you and I met years ago when we were talking about SOA and decentralizing all of this, but it was at the application level. Now we're talking about it at the data level, and now we have microservices. So there's this thought of, if we're deconstructing apps in cloud native to microservices, why don't we think of data in the same way? My sense this year, and you know, this has been a very active search term if you look at Google search trends, is that now companies, enterprises, are going to look at this seriously. And as they look at it seriously, it's going to attract its first real hard scrutiny, it's going to attract its first backlash. That's not necessarily a bad thing. It means that it's being taken seriously.
The reason why I think that you'll start to see basically the cold, hard light of day shine on data mesh is that it's still a work in progress. You know, this idea is basically a couple of years old, and there are still some pretty major gaps. The biggest gap is in the area of federated governance. Now, federated governance itself is not a new issue. With federated governance, we're still figuring out how we basically strike the balance between, let's say, consistent enterprise policy, consistent enterprise governance, and yet letting the groups that understand the data govern it; how do we basically sort of balance the two? There's a huge gap there in practice and knowledge. Also, to a lesser extent, there's a technology gap, which is basically in the self-service technologies that will help teams essentially govern data through the full life cycle: from selecting the data, to building the pipelines, to determining your access control, looking at quality, looking at whether the data is fresh or whether it's trending off course. So my prediction is that it will receive its first harsh scrutiny this year. You are going to see some organizations and enterprises declare premature victory when they build some federated query implementations. You're going to see vendors start to data mesh wash their products; anybody in the data management space, whether it's basically a pipelining tool, whether it's ELT, whether it's a catalog or a federated query tool, they are all going to be, you know, basically promoting the fact of how they support this. Hopefully nobody's going to call themselves a data mesh tool, because data mesh is not a technology. We're going to see one other thing come out of this.
And this harks back to the metadata that Sanjeev was talking about, and the catalogs that he was just talking about, which is that there's going to be a new, a renewed focus on metadata. And I think that's going to spur interest in data fabrics. Now, data fabrics are pretty vaguely defined, but if we just take the most elemental definition, which is a common metadata backplane, I think that if anybody is going to get serious about data mesh, they need to look at the data fabric, because we all, at the end of the day, need to read from the same sheet of music. >> So thank you Tony. Dave Menninger, I mean, one of the things that people like about data mesh is that it pretty crisply articulates some of the flaws in today's organizational approaches to data. What are your thoughts on this? >> Well, I think we have to start by defining data mesh, right? The term is already getting corrupted, right? Tony said it's going to see the cold hard light of day. And there's a problem right now that there are a number of overlapping terms that are similar but not identical. So we've got data virtualization, data fabric, excuse me for a second. (clears throat) Sorry about that. Data virtualization, data fabric, data federation, right? So I think that it's not really clear what each vendor means by these terms. I see data mesh and data fabric becoming quite popular. I've interpreted data mesh as referring primarily to the governance aspects, as originally intended and specified. But that's not the way I see vendors using it. I see vendors using it much more to mean data fabric and data virtualization. So I'm going to comment on the group of those things. I think the group of those things is going to happen. They're going to happen, they're going to become more robust.
Our research suggests that a quarter of organizations are already using virtualized access to their data lakes, and another half, so a total of three quarters, will eventually be accessing their data lakes using some sort of virtualized access. Again, whether you define it as mesh or fabric or virtualization isn't really the point here; it's this notion that there are different elements of data, metadata and governance within an organization that all need to be managed collectively. The interesting thing is when you look at the satisfaction rates of those organizations using virtualization versus those that are not, it's almost double: 79% of organizations that were using virtualized access express satisfaction with their access to the data lake. Only 39% express satisfaction if they weren't using virtualized access. >> Oh, thank you Dave. Sanjeev, we've just got about a couple of minutes on this topic, but I know you're speaking, or maybe you've already spoken, on a panel with (indistinct), who sort of invented the concept. Governance obviously is a big sticking point, but what are your thoughts on this? You're on mute. (panelist chuckling) >> So my message to (indistinct) and to the community is, as opposed to what they said, let's not define it. We spent a whole year defining it. There are four principles: domain, product, data infrastructure, and governance. Let's take it to the next level. I get a lot of questions on what is the difference between data fabric and data mesh, and I'm like, I can't compare the two, because data mesh is a business concept and data fabric is a data integration pattern. How do you compare the two? You have to bring data mesh a level down. So to Tony's point, I'm on a warpath in 2022 to take it down to: what does a data product look like? How do we handle shared data across domains, and governance? And I think we are going to see more of that in 2022, this "operationalization" of data mesh.
>> I think we could have a whole hour on this topic, couldn't we? Maybe we should do that. But let's move on. Let's go to Carl. So Carl, you're a database guy, you've been around that block for a while now, and you want to talk about graph databases. Bring it on. >> Oh yeah. Okay, thanks. So I regard graph database as basically the next truly revolutionary database management technology. I'm looking forward to the graph database market, which of course we haven't defined yet, so obviously I have a little wiggle room in what I'm about to say. But this market will grow by about 600% over the next 10 years. Now, 10 years is a long time. But over the next five years, we expect to see gradual growth as people start to learn how to use it. The problem is not that it's not useful, it's that people don't know how to use it. So let me explain, before I go any further, what a graph database is, because some of the folks on the call may not know what it is. A graph database organizes data according to a mathematical structure called a graph. The graph has elements called nodes and edges. So a data element drops into a node, the nodes are connected by edges, and the edges connect one node to another node. Combinations of edges create structures that you can analyze to determine how things are related. In some cases, the nodes and edges can have properties attached to them, which add additional informative material that makes it richer; that's called a property graph. There are two principal use cases for graph databases. There are semantic property graphs, which are used to break down human language text into semantic structures. Then you can search it, organize it and answer complicated questions. A lot of AI is aimed at semantic graphs. Another kind is the property graph that I just mentioned, which has a dazzling number of use cases. I want to just point out, as I talk about this, people are probably wondering, well, we have relational databases, isn't that good enough?
So a relational database supports what I call definitional relationships. That means you define the relationships in a fixed structure. The data drops into that structure, there's a value, a foreign key value, that relates one table to another, and that value is fixed. You don't change it. If you change it, the database becomes unstable, it's not clear what you're looking at. In a graph database, the system is designed to handle change so that it can reflect the true state of the things that it's being used to track. So let me just give you some examples of use cases for this. They include entity resolution, data lineage, social media analysis, Customer 360, fraud prevention. There's cybersecurity; supply chain is a big one, actually. There is explainable AI, and this is going to become important too, because a lot of people are adopting AI. But they want a system after the fact to say, how did the AI system come to that conclusion? How did it make that recommendation? Right now we don't have really good ways of tracking that. Machine learning in general, social network analysis, I already mentioned that. And then we've got, oh gosh, we've got data governance, data compliance, risk management. We've got recommendation, we've got personalization, anti-money laundering, that's another big one, identity and access management, network and IT operations is already becoming a key one, where you actually have mapped out your operation, you know, whatever it is, your data center, and you can track what's going on as things happen there, root cause analysis, fraud detection is a huge one. A number of major credit card companies use graph databases for fraud detection, risk analysis, tracking and tracing, churn analysis, next best action, what-if analysis, impact analysis, entity resolution, and I would add one other thing, or just a few other things, to this list: metadata management. So Sanjeev, here you go, this is your engine.
Because I was in metadata management for quite a while in my past life. And one of the things I found was that none of the data management technologies that were available to us could efficiently handle metadata, because of the kinds of structures that result from it, but graphs can, okay? Graphs can do things like say, this term in this context means this, but in that context, it means that, okay? Things like that. And in fact, logistics management, supply chain. And also because it handles recursive relationships, by recursive relationships I mean objects that own other objects that are of the same type. You can do things like bills of materials, you know, so like parts explosion. Or you can do an HR analysis, who reports to whom, how many levels up the chain and that kind of thing. You can do that with relational databases, but it takes a lot of programming. In fact, you can do almost any of these things with relational databases, but the problem is, you have to program it. It's not supported in the database. And whenever you have to program something, that means you can't trace it, you can't define it. You can't publish it in terms of its functionality, and it's really, really hard to maintain over time. >> Carl, thank you. I wonder if we could bring Brad in, I mean. Brad, I'm sitting here wondering, okay, is this incremental to the market? Is it disruptive, a replacement? What are your thoughts on this phase? >> It's already disrupted the market. I mean, like Carl said, go to any bank and ask them, are you using graph databases to get fraud detection under control? And they'll say, absolutely, that's the only way to solve this problem. And it is, frankly. And it's the only way to solve a lot of the problems that Carl mentioned. And that is, I think, its Achilles' heel in some ways. Because, you know, it's like finding the best way to cross the seven bridges of Koenigsberg.
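Carl's recursive "who reports to whom" example is, conceptually, just a repeated walk along edges of the same type, which is exactly what graphs make cheap and fixed relational schemas make awkward. A minimal sketch, with hypothetical names:

```python
# Hypothetical org chart stored as employee -> manager edges.
reports_to = {"carol": "bob", "dan": "bob", "bob": "alice"}

def chain_of_command(employee):
    """Walk the recursive reports-to relationship up to the top."""
    chain = []
    while employee in reports_to:
        employee = reports_to[employee]
        chain.append(employee)
    return chain

print(chain_of_command("carol"))  # ['bob', 'alice'] -- two levels up the chain
```

In SQL this takes a recursive common table expression or application code; in a graph query language it is typically a one-line variable-length path pattern.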
You know, it's always going to kind of be tied to those use cases, because it's really special and it's really unique, and because it's special and it's unique, it still unfortunately kind of stands apart from the rest of the community that's building, let's say, AI outcomes, as a great example here. Graph databases and AI, as Carl mentioned, are like chocolate and peanut butter. But technologically, they don't know how to talk to one another, they're completely different. And you know, you can't just stand up SQL and query them. You've got to learn, what is it, Carl? SPARQL? Yeah, thank you, to actually get to the data in there. And if you're going to scale that data, that graph database, especially a property graph, if you're going to do something really complex, like try to understand, you know, all of the metadata in your organization, you might just end up with, you know, a graph database winter, like we had the AI winter, simply because you run out of performance to make the thing happen. So, I think it's already disrupted, but we need to treat it like a first-class citizen in the data analytics and AI community. We need to bring it into the fold. We need to equip it with the tools it needs to do the magic it does, and to do it not just for specialized use cases, but for everything. 'Cause I'm with Carl. I think it's absolutely revolutionary. >> Brad identified the principal Achilles' heel of the technology, which is scaling. When these things get large and complex enough that they spill over what a single server can handle, you start to have difficulties, because the relationships span things that have to be resolved over a network, and then you get network latency and that slows the system down. So that's still a problem to be solved. >> Sanjeev, any quick thoughts on this? I mean, I think metadata on the word cloud is going to be the largest font, but what are your thoughts here?
>> I want to (indistinct) So people don't associate me with only metadata, so I want to talk about something slightly different. dbengines.com has done an amazing job. I think almost everyone knows that they chronicle all the major databases that are in use today. In January of 2022, there are 381 databases on a ranked list of databases. The largest category is RDBMS. The second largest category is actually divided into two: property graphs and RDF graphs. These two together make up the second largest number of databases. So talking about Achilles' heel, this is a problem. The problem is that there are so many graph databases to choose from. They come in different shapes and forms. To Brad's point, there are so many query languages. In RDBMS, it's SQL, we know the story, but here we've got Cypher, we've got Gremlin, we've got GQL, and then there are proprietary languages. So I think there's a lot of disparity in this space. >> Well, excellent. All excellent points, Sanjeev, if I must say. And that is a problem: the languages need to be sorted and standardized. People need to have a roadmap as to what they can do with it. Because as you say, you can do so many things. And so many of those things are unrelated that you sort of say, well, what do we use this for? And I'm reminded of a saying I learned a bunch of years ago. And somebody said that the digital computer is the only tool man has ever devised that has no particular purpose. (panelists chuckle) >> All right guys, we got to move on to Dave Menninger. We've heard about streaming. Your prediction is in that realm, so please take it away. >> Sure. So I like to say that historical databases are going to become a thing of the past. By that I don't mean that they're going to go away, that's not my point. I mean, we need historical databases, but streaming data is going to become the default way in which we operate with data.
So in the next, say, three to five years, I would expect that data platforms, and we're using the term data platforms to represent the evolution of databases and data lakes, that the data platforms will incorporate these streaming capabilities. We're going to process data as it streams into an organization, and then it's going to roll off into a historical database. So historical databases don't go away, but they become a thing of the past. They store the data that occurred previously. And as data is occurring, we're going to be processing it, we're going to be analyzing it, we're going to be acting on it. I mean, we only ever ended up with historical databases because we were limited by the technology that was available to us. Data doesn't occur in batches. But we processed it in batches because that was the best we could do. And it wasn't bad, and we've continued to improve and we've improved and we've improved. But streaming data today is still the exception. It's not the rule, right? There are projects within organizations that deal with streaming data. But it's not the default way in which we deal with data yet. And so that's my prediction, is that this is going to change, we're going to have streaming data be the default way in which we deal with data, and however you label it and whatever you call it, you know, maybe these databases and data platforms just evolve to be able to handle it. But we're going to deal with data in a different way. And our research shows that already: about half of the participants in our analytics and data benchmark research are using streaming data. You know, another third are planning to use streaming technologies. So that gets us to about eight out of 10 organizations that need to use this technology. And that doesn't mean they have to use it throughout the whole organization, but it's pretty widespread in its use today and has continued to grow.
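Dave's batch-versus-streaming distinction comes down to when the computation runs: over stored history, or over each event as it arrives. A minimal Python sketch of the streaming mindset (the window size and sample values are made up for illustration):

```python
import statistics
from collections import deque

def rolling_mean(stream, window=3):
    """Process each value as it arrives instead of waiting for a batch."""
    buf = deque(maxlen=window)  # only the recent past is kept in memory
    for value in stream:
        buf.append(value)
        yield statistics.mean(buf)  # an answer is available immediately

readings = [10, 12, 11, 50, 12]  # e.g., sensor values arriving one by one
print(list(rolling_mean(readings)))
```

A historical database would instead store all the readings and compute the aggregate later; here the old values simply roll off the window as new ones stream in.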
If you think about the consumerization of IT, we've all been conditioned to expect immediate access to information, immediate responsiveness. You know, we want to know if an item is on the shelf at our local retail store and we can go in and pick it up right now. You know, that's the world we live in, and that's spilling over into the enterprise IT world. We have to provide those same types of capabilities. So that's my prediction: historical databases become a thing of the past, streaming data becomes the default way in which we operate with data. >> All right, thank you, David. Well, so what say you, Carl, the guy who has followed historical databases for a long time? >> Well, one thing actually, every database is historical, because as soon as you put data in it, it's now history. It no longer reflects the present state of things. But even if that history is only a millisecond old, it's still history. But I would say, I mean, I know you're trying to be a little bit provocative in saying this, Dave, 'cause you know as well as I do that people still need to do their taxes, they still need to do accounting, they still need to run general ledger programs and things like that. That all involves historical data. That's not going to go away unless you want to go to jail. So you're going to have to deal with that. But as far as the leading edge functionality, I'm totally with you on that. And I'm just, you know, I'm just kind of wondering if this requires a change in the way that we perceive applications in order to truly be manifested, a rethinking of the way applications work, saying that an application should respond instantly, as soon as the state of things changes. What do you say about that? >> I think that's true. I think we do have to think about things differently. It's not the way we designed systems in the past. We're seeing more and more systems designed that way. But again, it's not the default.
And I agree 100% with you that we do need historical databases, you know, that's clear. And even some of those historical databases will be used in conjunction with the streaming data, right? >> Absolutely. I mean, you know, let's take the data warehouse example, where you're using the data warehouse as the context and the streaming data as the present, and you're saying, here's the sequence of things that's happening right now. Have we seen that sequence before? And where? What does that pattern look like in past situations? And can we learn from that? >> So Tony Baer, I wonder if you could comment? I mean, when you think about, you know, real-time inferencing at the edge, for instance, which is something that a lot of people talk about, a lot of what we're discussing here in this segment, it looks like it's got great potential. What are your thoughts? >> Yeah, I mean, I think you nailed it. You know, you hit it right on the head there. I'm going to split this one down the middle: I don't see that streaming becomes the default. What I see is streaming and transaction databases and analytic data stores, you know, data warehouses, data lakes, whatever, converging. And what allows us technically to converge is cloud native architecture, where you can basically distribute things. So you can have a node here that's doing the real-time processing, that's also doing, and this is where it leads in, maybe some of that real-time predictive analytics, to take a look at, well look, we're looking at this customer journey, what's happening with what the customer is doing right now, and this is correlated with what other customers are doing. So the thing is that in the cloud, you can basically partition this, and because of basically the speed of the infrastructure, you can bring these together and kind of orchestrate them in a sort of loosely coupled manner.
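Dave Menninger's "have we seen that sequence before?" question, live events checked against warehouse history, can be sketched like this (the event names and history are invented for illustration):

```python
# Historical sequences, e.g. pulled from the data warehouse.
history = [
    ("view", "cart", "purchase"),
    ("view", "cart", "abandon"),
    ("view", "cart", "abandon"),
]

def matches_in_history(live_sequence):
    """Count how often the in-flight sequence has occurred before."""
    prefix = tuple(live_sequence)
    return sum(1 for past in history if past[: len(prefix)] == prefix)

# The streaming side asks: have we seen this start to a sequence before?
print(matches_in_history(["view", "cart"]))  # 3
```

The warehouse supplies the context (past patterns), the stream supplies the present, which is exactly the division of labor described above.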
The other part is that the use cases are demanding it, and this goes back to what Dave is saying. Is that, you know, when you look at Customer 360, when you look at, let's say, smart utility products, when you look at any type of operational problem, it has a real-time component and it has an historical component, and a predictive component. So, you know, my sense here is that technically we can bring this together through the cloud. And I think the use case is that we can apply some real-time, sort of predictive analytics on these streams and feed this into the transactions, so that when we make a decision in terms of what to do as a result of a transaction, we have this real-time input. >> Sanjeev, did you have a comment? >> Yeah, I was just going to say that, to Dave's point, you know, we have to think of streaming very differently, because in the historical databases, we used to bring the data and store the data and then we used to run rules on top, aggregations and all. But in the case of streaming, the mindset changes, because the rules, the inference, all of that is fixed, but the data is constantly changing. So it's a completely reversed way of thinking and building applications on top of that. >> So Dave Menninger, there seems to be some disagreement about the default. What kind of timeframe are you thinking about? Is this end of decade it becomes the default? What would you pin? >> I think around, you know, between five and 10 years, I think this becomes the reality. >> I think its... >> It'll be more and more common between now and then, but it becomes the default. And I also want, Sanjeev, at some point, maybe in one of our subsequent conversations, we need to talk about governing streaming data. 'Cause that's a whole nother set of challenges.
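Sanjeev's reversal, fixed rules with data flowing past them, rather than stored data with queries run later, looks roughly like this (the rule and the events are hypothetical):

```python
# Streaming mindset: the rules stay fixed, the data keeps changing.
RULES = [lambda event: event["amount"] > 1000]  # e.g., a fraud-style threshold

def alerts(stream):
    """Evaluate every standing rule against each event as it flows past."""
    for event in stream:
        if any(rule(event) for rule in RULES):
            yield event

events = [{"amount": 50}, {"amount": 5000}, {"amount": 200}]
print(list(alerts(events)))  # [{'amount': 5000}]
```

In the historical model, the data sits still and ad hoc queries move over it; here the query (the rule set) sits still and the data moves over it.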
>> We've also talked about it in just two dimensions, historical and streaming, and there's lots of low-latency, micro-batch, sub-second processing that's not quite streaming, but in many cases it's fast enough, and we're seeing a lot of adoption of near real time, not quite real time, as good enough for many applications. (indistinct cross talk from panelists) >> Because nobody's really taking the hardware dimension (mumbles). >> That'll just happen, Carl. (panelists laughing) >> So near real time. But maybe before you lose the customer, however we define that, right? Okay, let's move on to Brad. Brad, you want to talk about automation, AI, the pipeline. People feel like, hey, we can just automate everything. What's your prediction? >> Yeah, I'm an AI aficionado, so apologies in advance for that. But, you know, I think that we've been seeing automation play within AI for some time now. And it's helped us do a lot of things, especially for practitioners that are building AI outcomes in the enterprise. It's helped them to fill skills gaps, it's helped them to speed development and it's helped them to actually make AI better. 'Cause it, you know, in some ways provides some swim lanes, and, for example, technologies like AutoML can auto-document and create that sort of transparency that we talked about a little bit earlier. But I think there's an interesting kind of convergence happening with this idea of automation. And that is that we've had the automation that started happening for practitioners, and it's trying to move outside of the traditional bounds of things like, I'm just trying to get my features, I'm just trying to pick the right algorithm, I'm just trying to build the right model, and it's expanding across that full life cycle of building an AI outcome, to start at the very beginning with data and to then continue on to the end, which is this continuous delivery and continuous automation of that outcome to make sure it's right and it hasn't drifted and stuff like that.
And because of that, because it's become kind of powerful, we're starting to actually see this weird thing happen where the practitioners are starting to converge with the users. And that is to say that, okay, if I'm in Tableau right now, I can stand up Salesforce Einstein Discovery, and it will automatically create a nice predictive algorithm for me given the data that I pull in. But what's starting to happen, and we're seeing this from the companies that create business software, so Salesforce, Oracle, SAP, and others, is that they're starting to actually use these same ideals and a lot of deep learning (chuckles) to basically stand up these out-of-the-box, flip-a-switch, and you've got an AI outcome at the ready for business users. And I am very much, you know, I think that's the way that it's going to go, and what it means is that AI is slowly disappearing. And I don't think that's a bad thing. I think if anything, what we're going to see in 2022 and maybe into 2023 is this sort of rush to put this idea of disappearing AI into practice and have as many of these solutions in the enterprise as possible. You can see, like for example, SAP is going to roll out this quarter this thing called adaptive recommendation services, which basically is a cold-start AI outcome that can work across a whole bunch of different vertical markets and use cases. It's just a recommendation engine for whatever you need to do in the line of business. So basically, you're an SAP user, you go to turn on your software one day, you're a sales professional, let's say, and suddenly you have a recommendation for customer churn. Boom! There it is, that's great. Well, I don't know, I think that's terrifying.
In some ways I think it is the future, that AI is going to disappear like that, but I'm absolutely terrified of it, because I think that what it really does is it calls attention to a lot of the issues that we already see around AI, specific to this idea of what we like to call at Omdia responsible AI. Which is, you know, how do you build an AI outcome that is free of bias, that is inclusive, that is fair, that is safe, that is secure, that is auditable, et cetera, et cetera. It takes a lot of work to do. And so if you imagine a customer that's just a Salesforce customer, let's say, and they're turning on Einstein Discovery within their sales software, you need some guidance to make sure that when you flip that switch, the outcome you're going to get is correct. And that's going to take some work. And so, I think we're going to see this move, let's roll this out, and suddenly there's going to be a lot of problems, a lot of pushback that we're going to see. And some of that's going to come from GDPR and others that Sanjeev was mentioning earlier. A lot of it is going to come from internal CSR requirements within companies that are saying, "Hey, hey, whoa, hold up, we can't do this all at once. "Let's take the slow route, "let's make AI automated in a smart way." And that's going to take time. >> Yeah, so a couple of predictions there that I heard. AI simply disappears, it becomes invisible. Maybe if I can restate that. And then if I understand it correctly, Brad, you're saying there's a backlash in the near term. You'd be able to say, oh, slow down. Let's automate what we can. Those attributes that you talked about are nontrivial to achieve, is that why you're a bit of a skeptic? >> Yeah. I think that we don't have any sort of standards that companies can look to and understand.
And we certainly, within these companies, especially those that haven't already stood up an internal data science team, they don't have the knowledge to understand, when they flip that switch for an automated AI outcome, that it's going to do what they think it's going to do. And so we need some sort of standard methodology and practice, best practices, that every company that's going to consume this invisible AI can make use of. And one of the things, you know, that Google kicked off a few years back, that's picking up some momentum, and the companies I just mentioned are starting to use, is this idea of model cards, where at least you have some transparency about what these things are doing. You know, so like for the SAP example, we know, for example, if it's a convolutional neural network with a long short-term memory model that it's using, we know that it only works on Roman English, and therefore me as a consumer can say, "Oh, well, I know that I need to do this internationally. "So I should not just turn this on today." >> Thank you. Carl, could you add anything, any context here? >> Yeah, we've talked about some of the things Brad mentioned here at IDC in our future of intelligence group, regarding in particular the moral and legal implications of having a fully automated, you know, AI-driven system. Because we already know, and we've seen, that AI systems are biased by the data that they get, right? So if they get data that pushes them in a certain direction, I think there was a story last week about an HR system that was recommending promotions for White people over Black people, because in the past, you know, White people were promoted and more productive than Black people, but it had no context as to why, which is, you know, because Black people were being historically discriminated against, but the system doesn't know that. So, you know, you have to be aware of that.
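A model card, in the spirit Brad describes, is just structured, human-readable facts about a model that a consumer can check before flipping the switch. A minimal sketch, where every field value is hypothetical:

```python
import json

# Hypothetical model card for an out-of-the-box churn recommender.
model_card = {
    "model": "churn-recommender",
    "architecture": "convolutional neural network + LSTM",
    "training_data": "2019-2021 CRM records",
    "supported_languages": ["en"],  # the kind of limitation Brad calls out
    "known_limitations": ["may reflect historical bias in past promotions"],
}

# Published alongside the model so users can read it before turning it on.
print(json.dumps(model_card, indent=2))
```

The consumer's check is then mechanical: if the card says English only, don't deploy it internationally today.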
And I think that, at the very least, there should be controls when a decision has either a moral or legal implication. When you really need a human judgment, it could lay out the options for you. But a person actually needs to authorize that action. And I also think that we always will have to be vigilant regarding the kind of data we use to train our systems, to make sure that it doesn't introduce unintended biases. To some extent, they always will. So we'll always be chasing after them. But that's (indistinct). >> Absolutely, Carl, yeah. I think that what you have to bear in mind as a consumer of AI is that it is a reflection of us, and we are a very flawed species. And so if you look at all of the really fantastic, magical-looking super models we see, like GPT-3 and 4, that's coming out, they're xenophobic and hateful, because the data that they're built upon, and the algorithms, and the people that build them, are us. So AI is a reflection of us. We need to keep that in mind. >> Yeah, the AI is biased 'cause humans are biased. All right, great. All right, let's move on. Doug, you mentioned, you know, a lot of people said that data lake, that term, is not going to live on, but here we are, we still have some lakes here. You want to talk about lake house, bring it on. >> Yes, I do. My prediction is that lake house, and this idea of a combined data warehouse and data lake platform, is going to emerge as the dominant data management offering. I say offering, that doesn't mean it's going to be the dominant thing that organizations have out there, but it's going to be the predominant vendor offering in 2022. Now, heading into 2021, we already had Cloudera, Databricks, Microsoft, Snowflake as proponents; in 2021, SAP, Oracle, and several of these fabric/virtualization/mesh vendors joined the bandwagon. The promise is that you have one platform that manages your structured, unstructured and semi-structured information.
And it addresses both the BI analytics needs and the data science needs. The real promise there is simplicity and lower cost. But I think end users have to answer a few questions. The first is, does your organization really have a center of data gravity, or is the data highly distributed? Multiple data warehouses, multiple data lakes, on premises, cloud. If it's very distributed and you'd have difficulty consolidating, and that's not really a goal for you, then maybe that single platform is unrealistic and not likely to add value to you. You know, also the fabric and virtualization vendors, the mesh idea, that's where, if you have this highly distributed situation, that might be a better path forward. The second question, if you are looking at one of these lake house offerings, you are looking at consolidating, simplifying, bringing together to a single platform, you have to make sure that it meets both the warehouse need and the data lake need. So you have vendors like Databricks and Microsoft with Azure Synapse, really new to the data warehouse space, and they're having to prove that the data warehouse capabilities on their platforms can meet the scaling requirements, can meet the user and query concurrency requirements, meet those tight SLAs. And then on the other hand, you have the Oracle, SAP, Snowflake, the data warehouse folks, coming into the data science world, and they have to prove that they can manage the unstructured information and meet the needs of the data scientists. I'm seeing a lot of the lake house offerings from the warehouse crowd managing that unstructured information in columns and rows. And some of these vendors, Snowflake in particular, are really relying on partners for the data science needs. So you really have got to look at a lake house offering and make sure that it meets both the warehouse and the data lake requirement. >> Thank you, Doug.
Well, Tony, if those two worlds are going to come together, as Doug was saying, the analytics and the data science world, does it need to be some kind of semantic layer in between? I don't know. Where are you on this topic? >> (chuckles) Oh, didn't we talk about data fabrics before? Common metadata layer (chuckles). Actually, I'm almost tempted to say let's declare victory and go home. And this has actually been going on for a while. I actually agree with, you know, much of what Doug is saying there. Which is that, I mean, I remember as far back as, I think it was like 2014, I was doing a study, I was still at Ovum, (indistinct) Omdia, looking at all these specialized databases that were coming up and seeing that, you know, there's overlap at the edges. But yet, there was still going to be a reason at the time that you would have, let's say, a document database for JSON, you'd have a relational database for transactions and for data warehouse, and you had basically something at that time that resembled Hadoop for what we'd consider your data lake. Fast forward, and the thing is, what I was seeing at the time is that they were sort of blending at the edges. That was, like, about five to six years ago. And the lake house is essentially the current manifestation of that idea. There is a dichotomy in terms of, you know, it's the old argument, do we centralize this all, you know, in a single place, or do we virtualize? And I think it's always going to be a union, and there's never going to be a single silver bullet. I do see that there are also going to be questions, and these are points that Doug raised. That, you know, what do you need there for your performance characteristics? Do you need, for instance, high concurrency?
Do you need the ability to do some very sophisticated joins, or is your requirement more to be able to distribute your processing, you know, as far as possible, to essentially do a kind of brute force approach? All these approaches are valid based on the use case. I just see that essentially the lake house is the culmination of... it's nothing new. It's a relatively new term, introduced by Databricks a couple of years ago, but this is the culmination of what's been a long-time trend. And what we see in the cloud is that we start seeing data warehouses treat this as a checkbox item, saying, "Hey, we can basically source data in cloud storage, in S3, "Azure Blob Store, you know, whatever, "as long as it's in certain formats, "like, you know, Parquet or CSV or something like that." I see that as becoming kind of a checkbox item. So to that extent, I think that the lake house, depending on how you define it, is already reality. And in some cases, maybe new terminology, but not a whole heck of a lot new under the sun. >> Yeah. And Dave Menninger, I mean, a lot of these, thank you, Tony, but a lot of this is going to come down to, you know, vendor marketing, right? Some people just kind of co-opt the term. We talked about, you know, data mesh washing. What are your thoughts on this? (laughing) >> Yeah, so I used the term data platform earlier. And part of the reason I use that term is that it's more vendor neutral. We've tried to sort of stay out of the vendor terminology patenting world, right? Whether the term lake house is what sticks or not, the concept is certainly going to stick. And we have some data to back it up. About a quarter of organizations that are using data lakes today already incorporate data warehouse functionality into it.
So they consider their data lake and data warehouse one and the same. About a quarter of organizations, a little less, but about a quarter of organizations, feed the data lake from the data warehouse, and about a quarter of organizations feed the data warehouse from the data lake. So it's pretty obvious that three quarters of organizations need to bring this stuff together, right? The need is there, the need is apparent. The technology is going to continue to converge. I like to talk about it, you know, you've got data lakes over here at one end, and I'm not going to talk about why people thought data lakes were a bad idea, because they thought you just throw stuff on a server and you ignore it, right? That's not what a data lake is. So you've got data lake people over here and you've got database people over here, data warehouse people over here. Database vendors are adding data lake capabilities and data lake vendors are adding data warehouse capabilities. So it's obvious that they're going to meet in the middle. I mean, I think it's like Tony says, I think we should declare victory and go home. >> Makes sense. So just a follow-up on that, so are you saying the specialized lake and the specialized warehouse, do they go away? I mean, Tony, data mesh practitioners would say, or advocates would say, well, they could all live. It's just a node on the mesh. But based on what Dave just said, are we gonna see those all morphed together? >> Well, number one, as I was saying before, there's always going to be this sort of, you know, centrifugal force or this tug of war between, do we centralize the data, do we virtualize? And the fact is, I don't think that there's ever going to be any single answer. I think in terms of data mesh, data mesh has nothing to do with how you physically implement the data. You could have a data mesh basically on a data warehouse.
It's just that, you know, the difference being that if we use the same physical data store, everybody's logically, you know, basically governing it differently. Data mesh, in essence, is not a technology, it's processes, it's governance process. So essentially, you know, as I was saying before, this is basically the culmination of a long-time trend; we're essentially seeing a lot of blurring, but there are going to be cases where, for instance, if I need, let's say, upserts, or I need high concurrency or something like that, there are certain things that I'm not going to be able to efficiently get out of a data lake, versus, you know, a system where I'm just doing really brute-force, very fast file scanning and that type of thing. So I think there always will be some delineations, but I would agree with Dave and with Doug that we are seeing basically a confluence of requirements, that we essentially need the capabilities of a data lake and the data warehouse, these need to come together, so I think. >> I think what we're likely to see is organizations look for a converged platform that can handle both sides for their center of data gravity. The mesh and the fabric virtualization vendors, they're all on board with the idea of this converged platform, and they're saying, "Hey, we'll handle all the edge cases of the stuff that isn't in that center of data gravity but that is off distributed in a cloud or at a remote location." So you can have that single platform for the center of your data and then bring in virtualization, mesh, what have you, for reaching out to the distributed data. >> As Dave basically said, people are happy when they virtualize data. >> I think we have at this point, but to Dave Menninger's point, they are converging; Snowflake has introduced support for unstructured data. So obviously we're literally splitting hairs here.
Now what Databricks is saying is that "aha, but it's easier to go from data lake to data warehouse than it is from databases to data lake." So I think we're getting into semantics, but we're already seeing these two converge. >> So take somebody like AWS, who's got what, 15 data stores. Are they going to converge those 15 data stores? This is going to be interesting to watch. All right, guys, I'm going to go down the list and do like one word each, and you guys, each of the analysts, if you would just add a very brief sort of course correction for me. So Sanjeev, I mean, governance is going to be... Maybe it's the dog that wags the tail now. I mean, it's coming to the fore, all this ransomware stuff, though you really didn't talk much about security, but what's the one word in your prediction that you would leave us with on governance? >> It's going to be mainstream. >> Mainstream. Okay. Tony Baer, mesh washing is what I wrote down. That's what we're going to see in 2022, a little reality check, you want to add to that? >> Reality check, 'cause I hope that no vendor jumps the shark and claims their offering is a data mesh product. >> Yeah, let's hope that doesn't happen. If they do, we're going to call them out. Carl, I mean, graph databases, thank you for sharing some high growth metrics. I know it's early days, but magic is what I took away from that, so, magic database. >> Yeah, I would actually, I've said this to people too, I kind of look at it as a Swiss Army knife of data, because you can pretty much do anything you want with it. That doesn't mean you should. I mean, there's definitely the case that if you're managing things that are in a fixed schematic relationship, probably a relational database is a better choice. There are times when a document database is a better choice. A graph database can handle those things, but it may not be the best choice for that use case.
But for a great many, especially with the new emerging use cases I listed, it's the best choice. >> Thank you. And Dave Menninger, thank you by the way, for bringing the data in, I like how you supported all your comments with some data points. But streaming data becomes the sort of default paradigm, if you will, what would you add? >> Yeah, I would say think fast, right? That's the world we live in, you got to think fast. >> Think fast, love it. And Brad Shimmin, love it. I mean, on the one hand I was saying, okay, great. I'm afraid I might get disrupted by one of these internet giants who are AI experts. I'm going to be able to buy instead of build AI. But then again, you know, I've got some real issues. There's a potential backlash there. So give us your bumper sticker. >> I would say, going with Dave, think fast, and also think slow, to borrow from the book that everyone talks about. I would say really that this is all about trust, trust in the idea of automation and a transparent and visible AI across the enterprise. And verify, verify before you do anything. >> And then Doug Henschen, I mean, I think the trend is your friend here on this prediction, with the lake house really becoming dominant. I liked the way you set up that notion of, you know, the data warehouse folks coming at it from the analytics perspective and then you get the data science worlds coming together. I still feel as though there's this piece in the middle that we're missing, but your final thoughts will give you the (indistinct). >> I think the idea of consolidation and simplification always prevails. That's why the appeal of a single platform is going to be there. We've already seen that with, you know, Hadoop platforms and moving toward cloud, moving toward object storage, and object storage becoming really the common storage point for whether it's a lake or a warehouse.
And that second point, I think ESG mandates are going to come in alongside GDPR and things like that to up the ante for good governance. >> Yeah, thank you for calling that out. Okay folks, hey, that's all the time that we have here. Your experience and depth of understanding on these key issues in data and data management was really on point and on display today. I want to thank you for your contributions. Really appreciate your time. >> Enjoyed it. >> Thank you. >> Thanks for having me. >> In addition to this video, we're going to be making available transcripts of the discussion. We're going to do clips of this as well and put them out on social media. I'll write this up and publish the discussion on wikibon.com and siliconangle.com. No doubt, several of the analysts on the panel will take the opportunity to publish written content, social commentary, or both. I want to thank the power panelists, and thanks for watching this special CUBE presentation. This is Dave Vellante, be well and we'll see you next time. (bright music)
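Tony's recurring point in the panel above, that a lake house ultimately comes down to engines brute-force scanning open file formats (Parquet, CSV) sitting in object storage, rather than rows living inside a proprietary warehouse engine, can be illustrated with a minimal sketch. This is a hypothetical toy, not any vendor's implementation; the file names, schema, and data are invented:

```python
import csv
from pathlib import Path
from tempfile import TemporaryDirectory

def scan_lake(lake_dir, predicate):
    """Brute-force scan: read every CSV 'object' in the lake, keep matching rows.

    There is no index and no storage engine; the 'query engine' just walks
    files in an open format, which is the core mechanic of the lake house.
    """
    rows = []
    for path in sorted(Path(lake_dir).glob("*.csv")):
        with path.open(newline="") as f:
            rows.extend(row for row in csv.DictReader(f) if predicate(row))
    return rows

with TemporaryDirectory() as lake:
    # Two "objects" landed in storage by different pipelines (invented data).
    Path(lake, "events_2021.csv").write_text(
        "user,region,amount\nann,emea,40\nbob,amer,75\n")
    Path(lake, "events_2022.csv").write_text(
        "user,region,amount\ncarol,emea,120\n")

    emea = scan_lake(lake, lambda r: r["region"] == "emea")

print([r["user"] for r in emea])  # -> ['ann', 'carol']
```

A warehouse engine gets the same answer faster with indexes and statistics; the lake-house bet the panel is debating is that scanning open formats at scale is good enough for many workloads.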
Mike Flasko, Microsoft | Microsoft Ignite 2018
>> Live from Orlando, Florida, it's theCUBE, covering Microsoft Ignite. Brought to you by Cohesity and theCUBE's eco-system partners. >> Welcome back everyone to theCUBE's live coverage of Microsoft Ignite. I'm your host Rebecca Knight along with my co-host Stu Miniman. We are joined by Mike Flasko. He is the Principal Group Product Manager here at Microsoft. Thanks so much for returning to theCUBE, you are a CUBE alumni. >> I am, yeah thanks for having me back. I appreciate it. >> So you oversee a portfolio of products. Can you let our viewers know what you're workin' on right now? >> Sure, yeah. I work in the area of data integration and governance at Microsoft, so everything around data integration, data acquisition, transformation, and then pushing into the governance angles of, you know, once you acquire data and analyze it, are you handling it properly as per industry guidelines or whatever enterprise initiatives you might have? >> You mentioned the magic word, transformation. I would love to have you define it. It's become a real buzzword in this industry. How do you define digital transformation? >> Sure, I think it's a great discussion because we're talking about this all the time, but what does that really mean? And for us, the way I see it is starting to make more and more data driven decisions all the time. And so it's not like a light switch, where you weren't and then you were. Typically what happens is as we start working with customers they find new and interesting ways to use more data to help them make a more informed decision. And it starts from a big project or a small project and then just kind of takes off throughout the company. And so really, I think it boils down to using more data and having that guide a lot of the decisions you're making, and typically that starts with tapping into a bunch of data that you may already have that just hasn't been part of your kind of traditional data warehousing or BI loop, and thinking about how you can do that.
>> Mike, bring us inside the portfolio a little bit. You know, everybody knows Microsoft. We think about our daily usage of all the Microsoft products that my business data runs through, but when you talk about your products they're specific around the data. Help us walk through that a little bit. >> Sure, yeah. So we have a few kind of flagship products in the space, if you will. The first is something called Azure Data Factory, and the purpose of that product is fairly simple. It's really for data professionals. They might be integrators or warehousing professionals, et cetera, and it's to facilitate making it really easy to acquire data from wherever it is: your business data on-prem, from other clouds, SaaS applications, and allow a really easy experience to kind of bring data into your cloud, into our cloud, for analytics, and then build data processing pipelines that take that raw data and transform it into something useful, whatever your business domain requires. Whether that's training a machine learning model or populating your warehouse based on more data than you've had before. So first one, Data Factory, all about data integration, kind of a modern take on it. Built for the cloud, but fundamentally supports hybrid scenarios. And then other products we've got are things like Azure Data Catalog, which are more in the realm of aiding the discovery and governance of data. So once you start acquiring all this data and using it more productively, you start to have a lot, and how do you connect those who want to consume data with the data professionals or data scientists that are producing these rich data sets? So how do you connect your information workers with your data scientists or your data engineers that are producing data sets? Data catalog's kind of the glue between the two. >> Mike, wondering if you can help connect the dots to some of the waves we've been seeing.
There was a traditional kind of BI and data warehousing, then we went through kind of big data, the volumes of data and how can I, even if I'm not some multi-national or global company, take advantage of the data? Now there's machine intelligence, machine learning, AI and all these pieces. What's the same and what's different about the trend and the products today? >> Sure, I think the first thing I've learnt through this process, being in the data space for a while and then working on big data projects, is that for a while we used to talk about them as different things. Like you do data warehousing, and now that kind of has an old connotation to it. It's got an old feel to it, right? And then we talk about big data and you have a big data project, and I think the realization we've got is it's really those two things starting to come together, and if you think about it, like everybody has been doing some form of analytics and warehousing for a while. And if we start to think about what big data technologies have brought, a couple of things, in my opinion, kind of bring these two together: with big data we started to be able to acquire data of significantly larger size and varying shape, right? But at the end of the day, the task is often acquire that data, shape that data into something useful, and then connect it up to our business decision makers that need to leverage that data on a day-to-day basis. We've been doing that process in warehousing forever. It's really about how easily we can marry big data processing with the traditional data warehousing processes, so that our warehouses, our decision making, can kind of scale to large data and different shapes of data. And so probably what you'll see actually, at the Ignite conference in a lot of our sessions, you'll hear our speakers talking about something called modern data warehousing, and like, it really doesn't matter what the label is associated with it.
But it's really about how do you use big data technologies like Spark and Databricks naturally alongside warehousing technologies and integration technologies, so they really form the modern data warehouse that does naturally handle big data, that does naturally bring in data of all shapes and sizes, and provides kind of an experimentation ground as well for data science. I think that's the last one that kind of comes in: once you've got big data and warehousing kind of working together to expand your analytics beyond traditional approaches, the next is opening up some of that data earlier in its life cycle for experimentation by data science. It's kind of the new angle as we think about this notion of modern data warehousing as almost one thing supporting them all going forward. I think the challenge we've had is when we try to separate these into kind of net new deliverables, net new projects, where we're starting to kind of bifurcate, if you will, the data platform to some degree. And things were getting a little too complex, and so I think what we're seeing is that people are learning what these tools are good at and what they're not good at, and now how to bring them together to really get back some of the productivity that we've had in the past. >> I want to ask you about those business decision makers that you referenced. I mean there's an assumption that every organization wants to become more data driven. And I think that most companies would probably say yes, but then there's another set of managers who really want to go by their gut. I mean have you found that being a conflict in terms of how you are positioning the products and services? >> Yeah, absolutely. In a number of customer engagements we've had, as you start to bring in more data, you start to evolve kind of the analytics practice. There is a lot of resistance at times: you know, we've done it this way for 20 years, business is pretty good. What are we really fixing here?
And so what we've found is the best path through this, and in a lot of cases the required path, has been to show people the art of the possible, run experiments, show them side by side examples, and typically with that comes a comfort level in what's possible. Sometimes it exposes new capabilities and options, sometimes it also shows that there are some other ways to arrive at decisions, but we've certainly seen that, and almost like anything, you kind of have to start small, create a proving ground, and be able to do it in a kind of side by side manner to show comparison as we go. But it's a conversation that I think is going to carry forward for the next little while, especially as some of the work in AI and machine learning is starting to make its way into business critical settings, right? Pricing your products. Product placement. All of this stuff that directly affects bottom lines, you're starting to see these models do a really good job. And I think what we've found is it's all about experimentation. >> Mike, when we listen to (mumbles) and to Dell and we talk about, you know, how things are developed inside of Microsoft, we usually hear things like open and extensible, you've got to have APIs in any of these modern pieces. It was highlighted in the Keynote on Monday, talking about the open data initiative; you've got companies like Adobe and SAP out there, they have a lot of data. So the question is, of course, Microsoft has a lot of data that customers flow through, but there's also this very large eco-system we see at this show. What's the philosophy? Is it just, you know, oh, I've got some APIs and people plug into it? How does all the data get connected so that the customers can use it? >> Yeah, it's a great question. That one I work a lot on, and I think there's a couple of angles to it. One is, I think as big data's taken off, a lot of the integration technology that we've used in the past really wasn't made for this era, where you've got data coming from everywhere.
It's different shapes and it's different sizes, and so at least within some of our products we've been investing a lot into how we make it really easy to acquire all the data you need, because, you know, like you hear in all these cases, you can have the best model in the world, but if you don't have the best data sets it doesn't matter. Digital transformation starts with getting access to more data than you had before, and so I think we've been really focused on this, we call it the ingestion of data. Being able to really easily connect and acquire all of the data, that's the starting point. The next thing that we've seen from companies that have kind of gone down that journey with us is once you've acquired it all, you quickly have to understand it, and you have to be able to kind of search over it and understand it through the lens of, potentially, business terms if you're a business user trying to understand: what are all these data sets? What do they mean? And so I think this is where you're starting to see the rise of data cataloging initiatives, not necessarily master data, et cetera, of the past, but this idea of, wow, I'm acquiring all of this data, how do I make sense of it? How do I catalog it? How do all of my workers or my employees easily find what they need and search for the data through the lens that makes sense to them? Data scientists are going to search through a very technical lens, your business users through business glossary, business domain terms, and so, for me, it all starts with the acquisition. I think it's still far too hard, and then it becomes kind of a cataloging initiative, and then the last step is how do we start to get some form of standards or agreement around the semantics of the data itself? Like this is a customer, this is a place.
This is what, you know, a rating is, and I think with that you're going to start to see a whole eco-system of apps start to develop, and one of the things that we're pretty excited about with the open data partnerships is how we can bring in data and to some degree auto-classify it into a set of terms that allow you to just get on with the business logic, as opposed to spending all the time in the acquisition phase that most companies do today. >> You mentioned that AI is becoming increasingly important and mission critical, or at least bottom line critical, in business models. What are some of the most exciting new uses of AI that you're seeing and that you hope expand into the larger industry? >> Sure. It really does cross a number of domains. We work with a retailer, ASOS. Every time we get to chat with them it's very interesting to hear how they have completely customized the shopping experience, from how they lay out the page based on your interests and preferences, through to how the search terms come back based on seasonality of what you're looking at and on what they've learnt about your purchase patterns over time, your sex, et cetera. And so I think this notion of, like, intensely customized customer experiences is playing out everywhere. We've seen it on the other side in engine design and preventative maintenance, where we've got certain customers now that are selling engine hours as opposed to engines themselves. And so if there's an engine hour that they can't provide, that's a big deal, and so they want to get ahead of any maintenance issue they can, and they're using models to predict when a particular maintenance event is going to be required and getting ahead of that, through to athletes and injury prevention.
We're now seeing it all the way down to connected clothing and athletic gear, not just at the professional level but starting to come down to the club level, on athletes as they're playing, starting to realize that, oh, something's not quite right, I want to get ahead of this before I have a more serious injury. And so we've seen it in a number of domains; almost every new customer I'm talking with, I'm excited by what they're doing in this area. >> Well, you bring up an interesting challenge. I've heard Microsoft is really, I guess, verticalizing around certain industries to put solutions together. One of the challenges we saw, you know, we saw surveys of big data, and the number one use case that came back was always custom, and it was like, oh, okay, well how do I templatize and allow hundreds of customers to do this, so that not every single project is a massive engagement? What are you seeing that we're learning from the past? It feels like we're getting over that hump a little bit faster now than we were a few years ago. >> Yeah, so if I heard you correctly, it's a little bit loud, so you're saying everything started as custom? And how do we get past that? And I think it actually goes back to what we were talking about earlier with this notion of a common understanding of data, because what was happening is everybody felt they had bespoke data, or we had data that was speaking about the same domains and terms but we didn't agree on anything, so we spent a ton of time in the bespoke or custom arena of integrating, cleaning, transforming, before we could even get to model building or any kind of innovation on the data itself. And so I think one of the things is realizing that across a lot of these domains we're trying to solve similar problems, we all have similar data.
The more we can get to a common understanding of the data that we have, the more you can see higher level re-usable components being built, saying, "Ah, I know how to work on customer data," "I know how to work on sales data," "I know how to work on, you know, oil and gas data," whatever it might be, and you'll probably start to see things come up in industry verticals as well. And I think it's that motion, like we had the same problem years ago when we talked about log files. Before there were logging standards, everything was a custom solution, right? Now we have very rich solutions for understanding IT infrastructure et cetera, and that largely came about because we had a better baseline understanding of the data we had. >> Great. Mike, thank you so much for coming on theCUBE. It was a pleasure having you. >> Thank you for having me. >> I'm Rebecca Knight for Stu Miniman, we will have more of theCUBE's live coverage of Microsoft Ignite coming up just after this. (techno music)
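The catalog idea Flasko describes above, connecting information workers with data producers by letting each side search through its own lens, is at bottom a metadata-management problem: you register descriptions of data sets, not the data itself. Here is a minimal sketch of that concept; it is not the Azure Data Catalog API, and the data set names, locations, and glossary terms are invented for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class DatasetEntry:
    """Catalog metadata about a data set -- never the data itself."""
    name: str
    location: str                                       # where the data physically lives
    technical_tags: set = field(default_factory=set)    # the data-engineer lens
    business_terms: set = field(default_factory=set)    # the business-user lens

class DataCatalog:
    def __init__(self):
        self.entries = []

    def register(self, entry):
        self.entries.append(entry)

    def search(self, term):
        """Match either lens, so engineers and business users find the same data."""
        term = term.lower()
        return [e.name for e in self.entries
                if term in {t.lower() for t in e.technical_tags | e.business_terms}]

catalog = DataCatalog()
catalog.register(DatasetEntry(
    name="crm_contacts_v3",
    location="adls://lake/raw/crm/contacts/",
    technical_tags={"parquet", "daily-load"},
    business_terms={"customer", "account"}))
catalog.register(DatasetEntry(
    name="web_clickstream",
    location="adls://lake/raw/web/clicks/",
    technical_tags={"json", "streaming"},
    business_terms={"customer", "engagement"}))

print(catalog.search("customer"))  # business lens -> ['crm_contacts_v3', 'web_clickstream']
print(catalog.search("parquet"))   # technical lens -> ['crm_contacts_v3']
```

The design point is the dual index: one registry entry, two vocabularies, so a business-glossary search and a technical-tag search both resolve to the same registered asset.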
Jerry Held, Informatica | Informatica World 2018
>> Announcer: Live from Las Vegas, it's theCUBE! Covering Informatica World 2018. Brought to you by Informatica. >> Hello, welcome back everyone, we're here for exclusive CUBE coverage here at The Venetian in Las Vegas, day one, getting set up for the exclusive pavilion opening kick-off party, we've been here all day, for two days. I'm John Furrier, your host of theCUBE, with Peter Burris, analyst at SiliconANGLE; joining us on theCUBE, our next guest, Board Member of Informatica for 10 years now, CUBE alumni Jerry Held, industry legend, veteran, been there, done that, seen all the waves of innovation. Jerry, great to see you, thanks for coming on! >> Great to be here, nice to see you guys again! >> Ten years on the board at Informatica, a lot, that's like, how many waves can happen in 10 years, what's been the journey, what's been your view? You're in all the board meetings approving all the hires and stock option grants, and all the action, you see in the front row what's happening, what's the story? >> Well, it's been a great ride, it's an interesting company. I've been on a lot of boards, I've lost count of how many, both startups and then big public company boards, but Informatica's been a really fun ride. When I joined, we were goin' through super growth, my really good friend Sohaib Abbasi was running the place, he had a phenomenal 10 year run, I think 36 quarters of record growth in profit, just unbelievable, took it from an ETL company back when it started, to a full data integration company, kind of went from the first phase of data to the second phase where it was more than just moving data to a data warehouse, but all phases of data integration, and that was terrific, and then we got to a point where it was time for another phase, and a lot of things were happening, not only in terms of where the company was going, but where the industry was going. >> And what year was that, when that happened?
So that was about two and a half, three years ago, when we decided that the best route was actually to go private, because some of the transitions were going to be pretty profound. For instance, just the model of selling software going from license to subscription requires a dip in revenue, it requires restructuring your field, a lot of changes. >> John: A lot of product work? >> Yeah, yeah, so, we did go very successfully, we went private, and I don't know, for some reason, they asked me to stay on the board through that transition, (John and Peter laughing) >> and it's been interesting being on the private side. Now that it's a private company, it's run differently, we have some great private equity firms who are the investors, owners of the company, the board makeup is completely different, and we have a lot of people with a financial look at the company, but they're growth investors; there are some PE firms that come in and take a company apart and just try to get the most they can out of it. Luckily the investors that we found believed in the future of the company, it's a growth company, it just needed to go through some restructuring. We're also really fortunate, when Sohaib decided it's time to retire, to promote Anil to be the CEO, and he's turned out to be fantastic, and we've had a number of changes really bringing in some fresh blood, new people into positions, really strengthening the team, and in the last couple years of Sohaib's tenure, he put a real focus on innovation, because we had gone down a path of acquiring a number of pieces, putting them all together, but the innovation had sort of slowed down. Well, he started the process, and it really picked up speed through this transition, so the company has come out with a series of really new, innovative products.
So now, Informatica's like one of the hottest pre-IPO companies in the industry, if you think about enterprise software companies. >> Yeah, and we talk, I mean we've been here, watchin' from three, four years ago, we talked to Anil before he was the CEO, and he was doing the products, they brought in some product people, and they did the work, they buckled down. Okay, so I got to step back, before we came, before you came on, you and I were talking about waves, and you've seen waves, the relational database wave, and our comment was, people tend to poo-poo them, well that's never going to happen, eh, it's never going to happen. So I got to ask you, take us through what the wave is right now, what're you excited about, because certainly, there's no doubt that commerce on this scalable cloud has opened up a new, kind of a new aperture, if you will, of opportunity, and it's impacting everybody, data is not just a category, it's fundamental in the fabric of this next big wave, what is your vision of this wave, what's exciting you? Take us through that. >> Well as we were saying, I've been in this data management business for 50 years now, so I've seen a lot of waves, about every 10 years you get a big wave, and I was there back at the birth of the relational database, client server, and web, and cloud, and SaaS, and all these. Each one, when they start, people go, ooh, it's very cool, but it'll never take off, and there's a lot of people who miss great opportunities at the beginning of a wave. Now we're clearly well into the cloud wave, as I think most people realize it's for real, and there's a lot happening. The one that I'm most excited about right now is in what I would call, you know we've had DBMS, Database Management Systems, I'll make up a new term right here, I've never used this before, so this is a first on your show. >> John: Exclusive. >> Instead of DBMS, how about DAMS, Data Asset Management System? That's what we need.
We have got such a proliferation of data, relational databases, NoSQL databases, Hadoop databases, we've got structured, unstructured, text, image, every kind of data, but it's proliferating at an amazing rate, right, we've got all kinds of types of data, sources of data, users of data, people now want to use data, not just the IT people but end-users, but it's out of control. We have this asset, and everybody talks about it, you can see it here at these sessions, what's going to transform your business? Data, data disruption. But it's out of control. Nobody knows where the data is. Ask the CFO where all the financial assets are, they can show you the spreadsheets, they can show you the reports; ask the Chief Information Officer, Chief Data Officer, they can't tell you. So what we need to do is manage the data asset, and how are we going to do that? As far as I'm concerned, the single most exciting thing coming out of Informatica, and there's a lot of exciting things at this conference, far-and-away to me the most exciting thing is Enterprise Data Catalog. That is a Data Asset Management System, it allows you to look across every type of data in the enterprise, on-prem, in the cloud, all kinds of data, and get your hands around it, and you need to do it for two big reasons. One: risk reduction, and two: reward enhancement. In other words, you have a way to reduce risk, improve governance, and you can just look at the news every day, Facebook, GDPR, which is coming-- >> Friday. >> This week, this week, it's very timely. Europe is way ahead of us here, they're forcing companies to get their act together, but how do you do it? You need to get your act together on managing the data asset. It's not managing the actual data, it's managing the metadata: where is the data, who has access to it, what's the security, how many copies do you have, how many different views of a customer do you have that are inconsistent?
The way you need to do that is through an enterprise data catalog, and Informatica has a super exciting product, the most exciting product in the 10 years I've been on the board; this is the single most exciting product the company's come out with. >> Sounds like you're bullish on this one, so we'll put that as a check-mark on that one. Let me ask you a question just to kind of take that to the next level. Jerry, what is the order of magnitude impact, in your opinion, obviously it's a big wave, can you kind of just give us a perspective, waves have multi-year lives, sometimes 10 plus years, Pat Gelsinger, formerly of Intel, would always talk about waves, sometimes they're 10, 20 year waves, what is the impact of this one, specifically around the catalog, what's it going to impact, order of magnitude, share your color commentary on how you see it shaping out. >> Well it's going to have these two huge impacts, let's just talk about the risk reduction side, the governance side, I mean, think about the potential impact to Facebook of losing control of their data, that company could well get split up. I mean there's a lot of talk about splitting it up, how big an impact is that? Pretty damn big, right? >> Pretty big, yeah. >> I mean, it's huge. >> Yeah, billions, trillions. >> Yeah, and those kinds of risks are out there, and they've reached a point where the public, the government, is no longer willing to put up with it. Now think about the rewards side of it, the positive side. If you can get control over your data, and now you're doin' all this great analytics, people create data lakes, you know what's in those data lakes? Most of them are data swamps. They put a lot of data in there, but they don't know what's there. If you could take all that data in the data lakes, plus the stuff you have in the cloud, plus stuff you have other places, now you want to answer that hard question. Get your analysts to be way more productive.
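Jerry's framing, that a Data Asset Management System manages the metadata (where the data lives, who has access, how many copies exist) rather than the underlying rows, can be sketched in code. This is a hedged toy model, not Informatica's actual design; every name in it is made up for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    """Metadata about one data asset; the catalog never stores the rows."""
    name: str                 # e.g. "customer_orders"
    source_system: str        # e.g. "oracle_erp", "s3_data_lake"
    owner: str                # the accountable data steward
    access_roles: set = field(default_factory=set)  # who may read it
    copies: list = field(default_factory=list)      # known duplicates elsewhere

catalog: dict = {}

def register(entry: CatalogEntry) -> None:
    catalog[entry.name] = entry

register(CatalogEntry("customer_orders", "oracle_erp", "jane@example.com",
                      {"finance", "analytics"}, ["s3://lake/orders_copy"]))

# Governance questions become metadata lookups instead of system-wide scans:
entry = catalog["customer_orders"]
print(entry.owner)            # who is accountable for this asset
print(len(entry.copies) + 1)  # how many copies exist (original + known)
```

The risk-reduction side falls out of the same structure: a GDPR-style question like "where does customer data live, and who can see it?" is answered from the catalog rather than by auditing every system by hand.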
How important is it when you get that insight, how do you measure the business value? I'm sure on your show you've had dozens of people give you a specific instance of, oh, look what I did with Tableau, great product, I did all this stuff, and I discovered this, and I changed my business, right? You've had that? >> Oh, insights come out of the woodwork, everywhere. >> Okay, however, ask the question: how many insights didn't come out, because these analysts didn't know where the data was, they didn't have access to all this data? They did find something, but think about what they could have found if they had a complete view of all the enterprise data, and how it related to all the other data coming from social media and everything. So, what's the value of an enterprise data catalog? I think it's enormous, enormous! >> Peter: But Jerry, I think that's an interesting thought experiment, but if I were to combine that with another thing that excites me about what I'm hearing this week, the reality is there aren't enough analysts in the world to find it all. When we start applying machine learning to the process of creating, maintaining, sustaining the understanding of the data assets, reforming, reforging data assets, ensuring that we are not dependent on manual processes in a catalog, it's that combination that makes it possible to actually augment the way that human beings look at these things. Ultimately these types of systems are going to provide options to the business.
And when you get that complicated, the human brain can't deal with it, so you've hit on maybe the most important point: you must have an enterprise data catalog that's based around AI and machine learning, at least as a tool assist. You're going to still have people that are going to be curating, you're going to have people that are going to be adding glossaries and all kinds of things, but at the core, there's so much data that you need to take the machine learning technology, which is moving along quite quickly, and try to figure out, what are all these relationships? That is a core component of it. >> So we talked, so I want to throw this at you, you tell me if you agree with me. What that comes down to is, everybody talks about AI, you talked about it earlier, taking jobs away, doing the work. Increasingly I think we're going to look at AI as a technology that provides humans options, better forged, better formulated, well structured options, based on data, and that increasingly the thing about creating data value is, is your system creating new classes of options for pursuing the value of data, and this combination thing, AI, augmenting, by presenting options to human decision makers so that they can look at all that range, all those possible vectors that they could be pursuing, and choose the ones that are most attractive. >> Yeah I think there's two things-- >> Does that make sense? >> So there's two parts to it. One: you're exactly right, you can augment and give choices, but before it does that, it can eliminate a massive amount of just grunge work. Most analysts, this is a well documented fact, most analysts spend 80% of their time in data prep, and 20% in analysis, that's pretty well industry standard right now, if you're doin' better than that you're doin' great. And what you can do, if you do the right form of cataloging, is get the data organized, and then you use things like MDM and data quality to cleanse it.
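Jerry's point that machine learning has to sit at the core, figuring out relationships across more data sets than any curator could review, can be illustrated with a deliberately simple heuristic. This is a toy assumption for illustration, not Informatica's actual algorithm: score how related two tables are by the overlap of their column names, and surface only the top candidates for a human steward to confirm.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: |A intersect B| / |A union B|."""
    return len(a & b) / len(a | b)

# Hypothetical tables discovered across an enterprise (names are made up).
tables = {
    "crm_customers":   {"customer_id", "name", "email", "region"},
    "erp_accounts":    {"customer_id", "name", "credit_limit"},
    "web_clickstream": {"session_id", "url", "timestamp"},
}

# Rank every pair of tables by column-name overlap.
pairs = sorted(
    ((jaccard(cols1, cols2), t1, t2)
     for t1, cols1 in tables.items()
     for t2, cols2 in tables.items() if t1 < t2),
    reverse=True,
)

# The steward reviews the top suggestions instead of every possible pair.
best = pairs[0]
print(best[1], "<->", best[2])  # → crm_customers <-> erp_accounts
```

A real system would layer in value distributions, lineage, and observed join patterns on top of names, but the shape is the same: the machine proposes relationships, the human curates.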
Now you get to the point where the analyst is doing analysis, and they're doing things, number one, that are more interesting, number two, that are more productive, and number three, that are going to have a bigger effect on your bottom line. >> Peter: Right, right. >> Let's talk about the role of data when it comes to IoT and edge, for instance, in the cloud, okay, this is now, 'cause of the scale, you mentioned the scale with AI, that helps with the scale of data coming in, you got that. Now a customer's looking at an architectural shift with cloud, multi-cloud, and IoT, whether it's edge, or whatever that's defined as. How is the cataloging and the data vision you put forth impacted by that, does it accelerate it, does it change it radically for the buyer, the user, the enterprise, how does that enterprise customer think about--? >> Well, it's another important source, so we have all these different sources of data, and a growing source is going to be IoT data, and if it's streaming in, going into some repository, it needs to be cataloged and correlated with the rest of the data in your enterprise. Right now, a lot of IoT data is just going into some system off to the side, not correlated with the mainstream data. The thing that I think is the big shift, when you go from DBMS, Database Management System, we were focused typically on a single data source, whether it's IoT data, or it could be accounting data, the focus was on just that data; the difference with a Data Asset Management System is to think about your data as a whole, across your whole enterprise.
Metadata describes how it's being applied, and then the underlying data elements are given context and semantic richness by the metadata. >> Jerry: Exactly, exactly. >> Alright so here, I'll throw out the old, if I'm Joe six-pack out in the street, I hear catalog, I go whoa! >> Yeah, he's talkin' about this stuff all the time! (chuckles) >> I go whoa, catalog? In my mind I get a mental model of a centralized database, I think hacker! 'Cause you know, government and all the hacks goin' on, you know, decentralized data's probably better, distributed data? So I hear catalog, my mind goes centralized, is that the right way to think about it? Or, obviously, I mean, share, because security's critical on this. >> Absolutely, and so as you bring this view of data, just like when you have your financial books, where you have a central view of all your financial assets, there needs to be security, you have to allow access for people to the appropriate level of information that they're going to pull out, the data asset is no different. So you want to have a full view of all of your data, and you want to have ways to allow and restrict access to the information. It's not the data, it's just where is the data, and each of the data systems have secondary-- >> I think there's an important point, and I want to test this on you, 'cause you're askin' a great question. The information model from IBM, we used to, we've had catalogs with databases, we've had catalogs all over the place, highly stylized processes, stylized data, stuck in a catalog. One of the things that's especially interesting is not the idea that we're going to start with a whole bunch of designs and put them in the catalog, but that we're going to discover stuff about our data, and the catalog will emerge out of the attributes of data, and how it's working and how it's being used.
>> If you, let's rewind back-- >> John: So the answer is no, not centralized? >> Well, but it's not-- >> Peter: The metadata may be centralized, but the data's not. >> It's not a, we're not trying to do a centrally-designed architecture, so let's rewind 50 years, and go back to the beginning of relational databases. We had schemas, and back in the '70s, people were talkin' about, oh, let's come up with the schema for the corporation, we'll have one group go off, and they'll design everything: failed. Then they had data dictionaries where they were going to put it all in place: failed. And all of these things, where there was an attempt to centrally define and control the structure of data around the enterprise: failed. That is not what we're talking about. Data exists in all forms, with all sorts of schemas and definitions, and all types of databases, in Oracle and SAP, and everything. All we're doing is taking the metadata and relating it-- >> Peter: Allowing it to merge! >> So that we have a view of where everything is, that data's different than this data, it's managed by different software, but we have one view, so that now, when an analyst wants to know, how do I get the latest information on customer preferences for purchasing this? I can go here, here, and here, and I'll correlate those, and I'll pull 'em together with some tool. >> Final question, final question for you. If you take that to the next level, you're implying, or actually saying, that philosophy of a catalog implies that it's okay to have a zillion databases. I might have a Postgres database on this application, I might have an unstructured database over here, so in the future world, where we're living in a tsunami of data, apps need databases. So the idea of-- >> And they've got to be different. >> And they're going to be a zillion, yeah, a lot of different databases proliferating is not a bad thing under your model. >> Absolutely, and we've tried having one answer, it doesn't work.
And even if you ever could get a large company to one form, then they do an acquisition, and now they've got other forms. That concept just doesn't work, it has to be a heterogeneous world, and you have to have a way to pull the pieces together, and that's why, just as a final point, I think what Informatica has done with this Enterprise Data Catalog, which is a phenomenal product, still early days, but growing at a phenomenal rate, fastest growth of any product ever. You need a company that's independent, that's not a stack company, it's not an Oracle or an SAP, it's not a cloud company, or an AWS, or an Azure, or Google, it's not a SaaS company, it's somebody who is the Switzerland of data, who can take data from every place, and just collect that metadata, and it has to be a company that understands machine learning and AI, that can use it to pull it together. >> And they got to work with the clouds too, they got to work with all the clouds. >> And it has to be a company that has interfaces to everything, which is what Informatica is, so it's a perfect fit. >> And it's not going to try to then use that to exact significant control over how everything operates. >> Exactly, and it's not trying to sell you an application, or a database, so you need that Switzerland, and I think that's why, to me, in the 10 years that I've been on the board, I haven't seen a more exciting product, nor have I seen a customer reaction as dramatic as this, every customer's talking about EDC, and if they haven't before this conference, they will after this conference. (laughs) >> And the timing is critical on this too, talk about timing, the tailwinds for this movement right now, more than ever, sometimes timing is-- >> This week is a, I mean GDPR is a big deal, a big deal. >> It's a signal.
>> And what's goin' on with Facebook and others is a big deal, so the timing is appropriate, and the product is fantastic, and I think it's going to be, when we look back next year, and we do this show. >> (laughing) That's great, we have nine years of history, you go back and say hey, 'member you said that? Right? Data is the central strategic asset, not some corner case. GDPR is a signal, it's a shot across the bow, for all companies to get in the center. We coined the new term Database Asset Management System. >> No, Data Asset Management System. >> Data Asset Management System. >> And we actually have research on that from a couple years ago. >> Okay, well, we're here, exclusive on theCUBE, Data Asset Management System, the asset is data, it's going to be worth money, it's going to be on the balance sheet soon. theCUBE is here, out in the open, Informatica World 2018. Jerry Held, Board Member, bringing his insight, thank you for sharing the data on theCUBE, we'll be back with more, stay with us, after this short break. (bubbly music)
Satyen Sangani, Alation | Big Data SV 2018
>> Announcer: Live from San Jose, it's theCUBE. Presenting Big Data Silicon Valley, brought to you by SiliconANGLE Media and its ecosystem partners. (upbeat music) >> Welcome back to theCUBE, I'm Lisa Martin with John Furrier. We are covering the second day of our event, Big Data SV. We've had some great conversations, John, yesterday, and today as well. Really looking at Big Data, digital transformation, Big Data plus data science, lots of opportunity. We're excited to welcome back to theCUBE an alumnus, Satyen Sangani, the co-founder and CEO of Alation. Welcome back! >> Thank you, it's wonderful to be here again. >> So you guys finished up your fiscal year at the end of December 2017; we're in the first quarter of 2018. You guys had some really strong results, really strong momentum. >> Yeah. >> Tell us what's going on at Alation, how are you pulling this momentum through 2018? >> Well, I think we have had an enterprise-focused business historically, because we solve a very complicated problem for very big enterprises, and so, in the last quarter we added customers like American Express, PepsiCo, Roche. And we had huge expansions from our existing customers, some of whom, over the course of a year, I think went 12x from an initial base. And so, we found some just incredible momentum in Q4, and for us that was a phenomenal cap to a great year. >> What about the platform you guys are doing? Can you just take a minute to explain what Alation does, again, just to refresh where you are on the product side? You mentioned some new accounts, some new use cases. >> Yeah. >> What's the update? Take a minute, talk about the update. >> Absolutely, so, you certainly know, John, but Alation's a data catalog, and a data catalog, essentially, you can think of it as Yelp or Amazon for data and information inside of the enterprise.
So if you think about how many different databases there are, how many different reports there are, how many different BI tools there are, how many different APIs there are, how many different algorithms there are, it's pretty dizzying for the average analyst. It's pretty dizzying for the average CIO. It's pretty dizzying for the average chief data officer. And particularly inside of Fortune 500s, where you have hundreds of thousands of databases, you have a situation where people just have too much noise, not enough signal. And so what we do is we provide this Yelp for that information. You can come to Alation as a catalog. You can do a search on "revenue 2017." You'll get all of the reports, all of the dashboards, all of the tables, all of the people that you might need to be able to find. And that gives you a single place of reference, so you can understand what you've got and what can answer your questions. >> What's interesting is, first of all, I love data. We're data driven, we're geeks on data. But when I start talking to folks that are outside the geek community or nerd community, you say data and they go, "Oh," because they cringe and they say, "Facebook." They see the data issues there. GDPR, data nightmare, where's it stored, you got to manage it. And then, people are actually using data, so they're realizing how hard (laughs) it is. >> Yeah. >> How much data do we have? So it's kind of like a trough of disillusionment, if you will. Now they got to get their hands on it. They've got to put it to work. >> Yeah. >> And they know that. So, it's now becoming really hard (laughs) in their mind. This is business people. >> Yeah. >> They have data everywhere. How do you guys talk to that customer? Because, if you don't have quality data, if you don't have data you can trust, if you don't have the right people, it's hard to get it going. >> Yeah. >> How do you guys solve that problem and how do you talk to customers?
>> So we talk a lot about data literacy. There is a lot of data in this world, and that data is just emblematic of all of the stuff that's going on in this world. There's lots of systems, there's lots of complexity, and the data, basically, just is about that complexity. Whether it's weblogs, or sensors, or the like. And so, you can either run away from that data, and say, "Look, I'm going to not, I'm going to bury my head in the sand. I'm going to be a business. I'm just going to forget about that data stuff." And that's certainly a way to go. >> John: Yeah. >> It's a way to go away. >> Not a good outlook. >> I was going to say, is that a way of going out of business? >> Or, you can basically train, it's a human resources problem fundamentally. You've got to train your people to understand how to use data, to become data literate. And that's what our software is all about. That's what we're all about as a company. And so, we have a pretty high bar for what we think we do as a business, and we're this far into that. Which is, we think we're training people to use data better. How do you learn to think scientifically? How do you go use data to make better decisions? How do you build a data driven culture? Those are the sorts of problems that I'm excited to work on. >> Alright, now take me through how you guys play out in an engagement with the customer. So okay, that's cool, you guys can come in, we're getting data literate, we understand we need to use data. Where are you guys winning? Where are you guys seeing some visibility, both in terms of the traction of the usage of the product, the use cases? Where is it kind of coming together for you guys? >> Yeah, so we literally, we have a mantra. I think any early stage company basically wins because they can focus on doing a couple of things really well. And for us, we basically do three things. We allow people to find data. We allow people to understand the data that they find.
And we allow them to trust the data that they see. And so if I have a question, the first place I start is, typically, Google. I'll go there and I'll try to find whatever it is that I'm looking for. Maybe I'm looking for a Mediterranean restaurant on 1st Street in San Jose. If I'm going to go do that, I'm going to do that search and I'm going to find the thing that I'm looking for, and then I'm going to figure out, out of the possible options, which one do I want to go to. And then I'll figure out whether or not the one that has seven ratings is the one that I trust more than the one that has two. Well, data is no different. You're going to have to find the data sets. And inside of companies, there could be 20 different reports and there could be 20 different people who have information, and so you're going to trust those people through having context and understanding. >> So, trust, people, collaboration. You mentioned some big brands that you guys added towards the end of calendar 2017. How do you facilitate these conversations with maybe the chief data officer? As we know, in large enterprises, there's still a lot of ownership over data silos. >> Satyen: Yep. >> What is that conversation like, as you say on your website, "The first data catalog designed for collaboration"? How do you help these organizations as large as Coca-Cola understand where all the data are and enable the human resources to extract value, and find it, understand it, and trust it? >> Yeah, so we have a very simple hypothesis, which is, look, people fundamentally have questions. They're fundamentally curious. So, what you need to do as a chief data officer, as a chief information officer, is really figure out how to unlock that curiosity. Start with the most popular data sets. Start with the most popular systems. Start with the business people who have the most curiosity and the most demand for information. And oh, by the way, we can measure that. Which is the magical thing that we do. So we can come in and say, "Look, we look at the logs inside of your systems to know which people are using which data sets, which sources are most popular, which areas are hot." Just like a social network might do. And so, just like you can say, "Okay, these are the trending restaurants," we can say, "These are the trending data sets." And that curiosity allows people to know: what data should I document first? What data should I make available first? What data do I improve the quality of first? What data do I govern first? And so, in a world where you've got tons of signal, tons of systems, it's totally dizzying to figure out where you should start. But what we do is, we go to these chief data officers and say, "Look, we can give you a tool and a catalyst so that you know where to go, what questions to answer, who to serve first." And you can use that to expand to other groups in the company. >> And this is interesting, a lot of people, you mentioned social networks, use data to optimize for something, and in the case of Facebook, they use my data to target ads for me. You're using data to actually say, "This is how people are using the data." So you're using data for data. (laughs) >> That's right. >> So you're saying-- >> Satyen: We're measuring how you can use data. >> And that's interesting because I hear a lot of stories like, we bought a tool, we never used it. >> Yep. >> Or people didn't like the UI, it just kind of falls by the side. You're looking at it and saying, "Let's get it out there and let's see who's using the data." And then, are you doubling down? What happens? Do I get a little star, do I get a reputation point, am I being flagged to HR as a power user? How are you guys treating that gamification in this way? It's interesting, I mean, what happens? Do I become like-- >> Yeah, so it's funny because, when you think about search, how do you figure out that something's good?
So what Google did is, they came along and they've said, "We've got PageRank." What we're going to do is we're going to say, "The pages that are the best pages are the ones "that people link to most often." Well, we can do the same thing for data. The data sources that are the most useful ones are the people that are used most often. Now on top of that, you can say, "We're going to have experts put ratings," which we do. And you can say people can contribute knowledge and reviews of how this data set can be used. And people can contribute queries and reports on top of those data sets. And all of that gives you this really rich graph, this rich social graph, so that now when I look at something it doesn't look like Greek. It looks like, "Oh, well I know Lisa used this data set, "and then John used it "and so at least it must answer some questions "that are really intelligent about the media business "or about the software business. "And so that can be really useful for me "if I have no clue as to what I'm looking at." >> So the problem that you-- >> It's on how you demystify it through the social connections. >> So the problem that you solve, if what I hear you correctly, is that you make it easy to get the data. So there's some ease of use piece of it, >> Yep. >> cataloging. And then as you get people using it, this is where you take the data literacy and go into operationalizing data. >> Satyen: That's right. >> So this seems to be the challenge. So, if I'm a customer and I have a problem, the profile of your target customer or who your customers are, people who need to expand and operationalize data, how would you talk about it? >> Yeah, so it's really interesting. We talk about, one of our customers called us, sort of, the social network for nerds inside of an enterprise. And I think for me that's a compliment. (John laughing) But what I took from that, and when I explained the business of Alation, we start with those individuals who are data literate. 
The data scientists, the data engineers, the data stewards, the chief data officer. But those people have the knowledge and the context to then explain data to other people inside of that same institution. So in the same way that Facebook started with Harvard, and then went to the rest of the Ivies, and then went to the rest of the top 20 schools, and then ultimately to mom, and dad, and grandma, and grandpa. We're doing the exact same thing with data. We start with the folks that are data literate, and we expand from there to a broader audience of people that don't necessarily have data in their titles, but have curiosity and questions. >> I like that on the curiosity side. You spent some time up at Strata Data. I'm curious, what are some of the things you're hearing from customers, maybe partners? Everyone used to talk about Hadoop, it was this big thing. And then there was the creation of data lakes, and swampiness, and all these things that are sort of becoming more complex in an organization. And with the rise of myriad data sources, the velocity, the volume, how do you help an enterprise understand and be able to catalog data from so many different sources? Is it that same principle that you just talked about in terms of, let's start with the lowest-hanging fruit, start making the impact there and then grow it as we can? Or does an enterprise need to be competitive and move really, really quickly? I guess, what's the process? >> How do you start? >> Right. >> What do people do? >> Yes! >> So it's interesting, what we find is multiple ways of starting with multiple different types of customers. And so, we have some customers that say, "Look, we've got a big, we've got Teradata, "and we've got some Hadoop, "and we've got some stuff on Amazon, "and we want to connect it all." And those customers do get started, and they start with hundreds of users, in some cases, they start with thousands of users day one, and they just go Big Bang.
And interestingly enough, we can get those customers enabled in a matter of weeks or months to go do that. We have other customers that say, "Look, we're going to start with a team of 10 people "and we're going to see how it grows from there." And, we can accommodate either model or either approach. From our perspective, you just have to have the resources and the investment corresponding to what you're trying to do. If you're going to say, "Look, we're going to have two dollars of budget, and we're not going to have the human resources, and the stewardship resources behind it," it's going to be hard to do the Big Bang. But if you're going to put the appropriate resources up behind it, you can do a lot of good. >> So, you can really facilitate the whole go big or go home approach, as well as the let's start small, think fast approach. >> That's right, and we always, actually ironically, recommend the latter. >> Let's start small, think fast, yeah. >> Because everybody's got a bigger appetite than they do the ability to execute. And what's great about the tool, and what I tell our customers and our employees all day long is, there's only one metric I track. So year over year, for our business, we basically grow in accounts, net of churn, by 55%. Year over year, and that's actually up from the prior year. And so from my perspective-- >> And what does that mean? >> So what that means is, the same customer gave us 55 cents more on the dollar than they did the prior year. Now that's best in class for most software businesses that I've heard of. But what matters to me is not so much that growth rate in and of itself. What it means to me is this, that nobody's come along and said, "I've mastered my data. "I understand all of the information side of my company. "Every person knows everything there is to know." That's never been said.
So if we're solving a problem where customers are saying, "Look, we get, and we can find, and understand, "and trust data, and we can do that better this year "than we did last year, and we can do it even more "with more people," we're going to be successful. >> What I like about what you're doing is, you're bringing an element of operationalizing data for literacy and for usage. But you're really bringing this notion of a humanizing element to it. Where you see it in security, you see it in emerging ecosystems. Where there's a community of data people who know how hard it is and was, and it seems to be getting easier. But the tsunami of new data coming in, IoT data, whatever, and new regulations like GDPR. These are all more surface area problems. But there's a community coming together. How have you guys seen your product create community? Have you seen any data on that, 'cause it sounds like, as people get networked together, the natural outcome of that is possibly usage you attract. But is there a community vibe that you're seeing? Is there an internal collaboration where they sit, they're having meetups, they're having lunches. There's a social aspect and a human aspect. >> No, it's human, it's amazing. So in really subtle but really, really powerful ways. So one thing that we do for every single data source or every single report that we document, we just put who are the top users of this particular thing. So really subtly, day one, you're like, "I want to go find a report. "I don't even know "where to go inside of this really mysterious system." Post-Alation, you're able to say, "Well, I don't know where to go, but at least I can go call up John or Lisa," and say, "Hey, what is it that we know about this particular thing?" And I didn't have to know them. I just had to know that they had this report and they had this intelligence. So by just discovering people and who they are, you pick up on what people can know.
>> So people are the new Google results, so you mentioned Google PageRank, which is web pages and relevance. You're taking a much more people-centric approach to relevance. >> Satyen: That's right. >> To the data itself. >> That's right, and that builds community in very, very clear ways, because people have curiosity. Other people are the mechanism by which they satisfy that curiosity. And so that community builds automatically. >> They pay it forward, they know who to ask for help. >> That's right. >> Interesting. >> That's right. >> Last question, Satyen. The tagline, first data catalog designed for collaboration, is there a customer that comes to mind for you as really one that articulates that point exactly? Where Alation has come in and really kicked open the door, in terms of facilitating collaboration. >> Oh, absolutely. I was literally, this morning, talking to one of our customers, Munich Reinsurance, the largest reinsurance company in the world. Their chief data officer said, "Look, three years ago, "we started with 10 people working on data. "Today, we've got hundreds. "Our aspiration is to get to thousands." We have three things that we do. One is, we actually discover insights. It's actually the smallest part of what we do. The second thing that we do is, we enable people to use data. And the third thing that we do is, drive a data-driven culture. And for us, it's all about scaling knowledge, to centers in China, to centers in North America, to centers in Australia. And they've been doing that at scale. And they go to each of their people and they say, "Are you a data black belt, are you a data novice?" It's kind of like skiing. Are you a blue diamond or a black diamond? >> Always ski in pairs. (laughs) >> That's right. >> And they do ski in pairs.
And what they end up ultimately doing is saying, "Look, we're going to train all of our workforce to become better, so that in three to 10 years, we're recognized as one of the most innovative insurance companies in the world." Three years ago, that was not the case. >> Process improvement at a whole other level. My final question for you is, for the folks watching or the folks that are going to watch this video, that could be a potential customer of yours, what are they feeling? If I'm the customer, what smoke signals am I seeing that say, I need to call Alation? What are some of the things that you've found that would tell a potential customer that they should be talkin' to you guys? >> Look, I think that they've got to throw out the old playbook. And this was a point that was made by some folks at a conference that I was at earlier this week. But they basically were saying, "Look, the old playbook was all about providing the right answer." Forget about that. Just allow people to ask the right questions. And if you let people's curiosity guide them, people are industrious, and ambitious, and innovative enough to go figure out what they need to go do. But if you see this as a world of control, where I'm going to just figure out what people should know and tell them what they're going to know, that's going to be a pretty poor career to choose, because data's all about, sort of, freedom and innovation and understanding. And we're trying to push that along.
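The usage-based ranking Satyen describes, surfacing "trending data sets" the way PageRank surfaces pages, with expert ratings layered on top, can be sketched in a few lines of Python. This is an illustration only; the function, field names, and log format are hypothetical, not Alation's actual implementation.

```python
from collections import Counter

def rank_datasets(query_log, expert_ratings=None):
    """Rank data sets by how often they appear in query logs,
    using optional expert ratings (0-5 stars) as a tiebreaker.
    Hypothetical schema: each log entry has a "datasets" list."""
    expert_ratings = expert_ratings or {}
    # Count every reference to a data set across all logged queries.
    usage = Counter(ds for entry in query_log for ds in entry["datasets"])
    # Most-used first; expert stars break ties, like ratings on a product page.
    return sorted(usage,
                  key=lambda ds: (usage[ds], expert_ratings.get(ds, 0)),
                  reverse=True)

log = [
    {"user": "lisa",  "datasets": ["sales.orders"]},
    {"user": "john",  "datasets": ["sales.orders", "hr.headcount"]},
    {"user": "keith", "datasets": ["sales.orders"]},
]
print(rank_datasets(log, {"hr.headcount": 4}))  # → ['sales.orders', 'hr.headcount']
```

Raw usage counts play the role of inbound links in PageRank, while the expert ratings act only as a secondary signal, mirroring the "experts put ratings" layer he mentions on top of observed behavior.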
Aaron Kalb, Alation | AWS re:Invent
>> Announcer: Live from Las Vegas, it's theCUBE. Covering AWS re:Invent 2017, presented by AWS, Intel, and our ecosystem of partners. >> Welcome back to theCUBE's continuing coverage of AWS re:Invent 2017. This is day two for us. Incredible day one. We had great buzz on day two. Great announcements coming out from AWS today. I'm Lisa Martin with my cohost Keith Townsend, and we're excited to be joined by CUBE alumnus, Aaron Kalb, the head of product and a founder of Alation. Welcome back to the show. >> Thanks so much for having me. I'm excited to be here. >> So speaking of excitement, you can hear the buzz behind us. Interesting about Alation, the first data catalog designed for human collaboration. What gap did Alation see in the market five years ago when you started? >> That's a great question, Lisa. So, yeah, we're the first data catalog, period, and we're excited to see a lot of other people kind of using that label. I believe it validates this as a space, and one that everybody needs. And I think our approach, as you said, was really to approach it from the human side, to say the data might be generated by machines or stored on machines, but it's not meant to ultimately be consumed by machines. Even if there are algorithms pulling it in, it's to ultimately serve human interests. So the goal was to design from the human back and really think, what does this data mean? Can I trust it? Is it gonna drive the processes correctly? >> So Aaron, I have seen that term quite a bit, and data catalog, for me, means one specific thing. Can you kind of wrap that up for us? >> What is a data catalog? >> That's a really great question, Keith, and I think what's interesting is we took a lot of inspiration in the early days actually from Amazon.com, right? So Amazon is an amazing modern product catalog. You can go in, type in English and see a variety of products that match that keyword.
And for each one you can see who's bought it before, how many stars did they give it? Is it good? So it helps you find, understand, and trust, and get the right product for your need. We want to do that same thing for data. How do you find a trustworthy data asset, understand what it is, and put it to use? So that's exactly the goal. >> So, a simple problem is I've worked with a ton of researchers in the Big Pharma industry, data across the world basically. And a lot of data sets, repetitive. A team in Germany is working with one set of data, a team in New Jersey working with another one, how does your solution help those researchers find the data that they're looking for? >> Exactly right. So the problem is many different data sets, many different things claiming to be true. Some of them are just plain wrong. Sometimes the answer might be one thing in Germany but something else elsewhere, and they're both valid. And so you've hit the nail on the head. The way people use data contains a lot of hints about the way you should use data. So just like Amazon, again, because we're here, it'll say, oh, customers who bought what you're about to buy also bought this, and that can help you discover something useful. We try to expose what we call behavior I/O: let the past behavior of the most knowledgeable people in the organization drive the future behavior. That's a big part of what we do. >> So one of the things I was reading about you guys on your website and some editorials is, a lot of data lakes fail. Why is that? How is Alation different? >> That's a great question. So I think what's interesting about a data lake is it's kind of like having a huge basement, right? And it can make you adopt a hoarder mentality, you say, oh it's so cheap to store everything, we'll just store it, and then when we need it we'll figure it out then. Well, the truth is, that's not always how it goes.
Often you store so many things, it's cheap to store it, but when that actual human who has an actual analytical question they want to answer, or an actual business process they want to improve, goes looking for the data, all they see are all these unlabeled boxes. Right? So I think the key is to think about how do you make information searchable, discoverable, understandable, trustworthy? And what's great is a lot of people are migrating from their on-premises data lakes to the cloud, and obviously (mumbles) a big leader in where that's going. It gives you an opportunity to ask, just like when you move houses, to say, let me look at what I've got, and can I adopt an approach? You know, what do I actually need? You might keep it all, but what's gonna be on the top shelf? What's gonna be in the basement? And how do you make everything accessible?
Whatever it is, we crawl and index everything you have, just the way Google crawls and indexes everything out on the web, and we make it searchable, and we put information about who's used it and how good it is front and center, just the way you can say, oh this is a five-star clock on Amazon, I'm gonna go click buy it now. >> So one challenge with data lakes is security around that data. So data catalog, I get metadata around the data that I have, but some of that data is sensitive. How do you guys handle security around the data catalog itself? >> Absolutely. So we respect all the security and privacy settings that exist on the data itself, and we just sort of surface those in the catalog. Some of our customers say, look, we want to let people know what exists so they can ask for permission. Others say, even having awareness of this data is too much for us. And you mentioned Pharma, that'll vary by industry. >> Where do you guys get involved in the customer conversation? You said many customers of yours are already using AWS for different things, but where does Alation come into the conversation? Are you brought in by AWS? Are you brought in by customers? Where are they on this journey towards leveraging the cloud for the things that they need, the agility, the speed, and the cost reduction? >> Absolutely. So our promise is we help you find, understand, and trust your data wherever it lives and whoever you are, democratizing it. So customers choose the right infrastructure for their needs, given cost, given performance. Obviously Amazon is increasingly a part of that. But that's a choice they make, and we resolve to handle that wherever it is. And as for customers, our customers are so smart, we learn so much from them. We're meeting a bunch of CIOs, both prospects and current customers, like Expedia today here at AWS, lunch with our investor Costanoa, and another at dinner tonight.
And folks like Chegg and Invoice2go, who've been longstanding AWS customers using S3, using Redshift. And actually in Chegg's case, they have a lot of homegrown tooling that they developed on the backend, but they said Alation is the best place to surface that and have it be the central portal for business users and analysts who might not be able to otherwise access things that are just available via (mumbles) >> So how are you, Alation, and AWS helping a customer like Chegg extract ROI quickly? >> Yeah, it's a great question, so, AWS is really great for cost containment. You have all this data and all this processing, but you have peaks and you have troughs, and how do you make sure you're not overpaying (mumbles) so it's great for helping with storage and computation. And Alation helps with the human side, how do you get that upside by saying you have this data that could affect the way you stock your shelves, the way you price your products, or who you hire, what markets you go into. And that requires that last step. If you have the data but it isn't in the right hands at the right time, or it's interpreted incorrectly, it has no value. So the two of them together (mumbles) end-to-end solution. >> So Aaron, with GDPR coming up quickly, the enforcement of that coming up May 2018, customers have to be concerned about having data they shouldn't have. Does Alation help identify some of that data? >> Absolutely. So a data catalog is fundamentally an inventory of everything you have, plus information about how it has been and could be consumed. We very much focus on the upside potential of using that to drive better business choices and better analysis. But we have customers actually saying, oh, we can use that same information about what we have, who's using it, what's in it, to instead make sure that it's used compliantly with a regulation like GDPR, to make sure that you aren't holding onto health records longer than you should, or PII.
And it's absolutely a very big use case for many of our customers. >> So data is touched by a lot of people in an organization. AWS has done a great job of really developing a lot of synergy with the developer community for a long time now. But we're also seeing some trends suggesting they're going up the stack. They want to get more enterprises; enterprises are at the precipice, as Andy Jassy said, of this mass migration to the cloud. You mentioned all of your work with AWS and the CIO events that you're having here. Where are you guys in a conversation with customers? Are you now having to get to that C-suite, as their businesses are absolutely predicated upon the best use of data to identify ways to monetize new revenue streams? How influential is that C-level in this conversation? >> It's a great question. So I think what is interesting is, all companies, we've sort of commoditized a basic business school, consultant, best practice knowledge. Everyone is kind of already doing that. To get to the next level, our customers are recently telling us, it is only by finding key insights in data that they're gonna beat out the competition and stay relevant. I mean, look what Amazon and Netflix have done to the industries that weren't as data-driven and didn't have that kind of agility around data. So everybody wants to do the same thing. So CIOs, CDOs, chief data officers, we're seeing them crop up more and more and being more and more empowered in the organization. Because it's seen as central to hitting revenue targets and making an impact, which is what customers want to do. And I mentioned CISOs as well, with the question that you asked, Keith, about security. >> The CISOs, the chief information security officers. >> Aaron: Yeah, absolutely.
Yeah, absolutely, so I think usually a CISO will report into a CIO, or often you see it as adjacent to them. There's somebody who needs to have the confidence, as they do, in Alation's process of mirroring what's in the data source, not introducing security holes. Potentially even taking a step further and saying, as I implement GDPR and other policies, how do I use a comprehensive automated inventory like Alation's to make sure that process isn't just started but actually finished, and avoid the fines and the adverse events. We absolutely see across the C-suite a lot of interest. >> So let's go one step below the CIO, and I think the CIO understands this. This data is the new oil. Very, very straightforward. But now you're getting into the enterprise architect, the VP of infrastructure, and they have to implement these technologies. What have been some of the rewards and challenges with those conversations? >> That's a great question. Right, so here at AWS re:Invent we have a very technical audience, very infrastructure-minded. Those are folks that we love to engage with, but our primary audience is the business. >> Keith: Right. >> Right. And so I think what's interesting is, the problem we solve for the more infrastructure-minded executives is, how do I deal with these business users? How do I turn around this relationship that feels adversarial, where they're putting strain on my system, they're upset about cost overruns, and we don't speak the same language or share the same values? Alation can be a great bridge, because we do all of this automated extraction, tying to the sources where they are, and kind of meet the industry people where they live, but then can communicate the value in a clean interface that demonstrates real business ROI to the business. So we can kind of be an ambassador between those sides of the customer. >> I love that, being an ambassador. Aaron, your passion for Alation, what you do, your engagement with customers is palpable.
So we thank you for joining us on theCUBE, and wish you guys the best of luck with what you're doing here at AWS re:Invent. >> Lisa, thank you so much for having me. >> Lisa: Awesome. >> Keith: Great job, Aaron. >> Thank you for watching. We are live at AWS re:Invent 2017 with 42,000 other people. I'm Lisa Martin, for my cohost Keith Townsend and Aaron Kalb, stick around. We'll be right back.
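The "crawl and index everything you have" workflow Aaron describes, keyword search across Redshift, S3, and Postgres sources with the most-used assets surfacing first, can be sketched as follows. This is illustrative only; the catalog schema and names are hypothetical, not Alation's API.

```python
def search_catalog(catalog, keyword):
    """Return catalog entries whose name or description mentions the keyword,
    most-queried first -- the five-star-product idea applied to data.
    Hypothetical schema: name, description, times_queried per entry."""
    hits = [entry for entry in catalog
            if keyword.lower() in (entry["name"] + " " + entry["description"]).lower()]
    # Popularity stands in for trustworthiness, like star counts on a product page.
    return sorted(hits, key=lambda entry: entry["times_queried"], reverse=True)

catalog = [
    {"name": "redshift.prod.orders",   "description": "customer orders",   "times_queried": 412},
    {"name": "s3://logs/clickstream",  "description": "raw web clicks",    "times_queried": 97},
    {"name": "postgres.hr.salaries",   "description": "employee salaries", "times_queried": 12},
]
print([e["name"] for e in search_catalog(catalog, "orders")])  # → ['redshift.prod.orders']
```

The same inventory doubles as the compliance view mentioned in the GDPR exchange: once every source is indexed with usage metadata, finding who holds health records or PII is just another query over the catalog.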