Mark Terenzoni, AWS | AWS re:Invent 2022


 

(upbeat music) >> Hello, everyone, and welcome back to fabulous Las Vegas, Nevada, where we are here on the show floor at AWS re:Invent. We are theCUBE. I am Savannah Peterson, joined by John Furrier. John, afternoon, day two, we are in full swing. >> Yes. >> What's got you most excited? >> Just got lunch, got the food kicking in. No, we don't get coffee. (Savannah laughing) >> Way to bring the hype there, John. >> No, there's so many people here, just in Amazon. We're back to 2019 levels of crowd. The interest levels are high. Next-gen cloud security, big part of the keynote. This next segment, I am super excited about. CUBE alumni, going back to 2013; 10 years ago he was on theCUBE. Now, 10 years later, we're at re:Invent, looking forward to this guest, and it's about security, great topic. >> I don't want to delay us any more, so please welcome Mark. Mark, thank you so much for being here with us. Massive day for you and the team. I know you oversee three different units at Amazon: Inspector, Detective, and the most recently announced, Security Lake. Tell us about Amazon Security Lake. >> Well, thanks, Savannah. Thanks, John, for having me. Well, Security Lake has been in the works for a little bit of time, and it got announced today at the keynote, as you heard from Adam. We're super excited because there's a couple of components that are really unique and valuable to our customers within Security Lake. First and foremost, the foundation of Security Lake is an open source project we call OCSF, the Open Cybersecurity Schema Framework. And what that allows us to do is work with the vendor community at large in the security space and develop a language where we can all communicate around security data. And that's the language that we put into the Security Data Lake. We have 60 vendors participating in developing that language and partnering within Security Lake. But it's a communal lake where customers can bring all of their security data in one place, whether it's generated in AWS, on-prem, in SaaS offerings or other clouds, all in one location, in a language that allows analytics tools to take advantage of that data and give better outcomes for our customers. >> So in Adam Selipsky's big keynote, he spent the bulk of his time on data and security. Obviously they go well together; we've talked about this in the past on theCUBE. Data is part of security, but this security's a little bit different in the sense that the global footprint of AWS makes it uniquely positioned to manage some security threats. EKS protection, a very interesting announcement, runtime layer, but looking inside and outside the containers, probably gives extra telemetry on some of those supply chain vulnerabilities. This is actually a very nuanced point. You got GuardDuty kind of taking its role. What does it mean for customers? 'Cause there's a lot of things in this announcement that he didn't have time to go into detail on. Unpack all the specifics around what the security announcement means for customers. >> Yeah, so we announced four items in Adam's keynote today within my team. So I'll start with GuardDuty for EKS runtime. It's complementing our existing capabilities for EKS support. So today, Inspector does vulnerability assessment on EKS, or container images in general. GuardDuty does detections of EKS workloads based on log data. Detective does investigation and analysis based on that log data as well. With the announcement today, we go inside the container workloads.
We have more telemetry, more fine-grained telemetry, and ultimately we can provide better detections for our customers to analyze risks within their container workloads. So we're super excited about that one. Additionally, we announced Inspector for Lambda. So Inspector, we released last year at re:Invent, and we focused mostly on EKS container workloads and EC2 workloads. Single click, automatically assess your environment, start generating assessments around vulnerabilities. We've added Lambda to that capability for our customers. The third announcement we made was Macie sampling. So Macie has been around for a while, delivering a lot of value for customers by providing information around their sensitive data within S3 buckets. What we found is many customers want to go and characterize all of the data in their buckets, but some just want to know, is there any sensitive data in my bucket? And the sampling feature allows the customer to find out whether there's sensitive data in the bucket, without us having to go through and do all of the analysis to tell you exactly what's in there. >> Unstructured and structured data. Any data? >> Correct, yeah. >> And the fourth? >> The fourth, Security Data Lake? (John and Savannah laughing) Yes. >> Okay, ocean theme, data lake. >> Very complementary to all of our services, but the unique value in the data lake is that we put the information in the customer's control. It's in their S3 bucket; they get to decide who gets access to it. We've heard from customers over the years that they really have two options around gathering large-scale data for security analysis. One is, we roll our own, and we're security engineers, we're not data engineers; it's really hard for them to build these distributed systems at scale. The second one is, we can pick a vendor or a partner, but we're locked in, it's in their schema and their format, and we're there for a long period of time. With Security Data Lake, they get the best of both worlds. We run the infrastructure at scale for them, put the data in their control, and they get to decide what use case, what partner, what tool gives them the most value on top of their data. >> Is it always a good thing to give the customers that much control? 'Cause you know the old expression: you give 'em a knife to play with and they can cut themselves, I mean. But no, seriously, 'cause what's the provisions around that? Because control is a big part of the governance. How do you manage the security? How does the customer worry about, if I have too much control, someone makes a mistake? >> Well, what we're finding out today is that many customers have realized that some of their data has been replicated seven times, 10 times, not necessarily maliciously, but because they have multiple vendors that utilize that data to give them different use cases and outcomes. It becomes costly and unwieldy to figure out where all that data is. So by centralizing it, the control is really around who has access to the data. Now, ultimately customers want to make those decisions, and we've made it simple to aggregate this data in a single place. They can develop a home region if they want, where all the data flows into one region, or they can distribute it globally. >> They're in charge. >> They're in charge. But the controls are mostly in the hands of the data governance person in the company, not the security analyst. >> So I'm really curious, you mentioned there's 60 AWS partner companies that have collaborated on Security Lake.
Can you tell us a little bit about the process? How long does it take? Are people self-selecting to contribute to these projects? Are you cherry-picking? What does that look like? >> It's a great question. There's three levels of collaboration. One is around the open source project that we announced at Black Hat early this year, called OCSF. And that collaboration is, we've asked the vendor community to work with us to build a schema that is universally acceptable to security practitioners, not vendor specific, and we've asked... >> Savannah: I'm sorry to interrupt you, but is this a first of its kind? >> There's multiple schemas out there developed by multiple parties. They've been around for multiple years, but they've been built by a single vendor. >> Yeah, that's what I'm drilling in on a little bit. It sounds like this is the first time we've had this level of collaboration. >> There have been collaborations around them, but among a handful of companies. We've really gone to a broad set of collaborators to really get it right. And they're focused around areas of expertise that they have knowledge in. So the EDR vendors, they're focused around the schema around EDR. The firewall vendors are focused around that area. Certainly the cloud vendors are in their scope. So that's level one of collaboration, and that gets us the level playing field and the language in which we'll communicate. >> Savannah: Which is so important. >> Super foundational. Then the second area is around producers and subscribers. So many companies generate valuable security data from the tools that they run. And we call those producers the publishers, and they publish the data into Security Lake within that OCSF format. Some of them are in the form of findings, many of them in the form of raw telemetry. Then the second one is on the subscriber side, and those are usually analytic vendors, SIEM vendors, XDR vendors that take advantage of the logs in one place and generate analytic-driven outcomes on top of that, use cases, if you will, that highlight security risks or issues for customers. >> Savannah: Yeah, cool. >> What's the big customer focus when you start looking at Security Lake? How do you see that panning out? You said there's a collaboration, love the open source vibe on that piece. What data goes in there? What's sharing? 'Cause a big part of the keynote I heard today was clean rooms; that put my antenna up. I'd love to hear that. That means there's an implied sharing aspect. The security industry's been sharing data for a while. What kind of data's in that lake? Give us an example, take us through. >> Well, there's a number of sources within AWS, as customers run their workloads in AWS. We've identified somewhere around 25 sources that will be natively, single-click into Amazon Security Lake. We're announcing nine of them. They're traditional network logs, VPC flow logs, CloudTrail logs, firewall logs, findings that are generated across AWS, EKS audit logs, RDS data logs. So anything that customers run workloads on will be available in the data lake. But that's not limited to AWS. Customers run hybrid environments, they have SaaS applications, they use other clouds in some instances. So it's open to bring all that data in. Customers can vector it all into this one single location if they decide; we make it pretty simple for them to do that. Again, in the same format, where outcomes can be generated quickly and easily. >> Can you use the data lake on premises, or does it have to be in S3 in the Amazon cloud?
>> Today it's in S3 in Amazon. If we hear customers looking to do something different, as you guys know, we tend to focus on our customers and what they want us to do, but they've been pretty happy about what we've decided to do in this first iteration. >> So we got a story on SiliconANGLE. Obviously the ingestion is a big part of it. The reporters are jumping in, but the 50-plus third-party sources is a pretty big number. Is that coming from the OCSF, or is that just in general? Who's involved? >> Yeah, OCSF is a big part of that, and we have a list of probably 50 more that want to join in as part of this. >> The other big names are there: Cisco, CrowdStrike, Palo Alto Networks, all the big dogs are in there. >> All big partners of AWS anyway, so it was an easy conversation. And in most cases when we started having the conversation, they were like, "Wow, this has really been needed for a long time." And given our breadth of partners, and where we sit from our customers' perspective in the center of their cloud journey, they've looked at us and said, "You guys, we applaud you for driving this." >> So Mark, take us through the conversations you were having with customers at re:Inforce. We saw a lot of meetings happening. It was great to be back face to face. You guys have been doing a lot of customer conversations; Security Data Lake came out of that. What was the driving force behind it? What were some of the key concerns? What were the challenges, and what's now the opportunity that's different? >> We heard from our customers in general: one, it's too hard for us to get all the data we need in a single place, whether through AWS or the industry in general, it's just too hard. We don't have the resources to wrangle that data. We don't know how to pick a schema; there's multiple ones out there, tell us how we would do that. So these three challenges came out front and center for every customer. And mostly what they said is, our resources are limited and we want to focus those resources on security outcomes, and we have security engineers; we don't want to focus them on data wrangling and large-scale distributed systems. Can you help us solve that problem? And it came out loud and clear from almost every customer conversation we had. And that's where we took up the challenge. We said, "Okay, let's build this data layer." And then on top of that we have services like Detective and GuardDuty that will take advantage of it as well. But we also have a myriad of ISV third parties that will also sit on top of that data and render outcomes. >> What's interesting, I want to get your reaction. I know we don't have much time left, but I want to get your thoughts. When I see Security Data Lake, which is awesome by the way, love the focus, love how you guys put that together, it makes me realize the big thing at re:Invent this year is this idea of specialized solutions. You got instances for this and that, use cases that require a certain kind of performance. You got the data pillars that Adam laid out. Are we going to start seeing more specialized data lakes? I mean, we have a video data lake. Is there going to be a fintech data lake? Is there going to be, I mean, you got the Great Lakes kind of going on here, what is going on with these lakes? I mean, is that a trend that Amazon sees or customers are aligning to? >> Yeah, we have a couple of lakes already. We have a healthcare lake and a financial lake, and now we have a security lake. Foundationally we have Lake Formation, which is the tool that anyone can use to build a lake.
And most of our lakes run on top of Lake Formation, but specialized. And the specialization is in the data aggregation, normalization, enrichment that is unique for those use cases. And I think you'll see more and more. >> John: So that's a feature, not a bug. >> It's a feature, it's a big feature. The customers have asked for it. >> So if they want to roll their own specialized, purpose-built data thing, lake? They can do it. >> And customers don't want to combine healthcare information with security information. They have different use cases and segmentation of the information that they care about. So I think you'll see more. Now, I also think that you'll see, where there are adjacencies, that those lakes will expand into other use cases in some cases too. >> And that's where the right tools come in, as he was talking about, this zero-ETL feature. >> It'll be like an 80/20 rule. So if 80% of the data is shared for different use cases, you can see how those lakes would expand to fulfill multiple use cases. >> All right, you think he's ready for the challenge? Look, we were on the same page. >> Okay, we have a new challenge, go ahead. >> So think of it as an Instagram Reel, sort of your hot take, your thought leadership moment, the clip we're going to come back to and reference your brilliance 10 years down the road. I mean, you've been a CUBE veteran, now CUBE alumni, for almost 10 years; in just a few weeks it'll be that. What do you think is, and I suspect, I think I might know your answer to this, so feel free to be robust in this. But what do you think is the biggest story, key takeaway from the show this year? >> We're democratizing security data within Security Data Lake, for sure. >> Well said. You are our shortest answer so far on theCUBE, and I absolutely love and respect that. Mark, it has been a pleasure chatting with you, and congratulations, again, on the huge announcement. This is such an exciting day for you all. >> Thank you, Savannah, thank you, John, pleasure to be here. >> John: Thank you, great to have you. >> We look forward to 10 more years of having you. >> Well, maybe we don't have to wait 10 years. (laughs) >> Well, more years, in another time. >> I have a feeling it'll be a lot of security content this year. >> Yeah, pretty hot theme. >> Very hot theme. >> Pretty odd theme for us. >> Of course, re:Inforce will be there this year again, coming up in 2023. >> All the res. >> Yep, all the res. >> Love that. >> We look forward to seeing you there. >> All right, thanks, Mark. >> Speaking of res, you're the reason we are here. Thank you all for tuning in to today's live coverage from AWS re:Invent. We are in Las Vegas, Nevada with John Furrier. My name is Savannah Peterson. We are theCUBE and we are the leading source for high-tech coverage. (upbeat music)
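To make the OCSF idea from the interview concrete, here is a minimal Python sketch of the producer pattern Terenzoni describes: normalize a vendor-specific log record into an OCSF-style event, then land it in the customer-owned S3 bucket. This is an illustration, not the actual Security Lake producer API; the event fields, bucket, and key layout are assumptions (the real service normalizes producer data into OCSF and stores it as Parquet, while JSON is used here for readability).

```python
# A rough sketch, not the actual Security Lake producer API: map a
# vendor-specific record onto a common OCSF-style shape, then land it in
# the customer-owned S3 bucket. Field names, bucket, and key layout are
# illustrative assumptions.
import json

import boto3


def to_ocsf_style(raw: dict) -> dict:
    """Normalize a raw flow/firewall record into an OCSF-like event."""
    return {
        "class_name": "Network Activity",  # illustrative event class
        "time": raw["timestamp"],
        "src_endpoint": {"ip": raw["srcaddr"], "port": raw["srcport"]},
        "dst_endpoint": {"ip": raw["dstaddr"], "port": raw["dstport"]},
        "metadata": {"product": raw.get("product", "unknown")},
    }


def publish(event: dict, bucket: str, key: str) -> None:
    """Write one normalized event into the customer-controlled bucket."""
    s3 = boto3.client("s3")
    s3.put_object(Bucket=bucket, Key=key, Body=json.dumps(event))


raw_record = {
    "timestamp": "2022-11-29T10:15:00Z",
    "srcaddr": "10.0.0.12", "srcport": 443,
    "dstaddr": "10.0.1.9", "dstport": 55022,
    "product": "example-firewall",
}
publish(
    to_ocsf_style(raw_record),
    bucket="example-security-lake-bucket",       # hypothetical bucket
    key="ext/example-firewall/event-0001.json",  # hypothetical layout
)
```

The point of the common schema is visible in the `to_ocsf_style` step: once every producer emits the same shape, any subscriber (SIEM, XDR, analytics tool) can read the lake without per-vendor parsing.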

Published Date : Nov 29 2022


Evolution of Data Lakes


 

(light music) >> Kevin Miller joins us. He's the Vice President and General Manager of Amazon S3. And we're going to discuss the evolution of data lakes. Hey Kevin. >> Hey Dave. Great to be here. >> Yeah, let's riff on this a little bit. Why is S3 so popular for data lakes? How have data lakes on S3 changed and evolved? >> Well, I think a lot of the core benefits of S3 really play directly into what customers are looking for when they're building a data lake, right? They're looking for low-cost storage, some place where they can put shared data sets and make it very easy for other teams and businesses to access a set of data, as well as have all the management around it, knowing that the data's secure, durable, protected. And so all of the capability that S3 provides out of the box is just a really good fit for what customers need out of a data lake storage provider. >> And it's really the simple form. I remember when Schema on Read hit, and people were like, oh great, we can just shove all our stuff into a data lake. And then of course, the old bromide: it became a data swamp. But the industry has evolved, hasn't it? It has new tools; machine intelligence, AI, and machine learning have really helped a lot. Talk about how that's changed from the old days, if you will, where it was just kind of this mess and you really couldn't do much with it, and why today we're able to get so much more out of data lakes. >> Yeah. I think that original use of data lakes centered a lot around analytics and sort of Hadoop or Spark type applications. And that continues to be a big driver. But I think that, one, we're continuing to expand the kinds of applications. Like you mentioned, machine learning or other kinds of intelligence; those applications are increasing as things that customers want to do around these shared data sets, being able to pretty easily, sort of dynamically, combine data sets together and use that to drive more insight. I think that you're absolutely right. You know, if left unstructured or left without any kind of governance, you can quickly develop a lot of unusable data. And so I think the evolution we're seeing is in customers putting more of a governance structure in place around it, really trying to understand and catalog the data sets they have. And I think that's going to continue. That's something that we're seeing pretty actively develop right now, in terms of knowing what data I have and knowing the essential metadata around it: how frequently is this data being updated? When is it updated? What are the rules around when I can access it, and so forth. As well as around data lake access control: making it very easy to grant an end user, a specific end user, access to certain data sets, knowing that they can then audit and really know exactly who has access to what data in that data lake. So you're seeing a lot of that governance-type structure come around, while not taking away the essence of having a simple, low-cost, scalable way to store and then access data from a number of applications. So that's all now starting to really come together, I see. >> I think this is a really important point you're making, because I see organizations rethinking their data architecture and their data organizations to really put data in the hands of the lines of business, those with domain expertise, and self-service is becoming really important.
I see a lot of organizations say, 'Hey, we're going to give the lines of business their own data lakes that they can spin up,' but they have to be governed in a federated fashion. I know you guys use this term lakehouse. How do these things fit together? >> Well, Dave, I think you're absolutely right. What I see a lot of organizations doing is evolving to a point where they want as few layers as possible between someone who owns a business outcome, whether it's a top-level revenue generation line or a bottom-level cost line; they want to connect the people who are closest to the business problem with the applications and the technology that they can use to solve it. And a big part of that then is the data and the data sets that are available. So I think where it needs to come together, and where it is coming together, is around making it very easy to federate: to know what data sources I have, to know what the rules are around accessing it, to remove as much of the friction as we can around just the basics of provisioning access, knowing that this set of people is allowed to access it, and how do they access it. Just as much as possible removing that, so that it's not weeks between when I have an idea and when I can build an application to process that data. Ideally it's within an hour: I have an idea, I can spin up a notebook, I can pull in the data sets I need, train an ML algorithm or build some analytics function, and then start to see some results and see, is this really working or not? And then of course sort of scale it up from there in a seamless fashion. So I think that a lot of the essence of AWS that we've built over the years is really starting to come together. And where we are continuing to make it simpler for customers is all around that federation and the simplicity of provisioning access to the data. >> And share that data across a massive global network. Kevin Miller, thanks so much for coming on theCUBE and talking about data lakes. >> Yeah. Thanks for having me, Dave. >> You're welcome. And thank you for watching. This is Dave Vellante for theCUBE. (light music)
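Since the conversation centers on S3 as the shared, governed storage layer of a data lake, here is a minimal sketch of that pattern using boto3: land a partition of a shared dataset under a conventional prefix, then grant a consuming team read-only access with a bucket policy. The bucket name, account ID, and role are hypothetical, and a real deployment would more likely manage this centrally through Lake Formation or IAM.

```python
# A minimal sketch of the shared-dataset pattern discussed above: land a
# date-partitioned object in the lake bucket, then grant a consuming
# team's role read-only access. Bucket, account ID, and role names are
# hypothetical.
import json

import boto3

s3 = boto3.client("s3")
BUCKET = "example-shared-data-lake"  # hypothetical bucket

# 1. Land a partition of a shared dataset. A dt= prefix is a common
#    convention that engines like Athena and Spark can partition on.
s3.put_object(
    Bucket=BUCKET,
    Key="sales/events/dt=2022-08-01/part-0000.json",
    Body=json.dumps({"order_id": 123, "amount": 42.50}),
)

# 2. Grant another team's role read-only access to that dataset prefix.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::111122223333:role/analytics-team"},
        "Action": ["s3:GetObject"],
        "Resource": f"arn:aws:s3:::{BUCKET}/sales/events/*",
    }],
}
s3.put_bucket_policy(Bucket=BUCKET, Policy=json.dumps(policy))
```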

Published Date : Aug 1 2022


Jim Walker, Cockroach Labs & Christian Hüning, finleap connect | Kubecon + Cloudnativecon EU 2022


 

>> (bright music) >> Narrator: theCUBE presents Kubecon and Cloudnativecon Europe 2022, brought to you by Red Hat, the Cloud Native Computing Foundation, and its ecosystem partners. >> Welcome to Valencia, Spain, and Kubecon + Cloudnativecon Europe 2022. I'm Keith Townsend, along with my host, Paul Gillin, who is the senior editor for architecture at SiliconANGLE. Paul. >> Keith, you've been asking me questions all these last two days. Let me ask you one. You're a traveling man. You go to a lot of conferences. What's different about this one? >> You know what, we were just talking about that pre-conference. Open source conferences are usually pretty intimate. This is big: 7,500 people talking about complex topics, all in one big area. And I got to say, it's overwhelming. It's way more. It's not focused on a single company's product or messaging. It is about a whole ecosystem. Very different show. >> And certainly some of the best t-shirts I've ever seen. And our first guest, Jim, has one of the better ones. >> I mean, a big cockroach, come on, right? >> Jim Walker, principal product evangelist at CockroachDB, and Christian Hüning, tech director of cloud technologies at finleap connect, a financial services company that's based out of Germany, now offering services in four countries. >> Basically all over Europe. >> Okay. >> But we are in three countries with offices. >> So you're a CockroachDB customer, and I got to ask the obvious question. Databases are hard, and the company started in 2015; you've been a CockroachDB customer since 2019, I understand. Why take the risk on a four-year-old database? I mean, that just sounds like a world of risk and trouble. >> So it was in 2018 when we joined the company back then, and we did this cloud native transformation; that was our task, basically. We had a very limited amount of time and we were faced with a legacy infrastructure, and we needed something that would run in a cloud native way and just blend in with everything else we had. And the idea was to go all in with Kubernetes, though in those early days a lot of things were alpha, beta. And we were running on MySQL back then. >> Yeah. >> On a VM, kind of a small setup. And then we were looking for something that we could just deploy in Kubernetes alongside everything else. And we had a stack and we had to duplicate it many times. So also to maintain that, we wanted to do it all the same way, like with GitOps and everything, and Cockroach delivered that proposition. So that was why we evaluated the risk of relatively early adopting that solution, with the proposition of having something that's truly cloud native and blends in with everything else we do in the same way. That was something we considered, and then we jumped the leap of faith and... >> The fin leap of faith. >> The fin leap of faith, exactly. And we were not dissatisfied. >> So talk to me a little bit about the challenges, because when we think of MySQL, MySQL scales to amazing sizes; it is the de facto database for many cloud-based architectures. What problems were you running into with MySQL? >> We were running into the problem that, essentially, as a fintech company we are regulated, and we have companies, customers, that really value running things on-prem, in a private cloud; on-prem is a bit of a bad word, maybe, so it's private cloud, hybrid cloud, private cloud in our own data centers in Frankfurt. And we needed to run it in there.
So we wanted to somehow manage that, and so all of the managed solutions were off the table; we couldn't use them. So we needed something that ran in Kubernetes, because we only wanted to maintain Kubernetes. We're a small team and didn't want to run a full-blown VM solution of sorts either. So that was that. And the other thing was, we needed something that was HA, distributable somehow. So we also looked into other solutions back at the time, like Vitess, which is also prominent for having a MySQL-compliant interface, and a great solution. We also got it to work, but we figured, from the scale and from the sheer amount of maintenance it would need, we couldn't deliver that; we were too small for that. So that's where Cockroach just fitted in nicely, by being distributed and HA, resilient against failure, but also able to scale out. Because we had this problem with a single MySQL deployment: as the data amounts grew, we had trouble operationally keeping that under control. >> So Jim, every time someone comes to me and says, I have a new database, I think, we don't need it, yet another database. >> Right. >> What problem, or how does CockroachDB go about solving the types of problems that Christian had? >> Yeah. I mean, Christian laid out why it exists. I mean, look, guys, building a database isn't easy. If it was easy, we'd have a database for every application. But you know, Michael Stonebraker, kind of the godfather of all databases, says it himself: it takes seven, eight years for a database to fully gestate, to be something that's enterprise-ready and can be relied upon. We've been building for about seven, eight years. I mean, I'm thankful for people like Christian to join us early on, to help us kind of troubleshoot and go through some things. We're building a database; it's not easy. You're right. But building a distributed system is also not easy. And so for us, if you look at what's going on in just infrastructure in general, what's happening in Kubernetes, like, this whole space is Kubernetes. It's all about automation. How do I automate scale? How do I automate resilience out of the entire equation of what we're actually doing? I don't want to have to think about active-passive systems. I don't want to think about sharding a database. Sure, you can scale MySQL. You know how many people it takes to run three or four shards of a MySQL database? That's not automation. And I tell you what, in this world right now, with the advances in data, it's hard to find people who actually understand infrastructure and to hire them. This is why this automation is happening: because our systems are more complex. So we started from the very beginning to be something that was very different. This is a cloud native database. This is built with the same exact principles that are in Kubernetes. In fact, like Kubernetes, it's kind of a spawn of Borg, the back end of Google. We are inspired by Spanner. I mean, this was started by three engineers that worked at Google, who were frustrated that they didn't have the tools they had at Google. So they built something like that outside of Google. And how do we give that kind of Google-like infrastructure to everybody? That's the advent of Cockroach and kind of why we're doing what we're doing. >> As your database has matured, you're now beginning a transition, or you're in a transition, to a serverless version. How are you doing that without disrupting the experience for existing customers? And why go serverless at all?
>> Yeah, it's interesting. So, you know, serverless was kind of an R&D project for us. And when we first started on that path, because I think, you know, ultimately what we would love to do for the database is, let's not even think about the database, Keith. Like, I don't want to think about the database. What we're building to is, we want a SQL API in the cloud. That's it. I don't want to think about scale. I don't want to think about upgrades. I literally, like, that stuff should just go away. That's what we need, right? As developers, I don't want to think about isolation levels or, like, you know, give me DML and I want to be able to communicate. And for us, the realization of that vision is, like, if we're going to put a database on the planet for everybody to actually use, we have to be really, really efficient. And serverless, which I believe really should be infrastructureless, because I don't think we should be thinking just about servers. We got to think about, how do I take the context of regions out of this thing? How do I take the context of cloud providers out of what we're talking about? Let's just not think about that. Let's just code against something. Serverless was the answer. Now, we've been building for about a year and a half. We launched a serverless version of Cockroach last October, and we did it so that everybody in the public could have a free version of a database. And that's what serverless allows us to do. It's all consumption-based up to certain limits, and then you pay. But I think ultimately, and we spoke a little bit about this at the very beginning, I think as ISVs, people who are building software today, the serverless vision gets really interesting, because I think what's on the mind of the CTO is, how do I drive down my cost to the cloud provider? And if we can basically drive down costs through making things multi-tenant and super efficient, and then optimizing how much compute we use, spinning things down to zero and back up, and auto-scaling these sorts of things in our software, we can start to make changes in the way that people are thinking about spend with the cloud provider. And ultimately we did that so we could do things for free. >> So, Jim, I think I disagree, Christian, I'm sorry, Jim. I think I disagree with you just a little bit. Christian, I think the biggest challenge facing CTOs is people. >> True. >> Getting the people to worry about cost and spend and implementation. So as you hear the concepts of CockroachDB moving to a serverless model, and you're a large customer, how does that make you think or react, on the people side of your resources? >> Well, I can say that from the people side of resources, luckily Cockroach is our least problem. It just kind of, we always said, it's an operator's dream, because that was the part that just worked for us. >> And it's worked as you have scaled it? Without you having... >> Yeah. I mean, we use it in a bit of a, we do not really scale out the Cockroach, like, really large. It's more that we use it with the enterprise features of encryption in the stack, as our customers then demand. If they do so, we have the SaaS offering and we also do, like, dedicated stacks. So by having a fully cloud native solution on top of Kubernetes as the foundational layer, we can just use that and stamp it out and deploy it. >> How does that translate into services you can provide your customers? Are there services you can provide customers that you couldn't have if you were running, say, MySQL?
>> No, what we do is, we run this, so the SaaS offering runs in our hybrid private cloud. And the other thing that we offer is that we run the entire stack at a cloud provider of their choosing. So if they are on AWS, they give us an AWS account, we put it in there. Theoretically, we could then also talk about using the serverless variant if they'd like, but it's not strictly required for us. >> So Christian, talk to me about that provisioning process, because if I had a MySQL deployment before, I can imagine how putting that into a cloud native type of repeatable CI/CD pipeline or Ansible script could be difficult. Talk to me about that. How does CockroachDB enable you to create new onboarding experiences for your customers? >> So what we do is, we use Helm charts all over the place, like probably everybody else. And then each application team has their parts of services; they've packaged them into Helm charts, they've wrapped those in a super chart that gets wrapped into the super-super chart for the entire stack. And then at the right place, somewhere in between, Cockroach is added, where it's a dependency. And as they just offer a Helm chart, that's as easy as it gets. And then what the teams do is, they have an init job that, once you deploy all that, would spin up, and as soon as Cockroach is ready, it's just the same reconcile loop as everything else. It will then provision users, set up database schemas, do all that, and initialize the initial data sets that might be required for a new setup. So with that setup, we can spin up a new cluster and then deploy that stack chart in there, and it takes some time, and then it's done. >> So talk to me about lifecycle management. Because when I have one database, I have one schema. When I have a lot of databases, I have a lot of different schemas. How do you keep your stack consistent across customers? >> That is basically part of the same story. We have GitOps all over the place. So we have this repository where we keep the super chart versions, and we maintain, like, minus-three versions and ensure that we update the customers and keep them up to date. It's part of the contract, sometimes down to the schedule of the customer at times. And Cockroach nicely supports these updates with the migrations in the background, the schema migrations in the background. So we use, in our case, in that integration, SQLAlchemy, which is also nicely supported. So that was also part of the story from MySQL to Postgres: it was supported by the ORM, these kinds of things. So that approach, together with the ease of Helm charts and the background migrations of the schema, makes for very seamless upgrade operations. Before that, we had to have downtime. >> That's right, you can have online schema changes. Upgrading the database uses the same concept of rolling upgrades that you have in Kubernetes. It's just cloud native. It just fits that same context, I think. >> Christian: It became a no-brainer. >> Yeah. >> Yeah. >> Jim, you mentioned the idea of a SQL API in the cloud. That's really interesting. Why does such a thing not exist? >> Because it's really difficult to build. You know, a SQL API, what does that mean? Like, okay, where does that endpoint live? Is there one in California, one on the east coast, one in Europe, one in Asia? Okay. And I'm asking that endpoint for data. Where does that data live? Can you control where data lives on the planet? Because ultimately, what we're fighting in software today, in a lot of these situations, is the speed of light.
And so how do you intelligently place data on this planet? So that, you know, when you're asking for data, when you're maybe home, it's a different latency than when you're here in Valencia. Does that data follow and move with you? These are really, really difficult problems to solve. And I think we're at this moment in time in software engineering where we're solving some really interesting things, 'cause we are butting up against this speed of light problem. And ultimately, that's one of the biggest challenges. But underneath, it has to have all this automation: the ease at which we can scale this database, the always-on resilience, the way that we can upgrade the entire thing with just rolling upgrades. The cloud native concepts are really what's enabling us to do things at global scale; it's automation. >> Let's talk about that speed of light and global scale. There's no better conference for speed of light, for scale, than Kubecon. Any predictions coming out of the show? >> It's less a prediction for me and more of an observation, you guys. Like, look at two years ago, when we were here in Barcelona at Kubecon EU: it was a lot of hype. A lot of hype, a lot of people walking around, curious, fascinated. This is reality. The conversations that I'm having with people today, there's a reality. There's people really doing; they're becoming cloud native. And to me, I think what we're going to see over the next two to three years is people starting to adopt this kind of distributed mindset, and it permeates not just within infrastructure but up into the stack. We'll start to see many more developers using Go and these kinds of threaded languages, because I think that distributed mindset, if it starts at the chip all the way to the fingertip of the person clicking, and you're distributed everywhere in between, it is extremely powerful. And I think that's what finleap, I mean, that's exactly what the team is doing. And I think there's a lot of value and a lot of power in that. >> Jim, Christian, thank you so much for coming on theCUBE and sharing your story. You know, we're past the hype cycle of Kubernetes, I agree. I was a nonbeliever in Kubernetes two, three years ago. It was mostly hype. We're looking at customers from Microsoft, finleap and competitors doing amazing things with this platform and cloud native in general. Stay tuned for more coverage of Kubecon from Valencia, Spain. I'm Keith Townsend, along with Paul Gillin, and you're watching theCUBE, the leader in high-tech coverage. (bright music)
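As a rough illustration of the stack Hüning describes, here is a minimal sketch of an application talking to CockroachDB through SQLAlchemy, the ORM integration he mentions. It assumes the sqlalchemy-cockroachdb dialect package is installed; the host, database, user, and table are hypothetical.

```python
# A minimal sketch, assuming the sqlalchemy-cockroachdb dialect package:
# the app defines its schema through the ORM and talks to CockroachDB
# over its PostgreSQL-compatible interface. Host, database, user, and
# table names are hypothetical.
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Account(Base):
    __tablename__ = "accounts"
    id = Column(Integer, primary_key=True)
    owner = Column(String, nullable=False)


# 26257 is CockroachDB's default SQL port; the cockroachdb:// URL scheme
# comes from the sqlalchemy-cockroachdb dialect.
engine = create_engine(
    "cockroachdb://app_user@cockroach.example.internal:26257/bank"
)
Base.metadata.create_all(engine)  # DDL runs as an online schema change

with Session(engine) as session:
    session.add(Account(id=1, owner="finleap"))
    session.commit()
```

In the GitOps setup described in the interview, a sketch like this would live behind an init job: the Helm release brings up Cockroach, the job runs the DDL and seed data, and the same reconcile loop handles upgrades.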

Published Date : May 19 2022


Predictions 2022: Top Analysts See the Future of Data


 

(bright music) >> In the 2010s, organizations became keenly aware that data would become the key ingredient to driving competitive advantage, differentiation, and growth. But to this day, putting data to work remains a difficult challenge for many, if not most, organizations. Now, as the cloud matures, it has become a game changer for data practitioners by making cheap storage and massive processing power readily accessible. We've also seen better tooling in the form of data workflows, streaming, machine intelligence, AI, developer tools, security, observability, automation, new databases and the like. These innovations accelerate data proficiency, but at the same time they add complexity for practitioners. Data lakes, data hubs, data warehouses, data marts, data fabrics, data meshes, data catalogs, data oceans are forming, evolving and exploding onto the scene. So in an effort to bring perspective to the sea of optionality, we've brought together the brightest minds in the data analyst community to discuss how data management is morphing and what practitioners should expect in 2022 and beyond. Hello everyone, my name is Dave Vellante with theCUBE, and I'd like to welcome you to a special Cube presentation, Analyst Predictions 2022: The Future of Data Management. We've gathered six of the best analysts in data and data management, who are going to present and discuss their top predictions and trends for 2022 and the first half of this decade. Let me introduce our six power panelists. Sanjeev Mohan is a former Gartner analyst and principal at SanjMo. Tony Baer is principal at dbInsight. Carl Olofson is a well-known research vice president with IDC. Dave Menninger is senior vice president and research director at Ventana Research. Brad Shimmin is chief analyst, AI platforms, analytics and data management at Omdia. And Doug Henschen is vice president and principal analyst at Constellation Research. Gentlemen, welcome to the program and thanks for coming on theCUBE today. >> Great to be here. >> Thank you. >> All right, here's the format we're going to use. I, as moderator, am going to call on each analyst separately, who then will deliver their prediction or megatrend, and then in the interest of time management and pace, two analysts will have the opportunity to comment. If we have more time, we'll elongate it, but let's get started right away. Sanjeev Mohan, please kick it off. You want to talk about governance, go ahead, sir. >> Thank you, Dave. I believe that data governance, which we've been talking about for many years, is now not only going to be mainstream, it's going to be table stakes. And all the things that you mentioned, you know, the data ocean, data lake, lakehouses, data fabric, meshes, the common glue is metadata. If we don't understand what data we have and how we are governing it, there is no way we can manage it. So we saw Informatica go public last year after a hiatus of six years. I'm predicting that this year we see some more companies go public. My bet is on Collibra most likely, and maybe Alation, we'll see, go public this year. I'm also predicting that the scope of data governance is going to expand beyond just data. It's not just data and reports. We are going to see more transformations, like Spark jobs, Python, even Airflow. We're going to see more streaming data, so from Kafka Schema Registry, for example. We will see AI models become part of this whole governance suite.
So the governance suite is going to be very comprehensive: very detailed lineage, impact analysis, and then it will even expand into data quality. We've already seen that happen with some of the tools, where they are buying these smaller companies and bringing in data quality monitoring and integrating it with metadata management, data catalogs, also data access governance. So what we are going to see is that once the data governance platforms become the key entry point into these modern architectures, I'm predicting that the usage, the number of users, of a data catalog is going to exceed that of a BI tool. That will take time, and we've already seen that trajectory. Right now if you look at BI tools, I would say there are a hundred users of a BI tool to one of a data catalog. And I see that evening out over a period of time, and at some point data catalogs will really become the main way for us to access data. The data catalog will help us visualize data, but if we want to do more in-depth analysis, it'll be the jumping-off point into the BI tool, the data science tool, and that is the journey I see for the data governance products. >> Excellent, thank you. Some comments. Maybe Doug, a lot of things to weigh in on there, maybe you can comment. >> Yeah, Sanjeev, I think you're spot on about a lot of the trends. The one disagreement: I think it's really still far from mainstream. As you say, we've been talking about this for years; it's like God, motherhood, apple pie, everyone agrees it's important, but too few organizations are really practicing good governance, because it's hard and because the incentives have been lacking. I think one thing that deserves mention in this context is ESG mandates and guidelines; these are environmental, social and governance regs and guidelines. We've seen the environmental regs and guidelines imposed in industries, particularly the carbon-intensive industries. We've seen the social mandates, particularly diversity, imposed on suppliers by companies that are leading on this topic. We've seen governance guidelines now being imposed by banks on investors. So these ESGs are presenting new carrots and sticks, and it's going to demand more solid data. It's going to demand more detailed reporting and solid reporting, tighter governance. But we're still far from mainstream adoption. We have a lot of, you know, best-of-breed niche players in the space. I think the signs that it's going to be more mainstream are starting with things like Azure Purview and Google Dataplex; the big cloud platform players seem to be upping the ante and starting to address governance. >> Excellent, thank you, Doug. Brad, I wonder if you could chime in as well. >> Yeah, I would love to be a believer in data catalogs. But to Doug's point, I think it's going to take some more pressure for that to happen. I recall metadata being something every enterprise thought they were going to get under control when we were working on service-oriented architecture back in the nineties, and that didn't happen quite the way we anticipated. And so to Sanjeev's point, it's because it is really complex and really difficult to do. My hope is that, you know, we won't, how do I put this, fade out into this nebula of domain catalogs that are specific to individual use cases, like Purview for getting data quality right, or for data governance and cybersecurity. And instead we'll have some tooling that can actually be adaptive, gathering metadata to create something. And I know it's important to you, Sanjeev, and that is this idea of observability.
If you can get enough metadata without moving your data around, and understand the entirety of the system that's running on this data, you can do a lot to help with the governance that Doug is talking about. >> So I just want to add that data governance, like many other initiatives, did not succeed; even AI went into an AI winter, but that's a different topic. But a lot of these things did not succeed because, to your point, the incentives were not there. I remember when Sarbanes-Oxley came onto the scene: if a bank did not do Sarbanes-Oxley, they were happy to pay a million-dollar fine. That was, you know, pocket change for them, instead of doing the right thing. But I think the stakes are much higher now. With GDPR, the floodgates opened. Now, you know, California has CCPA, but even CCPA is being outdated by CPRA, which is much more GDPR-like. So we are very rapidly entering a space where pretty much every major country in the world is coming up with its own regulatory compliance requirements, and data residency is becoming really important. And I think we are going to reach a stage where it won't be optional anymore, whether we like it or not. And I think the reason data catalogs were not successful in the past is because we did not have the right focus on adoption. We were focused on features, and these features were disconnected, very hard for business to adopt. They were built by IT people for IT departments to take a look at technical metadata, not business metadata. Today the tables have turned. CDOs are driving this initiative, regulatory compliances are bearing down hard, so I think the time might be right. >> Yeah, so guys, we have to move on here. But there's some real meat on the bone here, Sanjeev. I like the fact that you called out Collibra and Alation, so we can look back a year from now and say, okay, he made the call, he stuck with it. And then the ratio of BI tools to data catalogs, that's another sort of measurement we can take, even though with some skepticism there; that's something we can watch. And I wonder if someday we'll have more metadata than data. But I want to move to Tony Baer. You want to talk about data mesh and, speaking of, you know, coming off of governance, I mean, wow, the whole concept of data mesh is decentralized data, and then governance becomes, you know, a nightmare there. But take it away, Tony. >> We'll put it this way: data mesh, you know, the idea at least as proposed by ThoughtWorks, you know, basically came out at least a couple of years ago, and the press has been almost uniformly uncritical. A good reason for that is all the problems that Sanjeev and Doug and Brad were just speaking about, which is that we have all this data out there and we don't know what to do about it. Now, that's not a new problem. That was a problem we had with enterprise data warehouses, it was a problem when we had Hadoop data clusters, and it's even more of a problem now that data is out in the cloud, where the data is not only in your data lake, not only in S3, it's all over the place. And it's also including streaming, which I know we'll be talking about later. So the data mesh was a response to that, the idea being, you know, who are the folks that really know best about governance? It's the domain experts. So data mesh was basically an architectural pattern and a process. My prediction for this year is that data mesh is going to hit cold hard reality.
Because if you do a Google search, basically the published work, the articles on data mesh, have been largely, you know, pretty uncritical so far, basically lauding it as being a very revolutionary new idea. I don't think it's that revolutionary, because we've talked about ideas like this before. Brad, now, you and I met years ago when we were talking about SOA and decentralizing all of this, but it was at the application level. Now we're talking about it at the data level. And now we have microservices. So there's this thought of, if we're deconstructing apps in cloud native into microservices, why don't we think of data in the same way? My sense this year is that, you know, this has been a very active search term if you look at Google search trends, and now companies, like, enterprises, are going to look at this seriously. And as they look at it seriously, it's going to attract its first real hard scrutiny; it's going to attract its first backlash. That's not necessarily a bad thing. It means that it's being taken seriously. The reason why I think you'll start to see basically the cold hard light of day shine on data mesh is that it's still a work in progress. You know, this idea is basically a couple of years old, and there are still some pretty major gaps. The biggest gap is in the area of federated governance. Now, federated governance itself is not a new issue. With federated governance decisions, we're still figuring out, like, how can we basically strike the balance between, let's say, consistent enterprise policy and consistent enterprise governance, and yet the groups that understand the data and know how to govern it; how do we basically sort of balance the two? There's a huge gap there in practice and knowledge. Also, to a lesser extent, there's a technology gap, which is basically in the self-service technologies that will help teams essentially govern data, you know, basically through the full lifecycle: from selecting the data, to building the pipelines, to determining your access control, looking at quality, looking at basically whether the data is fresh or whether it's trending off course. So my prediction is that it will receive its first harsh scrutiny this year. You are going to see some organizations and enterprises declare premature victory when they build some federated query implementations. You're going to see vendors start to data mesh-wash their products: anybody in the data management space, whether it's basically a pipelining tool, whether it's ELT, whether it's a catalog or a federated query tool, they're all going to be, you know, basically promoting the fact of how they support this. Hopefully nobody's going to call themselves a data mesh tool, because data mesh is not a technology. We're going to see one other thing come out of this, and this harks back to the metadata that Sanjeev was talking about, and the catalog he was just talking about: there's going to be a renewed focus on metadata. And I think that's going to spur interest in data fabrics. Now, data fabrics are pretty vaguely defined, but if we just take the most elemental definition, which is a common metadata backplane, I think that if anybody is going to get serious about data mesh, they need to look at the data fabric, because we all, at the end of the day, need to, you know, read from the same sheet of music.
>> So thank you, Tony. Dave Menninger, I mean, one of the things that people like about data mesh is that it pretty crisply articulates some of the flaws in today's organizational approaches to data. What are your thoughts on this? >> Well, I think we have to start by defining data mesh, right? The term is already getting corrupted, right? Tony said it's going to see the cold hard light of day. And there's a problem right now that there are a number of overlapping terms that are similar but not identical. So we've got data virtualization, data fabric — excuse me for a second. (clears throat) Sorry about that. Data virtualization, data fabric, data federation, right? So I think it's not really clear what each vendor means by these terms. I see data mesh and data fabric becoming quite popular. I've interpreted data mesh as referring primarily to the governance aspects, as originally intended and specified. But that's not the way I see vendors using it. I see vendors using it much more to mean data fabric and data virtualization. So I'm going to comment on the group of those things. I think the group of those things is going to happen. They're going to happen, they're going to become more robust. Our research suggests that a quarter of organizations are already using virtualized access to their data lakes, and another half — so a total of three quarters — will eventually be accessing their data lakes using some sort of virtualized access. Again, whether you define it as mesh or fabric or virtualization isn't really the point here. The notion is that there are different elements of data, metadata, and governance within an organization that all need to be managed collectively. The interesting thing is when you look at the satisfaction rates of organizations using virtualization versus those that are not: it's almost double. 68% — I'm sorry, 79% — of organizations that were using virtualized access express satisfaction with their access to the data lake. Only 39% express satisfaction if they weren't using virtualized access. >> Oh, thank you, Dave. Sanjeev, we've just got about a couple of minutes on this topic, but I know you're speaking, or maybe you've already spoken, on a panel with (indistinct), who sort of invented the concept. Governance obviously is a big sticking point, but what are your thoughts on this? You're on mute. (panelist chuckling) >> So my message to (indistinct) and to the community is, as opposed to what they said, let's not define it. We spent a whole year defining it; there are four principles: domain, product, data infrastructure, and governance. Let's take it to the next level. I get a lot of questions on what is the difference between data fabric and data mesh, and I'm like, I can't compare the two, because data mesh is a business concept and data fabric is a data integration pattern — how do you compare the two? You have to bring data mesh a level down. So to Tony's point, I'm on a warpath in 2022 to take it down to: what does a data product look like? How do we handle shared data across domains, and governance? And I think we are going to see more of that in 2022, which is the "operationalization" of data mesh. >> I think we could have a whole hour on this topic, couldn't we? Maybe we should do that. But let's move on. Let's move to Carl. So Carl, you're a database guy, you've been around that block for a while now. You want to talk about graph databases? Bring it on. >> Oh yeah. Okay, thanks.
So I regard graph database as basically the next truly revolutionary database management technology. I'm making a forecast for the graph database market, which of course we haven't defined yet, so obviously I have a little wiggle room in what I'm about to say. But this market will grow by about 600% over the next 10 years. Now, 10 years is a long time. But over the next five years, we expect to see gradual growth as people start to learn how to use it. The problem is not that it's not useful, it's that people don't know how to use it. So let me explain, before I go any further, what a graph database is, because some of the folks on the call may not know what it is. A graph database organizes data according to a mathematical structure called a graph. The graph has elements called nodes and edges. So a data element drops into a node, the nodes are connected by edges, and the edges connect one node to another node. Combinations of edges create structures that you can analyze to determine how things are related. In some cases, the nodes and edges can have properties attached to them, which add additional informative material that makes it richer; that's called a property graph. There are two principal use cases for graph databases. There are semantic property graphs, which are used to break down human language text into semantic structures. Then you can search it, organize it, and answer complicated questions. A lot of AI is aimed at semantic graphs. The other kind is the property graph that I just mentioned, which has a dazzling number of use cases. I want to just point out, as I talk about this, people are probably wondering, well, we have relational databases, isn't that good enough? So a relational database supports what I call definitional relationships. That means you define the relationships in a fixed structure. The data drops into that structure, there's a value — a foreign key value — that relates one table to another, and that value is fixed. You don't change it. If you change it, the database becomes unstable; it's not clear what you're looking at. In a graph database, the system is designed to handle change, so that it can reflect the true state of the things that it's being used to track. So let me just give you some examples of use cases for this. They include entity resolution, data lineage, social media analysis, Customer 360, fraud prevention. There's cybersecurity; supply chain is a big one, actually. There is explainable AI, and this is going to become important, too, because a lot of people are adopting AI, but they want a system that can say, after the fact: how did the AI system come to that conclusion? How did it make that recommendation? Right now we don't have really good ways of tracking that. Machine learning in general; social networks, I already mentioned that. And then we've got, oh gosh, we've got data governance, data compliance, risk management. We've got recommendation, we've got personalization, anti money laundering — that's another big one — identity and access management. Network and IT operations is already becoming a key one, where you actually have mapped out your operation — whatever it is, your data center — and you can track what's going on as things happen there: root cause analysis, fraud detection is a huge one.
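To make the node-and-edge model Carl describes concrete, here is a minimal sketch in Python. It uses the networkx library as a stand-in for a real graph database; the account/device fraud pattern, node names, and properties are hypothetical illustrations, not any vendor's actual schema.

```python
# A toy property graph: nodes and edges both carry properties, and
# combinations of edges form patterns you can analyze for relationships.
import networkx as nx

g = nx.MultiDiGraph()

# Nodes are data elements; the attached key/values make this a "property graph".
g.add_node("acct_1", kind="account", opened="2021-03-01")
g.add_node("acct_2", kind="account", opened="2021-11-15")
g.add_node("device_9", kind="device", ip="203.0.113.7")

# Edges connect one node to another and can carry properties too.
g.add_edge("acct_1", "device_9", rel="LOGGED_IN_FROM")
g.add_edge("acct_2", "device_9", rel="LOGGED_IN_FROM")

# A classic fraud-style question: which accounts share a device?
# Here it is a simple edge traversal rather than a programmed join.
shared = [n for n in g.predecessors("device_9") if g.nodes[n]["kind"] == "account"]
print(shared)  # ['acct_1', 'acct_2'] -> two accounts touching one device
```

The shared-device question is a one-line traversal here; in a relational store it would be a self-join you would have to program and maintain, which is exactly the trade-off Carl goes on to describe.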
A number of major credit card companies use graph databases for fraud detection, risk analysis, tracking and tracing, churn analysis, next best action, what-if analysis, impact analysis, entity resolution — and I would add a few other things to this list, like metadata management. So Sanjeev, here you go, this is your engine. Because I was in metadata management for quite a while in my past life, and one of the things I found was that none of the data management technologies that were available to us could efficiently handle metadata, because of the kinds of structures that result from it — but graphs can, okay? Graphs can do things like say, this term in this context means this, but in that context, it means that, okay? Things like that. And in fact, logistics management, supply chain. And also because it handles recursive relationships — by recursive relationships I mean objects that own other objects that are of the same type. You can do things like bill of materials, you know, like a parts explosion. Or you can do an HR analysis: who reports to whom, how many levels up the chain, and that kind of thing. You can do that with relational databases, but it takes a lot of programming. In fact, you can do almost any of these things with relational databases, but the problem is you have to program it. It's not supported in the database. And whenever you have to program something, that means you can't trace it, you can't define it, you can't publish it in terms of its functionality, and it's really, really hard to maintain over time. >> Carl, thank you. I wonder if we could bring Brad in. Brad, I'm sitting here wondering, okay, is this incremental to the market? Is it disruptive and a replacement? What are your thoughts on this space? >> It's already disrupted the market. I mean, like Carl said, go to any bank and ask them, are you using graph databases to get fraud detection under control? And they'll say, absolutely, that's the only way to solve this problem. And it is, frankly. And it's the only way to solve a lot of the problems that Carl mentioned. And that is, I think, its Achilles' heel in some ways. Because, you know, it's like finding the best way to cross the seven bridges of Königsberg. You know, it's always going to kind of be tied to those use cases, because it's really special and it's really unique, and because it's special and it's unique, it still unfortunately kind of stands apart from the rest of the community that's building, let's say, AI outcomes, as a great example here. Graph databases and AI, as Carl mentioned, are like chocolate and peanut butter, but technologically they don't know how to talk to one another; they're completely different. And you know, you can't just stand up SQL and query them. You've got to learn — what is it, Carl? SPARQL — yeah, thank you — to actually get to the data in there. And if you're going to scale that data, that graph database — especially a property graph — if you're going to do something really complex, like try to understand, you know, all of the metadata in your organization, you might just end up with, you know, a graph database winter, like we had the AI winter, simply because you run out of performance to make the thing happen. So I think it's already disrupted, but we need to treat it like a first-class citizen in the data analytics and AI community. We need to bring it into the fold.
We need to equip it with the tools it needs to do the magic it does, and to do it not just for specialized use cases, but for everything. 'Cause I'm with Carl, I think it's absolutely revolutionary. >> Brad identified the principal Achilles' heel of the technology, which is scaling. When these things get large and complex enough that they spill over what a single server can handle, you start to have difficulties, because the relationships span things that have to be resolved over a network, and then you get network latency and that slows the system down. So that's still a problem to be solved. >> Sanjeev, any quick thoughts on this? I mean, I think metadata on the word cloud is going to be the largest font, but what are your thoughts here? >> I want to (indistinct) so people don't associate me with only metadata, so I want to talk about something slightly different. DB-Engines (db-engines.com) has done an amazing job. I think almost everyone knows that they chronicle all the major databases that are in use today. In January of 2022, there are 381 databases on their ranked list. The largest category is RDBMS. The second largest category is actually divided into two: property graphs and RDF graphs. These two together make up the second largest number of databases. So talking about the Achilles' heel, this is the problem: there are so many graph databases to choose from. They come in different shapes and forms. To Brad's point, there are so many query languages. In RDBMS it's SQL — I know the story — but here we've got Cypher, we've got Gremlin, we've got GQL, and then there are proprietary languages. So I think there's a lot of disparity in this space. >> Well, excellent. All excellent points, Sanjeev, if I must say. And that is a problem — the languages need to be sorted and standardized. People need to have a roadmap as to what they can do with it. Because, as you say, you can do so many things, and so many of those things are unrelated, that you sort of say, well, what do we use this for? And I'm reminded of a saying I learned a bunch of years ago, when somebody said that the digital computer is the only tool man has ever devised that has no particular purpose. (panelists chuckle) >> All right guys, we've got to move on to Dave Menninger. We've heard about streaming; your prediction is in that realm, so please take it away. >> Sure. So I like to say that historical databases are going to become a thing of the past. By that I don't mean that they're going to go away, that's not my point. I mean, we need historical databases, but streaming data is going to become the default way in which we operate with data. So in the next, say, three to five years, I would expect that data platforms — and we're using the term data platforms to represent the evolution of databases and data lakes — will incorporate these streaming capabilities. We're going to process data as it streams into an organization, and then it's going to roll off into the historical database. So historical databases don't go away, but they become a thing of the past: they store the data that occurred previously. And as data is occurring, we're going to be processing it, we're going to be analyzing it, we're going to be acting on it. I mean, we only ever ended up with historical databases because we were limited by the technology that was available to us. Data doesn't occur in batches, but we processed it in batches because that was the best we could do.
And it wasn't bad, and we've continued to improve and improve and improve. But streaming data today is still the exception; it's not the rule, right? There are projects within organizations that deal with streaming data, but it's not the default way in which we deal with data yet. And so that's my prediction, that this is going to change: we're going to have streaming data be the default way in which we deal with data. However you label it and whatever you call it — you know, maybe these databases and data platforms just evolve to be able to handle it — we're going to deal with data in a different way. And our research shows that already: about half of the participants in our analytics and data benchmark research are using streaming data, and another third are planning to use streaming technologies. So that gets us to about eight out of 10 organizations that need to use this technology. And that doesn't mean they have to use it throughout the whole organization, but it's pretty widespread in its use today, and it has continued to grow. If you think about the consumerization of IT, we've all been conditioned to expect immediate access to information, immediate responsiveness. You know, we want to know if an item is on the shelf at our local retail store and whether we can go in and pick it up right now. You know, that's the world we live in, and that's spilling over into the enterprise IT world. We have to provide those same types of capabilities. So that's my prediction: historical databases become a thing of the past, streaming data becomes the default way in which we operate with data. >> All right, thank you, David. Well, so what say you, Carl, the guy who has followed historical databases for a long time? >> Well, one thing actually — every database is historical, because as soon as you put data in it, it's now history. It no longer reflects the present state of things. But even if that history is only a millisecond old, it's still history. But I would say — I mean, I know you're trying to be a little bit provocative in saying this, Dave, 'cause you know as well as I do that people still need to do their taxes, they still need to do accounting, they still need to run general ledger programs and things like that, and that all involves historical data. That's not going to go away unless you want to go to jail. So you're going to have to deal with that. But as far as the leading-edge functionality, I'm totally with you on that. And I'm just kind of wondering if this requires a change in the way that we perceive applications in order to truly be manifested — rethinking the way applications work, saying that an application should respond instantly, as soon as the state of things changes. What do you say about that? >> I think that's true. I think we do have to think about things differently. It's not the way we designed systems in the past. We're seeing more and more systems designed that way, but again, it's not the default. And I agree 100% with you that we do need historical databases, you know, that's clear. And even some of those historical databases will be used in conjunction with the streaming data, right? >> Absolutely. I mean, you know, let's take the data warehouse example, where you're using the data warehouse as the context and the streaming data as the present, and you're saying: here's the sequence of things that's happening right now — have we seen that sequence before? And where? What does that pattern look like in past situations? And can we learn from that?
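A minimal sketch of the pattern Dave and Carl are describing — act on events inside a short window as they arrive, then roll them off into a historical store. This is plain Python for illustration only; the event shape, window size, and in-memory "historical store" are assumptions, not any particular streaming product.

```python
from collections import deque

WINDOW_SECONDS = 5          # tumbling window for "as it happens" processing
historical_store = []       # stand-in for the historical database events roll into

def process_stream(events):
    """Aggregate each window as it closes, then roll events off to history."""
    window, window_start = deque(), None
    for ts, item_id, qty in events:          # events arrive in time order
        if window_start is None:
            window_start = ts
        if ts - window_start >= WINDOW_SECONDS:
            # Act on the data while it is current...
            total = sum(q for _, _, q in window)
            print(f"window ending {window_start + WINDOW_SECONDS}: {total} units")
            # ...then it "becomes a thing of the past": persist and reset.
            historical_store.extend(window)
            window.clear()
            window_start = ts
        window.append((ts, item_id, qty))
    historical_store.extend(window)           # flush the final partial window

# Hypothetical feed: (timestamp, item, quantity)
process_stream([(0, "widget", 2), (3, "widget", 1), (6, "widget", 5), (11, "widget", 4)])
```

The same rolled-off history can then serve the pattern-matching Carl mentions — comparing the live window against sequences seen before.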
>> So Tony Baer, I wonder if you could comment? I mean, when you think about, you know, real-time inferencing at the edge, for instance, which is something a lot of people talk about, a lot of what we're discussing here in this segment looks like it's got great potential. What are your thoughts? >> Yeah, I mean, I think you nailed it — you hit it right on the head there. What I'm seeing is that — I'm going to split this one down the middle — I don't see that streaming becomes the default. What I see is streaming and transaction databases and analytics data — you know, data warehouses, data lakes, whatever — converging. And what allows us technically to converge is cloud native architecture, where you can basically distribute things. So you can have a node here that's doing the real-time processing, that's also — and this is where it leads in — maybe doing some of that real-time predictive analytics, to take a look at: well look, we're looking at this customer journey, what's happening with what the customer is doing right now, and this is correlated with what other customers are doing. So the thing is that in the cloud, you can basically partition this, and because of the speed of the infrastructure, you can bring these together and orchestrate them in sort of a loosely coupled manner. The other part is that the use cases are demanding it, and this goes back to what Dave is saying: when you look at Customer 360, when you look at, let's say, smart utility products, when you look at any type of operational problem, it has a real-time component and it has an historical component, and a predictive one. So, you know, my sense here is that technically we can bring this together through the cloud. And I think the use case is that we can apply some real-time, sort of predictive analytics on these streams and feed this into the transactions, so that when we make a decision in terms of what to do as a result of a transaction, we have this real-time input. >> Sanjeev, did you have a comment? >> Yeah, I was just going to say that, to Dave's point, you know, we have to think of streaming very differently, because with historical databases, we used to bring the data in and store the data, and then we used to run rules on top — aggregations and all. But in the case of streaming, the mindset changes, because the rules — the inference, all of that — are fixed, but the data is constantly changing. So it's a completely reversed way of thinking about and building applications on top of that. >> So Dave Menninger, there seems to be some disagreement about the default. What kind of timeframe are you thinking about? Is it end of decade that it becomes the default? What would you pin it at? >> I think around, you know, between five to 10 years, I think this becomes the reality. >> I think it's... >> It'll be more and more common between now and then, but it becomes the default. And I also want, Sanjeev, at some point — maybe in one of our subsequent conversations — we need to talk about governing streaming data, 'cause that's a whole nother set of challenges. >> We've also talked about it rather in two dimensions, historical and streaming, and there's lots of low latency, micro-batch, sub-second — that's not quite streaming, but in many cases it's fast enough, and we're seeing a lot of adoption of near real-time — not quite real-time — as good enough for many applications.
(indistinct cross talk from panelists) >> Because nobody's really taking the hardware dimension (mumbles). >> That'll just happen, Carl. (panelists laughing) >> So, near real time — but maybe before you lose the customer, however we define that, right? Okay, let's move on to Brad. Brad, you want to talk about automation, AI, the pipeline — people feel like, hey, we can just automate everything. What's your prediction? >> Yeah, I'm an AI aficionado, so apologies in advance for that. But, you know, I think we've been seeing automation play within AI for some time now, and it's helped us do a lot of things, especially for practitioners that are building AI outcomes in the enterprise. It's helped them to fill skills gaps, it's helped them to speed development, and it's helped them to actually make AI better, 'cause it, you know, in some ways provides some swim lanes, and, for example, with technologies like AutoML, can auto-document and create the sort of transparency that we talked about a little bit earlier. But I think there's an interesting kind of convergence happening with this idea of automation. And that is that the automation that started happening for practitioners is trying to move outside of the traditional bounds of things like: I'm just trying to get my features, I'm just trying to pick the right algorithm, I'm just trying to build the right model. It's expanding across the full life cycle of building an AI outcome, starting at the very beginning with the data and continuing on to the end, which is this continuous delivery and continuous automation of that outcome, to make sure it's right and it hasn't drifted and stuff like that. And because it's become kind of powerful, we're starting to actually see this weird thing happen where the practitioners are starting to converge with the users. And that is to say that, okay, if I'm in Tableau right now, I can stand up Salesforce Einstein Discovery and it will automatically create a nice predictive algorithm for me, given the data that I pull in. But what's starting to happen — and we're seeing this from the companies that create business software, so Salesforce, Oracle, SAP, and others — is that they're starting to actually use these same ideals and a lot of deep learning (chuckles) to basically stand up these out-of-the-box, flip-a-switch, and-you've-got-an-AI-outcome-at-the-ready offerings for business users. And I am very much, you know — I think that's the way it's going to go, and what it means is that AI is slowly disappearing. And I don't think that's a bad thing. I think, if anything, what we're going to see in 2022, and maybe into 2023, is this sort of rush to put this idea of disappearing AI into practice and have as many of these solutions in the enterprise as possible. You can see, for example, SAP is going to roll out this quarter this thing called adaptive recommendation services, which basically is a cold-start AI outcome that can work across a whole bunch of different vertical markets and use cases. It's just a recommendation engine for whatever you need to do in the line of business. So basically, you're an SAP user, you turn on your software one day — you're a sales professional, let's say — and suddenly you have a recommendation for customer churn. Boom! There it is. That's great. Well, I don't know, I think that's terrifying.
In some ways I think it is the future — that AI is going to disappear like that — but I'm absolutely terrified of it, because I think what it really does is call attention to a lot of the issues we already see around AI, specific to this idea of what we like to call, at Omdia, responsible AI. Which is, you know, how do you build an AI outcome that is free of bias, that is inclusive, that is fair, that is safe, that is secure, that is auditable, et cetera, et cetera, et cetera? It takes a lot of work to do. And so if you imagine a customer that's just a Salesforce customer, let's say, and they're turning on Einstein Discovery within their sales software, you need some guidance to make sure that when you flip that switch, the outcome you're going to get is correct. And that's going to take some work. And so I think we're going to see this move to let's-roll-this-out, and suddenly there's going to be a lot of problems, a lot of pushback that we're going to see. Some of that's going to come from GDPR and the others that Sanjeev was mentioning earlier. A lot of it is going to come from internal CSR requirements within companies that are saying, "Hey, hey, whoa, hold up, we can't do this all at once. Let's take the slow route, let's make AI automated in a smart way." And that's going to take time. >> Yeah, so a couple of predictions there that I heard: AI simply disappears, it becomes invisible — maybe if I can restate it that way. And then, if I understand it correctly, Brad, you're saying there's a backlash in the near term; you'd be able to say, oh, slow down, let's automate what we can. Those attributes that you talked about are non-trivial to achieve — is that why you're a bit of a skeptic? >> Yeah, I think we don't have any sort of standards that companies can look to and understand. And certainly, within these companies — especially those that haven't already stood up an internal data science team — they don't have the knowledge to understand, when they flip that switch for an automated AI outcome, whether it's going to do what they think it's going to do. And so we need some sort of standard methodology and practice — best practices — that every company that's going to consume this invisible AI can make use of. And one of the things, you know, that Google sort of kicked off a few years back, that's picking up some momentum and that the companies I just mentioned are starting to use, is this idea of model cards, where at least you have some transparency about what these things are doing. You know, so for the SAP example, we know, for instance, if it's a convolutional neural network with a long short-term memory model that it's using, and we know that it only works on Roman-alphabet English, and therefore, me as a consumer, I can say, "Oh, well, I know that I need to do this internationally, so I should not just turn this on today." >> Thank you. Carl, could you add anything, any context here? >> Yeah, we've talked about some of the things Brad mentioned here at IDC, in our Future of Intelligence group — regarding, in particular, the moral and legal implications of having a fully automated, you know, AI-driven system. Because we already know, and we've seen, that AI systems are biased by the data that they get, right?
So if they get data that pushes them in a certain direction — I think there was a story last week about an HR system that was recommending promotions for White people over Black people, because in the past, you know, White people were promoted more and rated more productive than Black people. But it had no context as to why — which is, you know, that Black people were being historically discriminated against — the system doesn't know that. So, you know, you have to be aware of that. And I think that, at the very least, there should be controls when a decision has either a moral or a legal implication. When you really need a human judgment, it could lay out the options for you, but a person actually needs to authorize the action. And I also think that we will always have to be vigilant regarding the kind of data we use to train our systems, to make sure it doesn't introduce unintended biases. To some extent, it always will, so we'll always be chasing after them. But that's (indistinct). >> Absolutely, Carl, yeah. I think what you have to bear in mind as a consumer of AI is that it is a reflection of us, and we are a very flawed species. And so if you look at all of the really fantastic, magical-looking super-models we see, like GPT-3 and the fourth one that's coming out, they're xenophobic and hateful, because the data they're built upon, and the algorithms, and the people that build them are us. So AI is a reflection of us. We need to keep that in mind. >> Yeah, the AI is biased 'cause humans are biased. All right, great. All right, let's move on. Doug, you mentioned, you know, a lot of people said that data lake — that term — is not going to live on, but here we are, we still have some lakes here. You want to talk about lake house? Bring it on. >> Yes, I do. My prediction is that lake house, and this idea of a combined data warehouse and data lake platform, is going to emerge as the dominant data management offering. I say offering — that doesn't mean it's going to be the dominant thing that organizations have out there, but it's going to be the predominant vendor offering in 2022. Now, heading into 2021, we already had Cloudera, Databricks, Microsoft, and Snowflake as proponents; in 2021, SAP, Oracle, and several of these fabric/virtualization/mesh vendors joined the bandwagon. The promise is that you have one platform that manages your structured, unstructured, and semi-structured information, and it addresses both the BI and analytics needs and the data science needs. The real promise there is simplicity and lower cost. But I think end users have to answer a few questions. The first is: does your organization really have a center of data gravity, or is the data highly distributed — multiple data warehouses, multiple data lakes, on premises, cloud? If it's very distributed, and you'd have difficulty consolidating, and that's not really a goal for you, then maybe that single platform is unrealistic and not likely to add value. You know, also, the fabric and virtualization vendors, the mesh idea — if you have this highly distributed situation, that might be a better path forward. The second question: if you are looking at one of these lake house offerings — you are looking at consolidating, simplifying, bringing it together on a single platform — you have to make sure that it meets both the warehouse need and the data lake need. So you have vendors like Databricks and Microsoft with Azure Synapse.
They're new, really, to the data warehouse space, and they're having to prove that the data warehouse capabilities on their platforms can meet the scaling requirements, can meet the user and query concurrency requirements, can meet those tight SLAs. And then, on the other hand, you have Oracle, SAP, Snowflake — the data warehouse folks coming into the data science world — and they have to prove that they can manage the unstructured information and meet the needs of the data scientists. I'm seeing a lot of the lake house offerings from the warehouse crowd managing that unstructured information in columns and rows, and some of these vendors — Snowflake in particular — are really relying on partners for the data science needs. So you really have to look at a lake house offering and make sure that it meets both the warehouse and the data lake requirement. >> Thank you, Doug. Well, Tony, if those two worlds are going to come together, as Doug was saying — the analytics and the data science world — does there need to be some kind of semantic layer in between? I don't know. Where are you on this topic? >> (chuckles) Oh, didn't we talk about data fabrics before? A common metadata layer (chuckles). Actually, I'm almost tempted to say let's declare victory and go home, in that this has actually been going on for a while. I actually agree with, you know, much of what Doug is saying there. I remember, as far back as, I think, 2014, I was doing a study — I was still at Ovum, (indistinct) Omdia — looking at all these specialized databases that were coming up, and seeing that, you know, there's overlap at the edges. But yet, there was still going to be a reason at the time that you would have, let's say, a document database for JSON, you'd have a relational database for transactions and for data warehousing, and you had, at that time, something that resembled Hadoop for what we'd consider your data lake. Fast forward, and the thing is, what I was seeing at the time is that they were sort of blending at the edges — that was, say, about five to six years ago — and the lake house is essentially the current manifestation of that idea. There is a dichotomy, in terms of, you know, the old argument: do we centralize this all in a single place, or do we virtualize? And I think it's always going to be a union of the two; there's never going to be a single silver bullet. I do see that there are also going to be questions — and these are points that Doug raised — about what you need for your performance characteristics. Do you need, for instance, high concurrency? Do you need the ability to do some very sophisticated joins? Or is your requirement more to be able to distribute the processing, you know, as far as possible, to essentially do a kind of brute force approach? All these approaches are valid based on the use case. I just see that essentially the lake house is the culmination of all that. It's a relatively new term, introduced by Databricks a couple of years ago, but this is the culmination of what's been a long-time trend. And what we see in the cloud is that we're starting to see it as a checkbox item for data warehouses to say, "Hey, we can basically source data in cloud storage — in S3, Azure Blob Store, you know, whatever — as long as it's in certain formats, like, you know, Parquet or CSV or something like that." I see that as becoming kind of a checkbox item.
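As one way to picture that checkbox item, here is a hedged sketch of querying Parquet files sitting in object storage with plain SQL. DuckDB is used purely as an illustrative engine, and the bucket, path, region, and column names are hypothetical.

```python
# Querying open-format files in object storage with plain SQL -- the
# "checkbox item" Tony describes: the engine reads the lake directly.
import duckdb

con = duckdb.connect()
con.execute("INSTALL httpfs; LOAD httpfs;")   # extension that enables s3:// paths
con.execute("SET s3_region='us-east-1';")     # credentials omitted; assumes public access

rows = con.execute("""
    SELECT item, SUM(quantity) AS units
    FROM read_parquet('s3://example-lake/sales/*.parquet')
    GROUP BY item
    ORDER BY units DESC
""").fetchall()
print(rows)
```

The design point is the one Doug and Tony are debating: the warehouse-style SQL sits on top of open files in the lake, rather than requiring the data to be loaded into a proprietary store first.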
So to that extent, I think that the lake house, depending on how you define it, is already reality — and in some cases maybe new terminology, but not a whole heck of a lot new under the sun. >> Yeah. And Dave Menninger — thank you, Tony — but a lot of this is going to come down to, you know, vendor marketing, right? Some people just kind of co-opt the term; we talked about, you know, data mesh washing. What are your thoughts on this? (laughing) >> Yeah, so I used the term data platform earlier, and part of the reason I use that term is that it's more vendor-neutral. We've tried to sort of stay out of the vendor terminology patenting world, right? Whether the term lake house is what sticks or not, the concept is certainly going to stick. And we have some data to back it up. About a quarter of organizations that are using data lakes today already incorporate data warehouse functionality into it — so they consider their data lake and data warehouse one and the same. About a quarter of organizations — a little less, but about a quarter — feed the data lake from the data warehouse, and about a quarter of organizations feed the data warehouse from the data lake. So it's pretty obvious that three quarters of organizations need to bring this stuff together, right? The need is there, the need is apparent. The technology is going to continue to converge. I like to talk about it this way: you've got data lake people over here at one end — and I'm not going to talk about why people thought data lakes were a bad idea, because they thought you just throw stuff in there and ignore it, right? That's not what a data lake is. So you've got data lake people over here, and you've got database people, data warehouse people, over here. Database vendors are adding data lake capabilities, and data lake vendors are adding data warehouse capabilities. So it's obvious that they're going to meet in the middle. I mean, I think it's like Tony says — I think we should declare victory and go home. >> So just a follow-up on that: are you saying the specialized lake and the specialized warehouse go away? I mean, Tony, data mesh practitioners — or advocates — would say, well, they could all live; it's just a node on the mesh. But based on what Dave just said, are we gonna see those all morph together? >> Well, number one, as I was saying before, there's always going to be this sort of, you know, centrifugal force, or this tug of war, between do we centralize the data or do we virtualize. And the fact is, I don't think there's ever going to be any single answer. In terms of data mesh — data mesh has nothing to do with how you physically implement the data. You could have a data mesh basically on a data warehouse. It's just that, you know, the difference is that even if we use the same physical data store, everybody's logically, you know, basically governing it differently. Data mesh, in essence, is not a technology; it's processes, it's a governance process. So essentially, as I was saying before, this is basically the culmination of a long-time trend. We're essentially seeing a lot of blurring, but there are going to be cases where, for instance, if I need, let's say, upserts, or I need high concurrency or something like that, there are certain things that I'm not going to be able to efficiently get out of a data lake.
Versus, you know, a system where I'm just doing brute-force, very fast file scanning and that type of thing. So I think there always will be some delineations, but I would agree with Dave and with Doug that we are seeing a confluence of requirements — the capabilities of the data lake and the data warehouse need to come together, so I think... >> I think what we're likely to see is organizations look for a converged platform that can handle both sides for their center of data gravity, and the mesh and the fabric/virtualization vendors — they're all on board with the idea of this converged platform — saying, "Hey, we'll handle all the edge cases, all the stuff that isn't in that center of data gravity but is distributed off in a cloud or at a remote location." So you can have that single platform for the center of your data, and then bring in virtualization, mesh, what have you, for reaching out to the distributed data. >> As Dave basically said, people are happier when they've virtualized the data. >> I think we have that at this point. But to Dave Menninger's point, they are converging: Snowflake has introduced support for unstructured data, so obviously the lines are blurring here. And what Databricks is saying is, "Aha, but it's easier to go from data lake to data warehouse than it is from data warehouse to data lake." So I think we're getting into semantics, but we're already seeing these two converge. >> So take somebody like AWS — they've got what, 15 data stores? Are they going to converge those 15 data stores? This is going to be interesting to watch. All right, guys, I'm going to go down the list and do one word each, and if each of you analysts would just add a very brief sort of course correction for me. So Sanjeev — I mean, governance is going to be... maybe it's the dog that wags the tail now. I mean, it's coming to the fore, all this ransomware stuff — we really didn't talk much about security — but what's the one word in your prediction that you would leave us with on governance? >> It's going to be mainstream. >> Mainstream, okay. Tony Baer, mesh washing is what I wrote down — that's what we're going to see in 2022, a little reality check. You want to add to that? >> Reality check, 'cause I hope that no vendor jumps the shark and claims they're offering a data mesh product. >> Yeah, let's hope that doesn't happen. If they do, we're going to call them out. Carl — I mean, graph databases, thank you for sharing some high growth metrics. I know it's early days, but magic is what I took away from that. So, magic database. >> Yeah, I would actually — I've said this to people too — I kind of look at it as a Swiss Army knife of data, because you can pretty much do anything you want with it. That doesn't mean you should. I mean, there's definitely the case that if you're managing things that are in a fixed schematic relationship, probably a relational database is a better choice. There are times when a document database is a better choice. A graph database can handle those things, but it may not be the best choice for that use case. But for a great many, especially the new emerging use cases I listed, it's the best choice. >> Thank you. And Dave Menninger — thank you, by the way, for bringing the data in; I like how you supported all your comments with some data points. But streaming data becomes the sort of default paradigm, if you will. What would you add?
>> Yeah, I would say think fast, right? That's the world we live in — you've got to think fast. >> Think fast, love it. And Brad Shimmin, love it. I mean, on the one hand I was saying, okay, great, I'm afraid I might get disrupted by one of these internet giants who are AI experts, and I'm going to be able to buy instead of build AI. But then again, you know, I've got some real issues — there's a potential backlash there. So give us your bumper sticker. >> I would say, going with Dave, think fast and also think slow — to reference the book that everyone talks about. I would say, really, that this is all about trust: trust in the idea of automation, and in a transparent and visible AI across the enterprise. And verify — verify before you do anything. >> And then Doug Henschen — I mean, I think the trend is your friend here on this prediction, with lake house really becoming dominant. I liked the way you set up that notion of, you know, the data warehouse folks coming at it from the analytics perspective and the data science worlds coming together. I still feel as though there's this piece in the middle that we're missing, but your final thoughts — I'll give you the (indistinct). >> I think the idea of consolidation and simplification always prevails. That's why the appeal of a single platform is going to be there. We've already seen that with, you know, Hadoop platforms, and in moving toward cloud, moving toward object storage — object storage becoming really the common storage point, whether it's a lake or a warehouse. And a second point: I think ESG mandates are going to come in alongside GDPR and things like that, to up the ante for good governance. >> Yeah, thank you for calling that out. Okay folks, hey, that's all the time that we have here. Your experience and depth of understanding on these key issues of data and data management were really on point, and they were on display today. I want to thank you for your contributions. Really appreciate your time. >> Enjoyed it. >> Thank you. >> Thanks for having me. >> In addition to this video, we're going to be making available transcripts of the discussion. We're going to do clips of this as well, and we're going to put them out on social media. I'll write this up and publish the discussion on wikibon.com and siliconangle.com. No doubt several of the analysts on the panel will take the opportunity to publish written content, social commentary, or both. I want to thank the power panelists, and thanks for watching this special CUBE presentation. This is Dave Vellante — be well, and we'll see you next time. (bright music)

Published Date: Jan 7 2022


Ravi Mayuram, Couchbase | Couchbase ConnectONLINE 2021

>> Welcome back to theCUBE's coverage of Couchbase ConnectONLINE, where the theme of the event is Modernize Now. Yes, let's talk about that. And with me is Ravi Mayuram, who's the senior vice president of engineering and the CTO at Couchbase. Ravi, welcome. Great to see you. >> Thank you so much. I'm so glad to be here with you. >> Let me ask you what the new requirements are around modern applications. I've seen some of your comments: you've got to be flexible, distributed, multimodal, mobile, edge — those are all the very cool sort of buzzwords, smart applications. What does that all mean? And how do you put that into a product and make it real? >> Yeah, I think what has basically happened is that, so far, it's been a transition of sorts, and now we have come to a tipping point — and the tipping point has come, in large part, because of COVID. COVID has pushed us to a world where we are living in a sort of occasionally connected manner, where our digital interactions precede our physical interactions, in one sense. It's a world where we do a lot more stuff in a digital manner, as opposed to making more direct human contact, and that has really been the accelerant to this Modernize Now theme. In this process, what has happened is that, so far, all the databases and all the data infrastructure that we have built historically are all very centralized. They're all sitting behind — they used to be in mainframes, from where they came to, like, your own data centers, where we used to run hundreds of servers, to where they're going now, which is consumption-based computing, all cloud-oriented now. But they are all centralized still. Yet where our engagement happens with the data is at the edge — at your point of convenience, at your point of consumption — not where the data is actually sitting. So this has led to, you know, all those buzzwords, as you said, which is like: oh, well, we need a distributed data infrastructure, where is the edge? But it just basically comes down to the fact that the data needs to be where you are engaging with it. And that means, if you are doing it on your mobile phone, or if you are sitting somewhere doing something, or traveling — whether you are in a subway, whether you're in a plane or a ship — wherever, the data needs to come to you and be available, as opposed to, every time, you going to the data, which is centrally sitting in some place. And that is the fundamental shift in terms of how the modern architecture needs to think when it comes to digital transformation and transitioning old applications to the modern infrastructure, because that's what's going to define your customer experiences and your personalized experiences. Otherwise, people are basically waiting for that circle of death that we all know, and blaming the networks and other pieces. The problem is actually that the data is not where you are engaging with it — it has got to be fetched, you know, seven seas away. And that is the problem that we are basically solving in this modernization of the data infrastructure. >> I love this conversation, and I love the fact that there's a technical person that can kind of educate us on this, because data, by its very nature, is distributed.
It's always been distributed, but a distributed database has always been incredibly challenging, whether it was a global Sysplex or eventual consistency — getting recovery right for a distributed architecture has been extremely difficult. You know — I hate that this is a terrible term — there are lots of ways to skin a cat, but you've been the visionary behind this notion of optionality, of how to solve technical problems in different ways. So how do you solve that problem of a super rock-solid database that can handle, you know, distributed data? >> Yes. So there are two issues that you alluded to there. The first is the optionality piece of it, which is that the same data that you have requires different types of processing on it. It's almost like fractional distillation. It is like your crude flowing through the system: you can end up with petrol, and with Vaseline and rayon on the other end — but the raw material, that's our data, in one sense. So far, we never treated the data that way; that's part of the problem. It has always been very purpose-built, cast for the first problem, and so you basically have to recast it every time you want to look at the data. The first thing that we have done is make the data that fluid. So when you have the data, you can first look at it to perform, let's say, a simple operation that we call a key-value store operation — a "given my ID, give me my password" kind of scenario — which is like, you know, there are customers of ours who have billions of user IDs under their management. So things get slower. How do you make it fast and easily available? Login should not take more than five minutes. Again, there's a class of problem that we solve there with that same data. Now, eventually, without you ever having to recast it into a different database, you can do a SQL query — these are classic SQL queries, which is our next magic. We are a NoSQL database, but we have fully functional SQL. SQL has been the language that has talked to data for 40-odd years, successfully. Every other database has come and tried to implement its own QL — query language — but they've all failed. Only SQL has stood the test of time of 40-odd years. Why? Because there's solid mathematics behind it. It's called relational calculus. And what that helps you do is look at the data in any combinatorial way — any which way you look at the data, it will come back in a format that you can consume. That's the guarantee it sort of gives you, in one sense. And because of that, you can now do some really complex things in the database — what we call predicate logic — on top of that. And that gives you the ability to do the classic relational-type queries — "select star from... where..." kind of stuff — because it's at an English level, it becomes easy. So with the same data, you didn't have to go move it to another database and do your sort of transformation of the data and all this stuff — it's the same data that you do this on. Now, that's where the optionality comes in. Now you can do another piece of logic on top of this, which we call search. This is built on the concept of an inverted index and TF-IDF — the classic Google, in very simple terms — tokenized search. You can do that on the same data, without you ever having to move the data to a different format.
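A rough sketch of what that optionality can look like from an application's point of view, based on the Couchbase Python SDK: the same JSON document served by a key-value fetch and by a declarative SQL (N1QL/SQL++) query. The connection string, credentials, bucket name, and document contents here are placeholder assumptions, and the query path assumes an index has been created on the bucket.

```python
# One JSON document, two kinds of access on the same data:
# a key-value fetch and a declarative SQL (N1QL/SQL++) query.
from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.options import ClusterOptions

cluster = Cluster(
    "couchbase://localhost",  # placeholder endpoint
    ClusterOptions(PasswordAuthenticator("user", "password")),
)
collection = cluster.bucket("app-data").default_collection()

# Write the JSON first -- no schema declared up front (late binding).
collection.upsert("user::42", {"name": "Pat", "city": "Reno", "visits": 7})

# 1) Key-value access: the "given my ID, give me my record" class of problem.
profile = collection.get("user::42").content_as[dict]
print(profile["name"])

# 2) The same documents through SQL -- no recasting into another database.
#    (Assumes an index, e.g. a primary index, exists on the bucket.)
for row in cluster.query(
    "SELECT d.name, d.visits FROM `app-data` AS d WHERE d.city = 'Reno'"
):
    print(row)
```

The design choice being illustrated is the one Ravi describes: the document is written once, and different processing — key-value, query, search, and so on — is brought to it, rather than the data being recast per system.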
And then on top of it, you can do what is known as eventing — your own custom logic — which we do in a programming language called JavaScript. And finally, analytics. Analytics is the ability to query the operational data in a different way — ad hoc querying. "What were my sales of this widget, year over year, in the first week of December?" — that's a very complex question to ask, and it takes a lot of different types of processing. So that's optionality: different types of processing on the same data, without you having to go to five different systems, without you having to recast the data in five different ways and write five different sets of application logic. You put them in one place. Now, to your second question: this has got to be distributed and made available in multiple clouds, in your data center, all the way to the edge — which is the operational side of the database management system. And that's where the distributed platform that we have built enables us to get the data to where you need it to be. You know, in a classic way, we call it a CDN for data — as in, like, content delivery networks, which so far do the moving of static content to the edges. Now we can actually dynamically move the data. Now imagine the richness of applications you can develop. >> On the first part of the answer to my question — are you saying you could do this without a schema, right? And then you can apply those techniques? >> Fantastic question. Yes. That's the brilliance of this database. So far, classically, databases have always demanded that you first define a schema before you can write a single byte of data. Couchbase is one of the rare databases — I, for one, don't know any other, but there could be, let's give the benefit of the doubt — that writes data first and then late-binds to schema, as we call it. It's a schema-on-read thing. So because there is no schema, it is just a JSON document that is sitting inside. And JSON is the lingua franca of the web, as you very well know by now. So it's just JSON that we manage. You can do key lookups of the JSON. You can do full query capability, like a classic relational database — we even have cost-based optimizers and the other sophisticated pieces of technology behind it. You can do searching on it using the full textual analysis pipeline. You can do ad hoc querying on the analytics side, and you can write your own custom logic on it using our eventing capabilities. So that's what it allows, because we keep the data in the native form of JSON. It's not a data structure or a data schema imposed by a database; it is how the data is produced. And on top of it, we bring different types of logic — five different types. The philosophy is bringing logic to data, as opposed to moving data to logic. That is what we have been doing for the last 40 years, because we developed various database systems and data processing systems at various points in time in our history: we had key-value stores, we had relational systems, we had search systems, we had analytical systems, we had queuing systems — all these systems. If you want to use any one of them, our answer has always been: just move the data to that system. Instead, we are saying, do not move the data, because as we get bigger and bigger, just moving this data is going to be a humongous problem.
If you're going to be moving petabytes of data for this, it's not going to fly. Instead, bring the logic to the data, so you can now apply different types of logic to the data. That's, in one sense, the optionality piece of this. >>As you know, there's plenty of schema-less data stores. They're just, they're called data swamps. I mean, that's what they became, right? So this is some interesting magic that you're applying here. >>Yes. I mean, the one problem with the data swamps, as you call them, is that that was a little too open-ended, because the data format itself could change, and then everything became a game of data recasting, because at the end of the day it required you to have the data in a certain schema for certain types of processing. So there were a lot of gaps there. It was fluid, but it did not really, how do you say, keep the promise it was actually meant to deliver. That's why it was a swamp, I mean, because it was fundamentally not managing the data. The data was sitting in some file system, and then you were doing something with it. This is a classic database, where the data is managed and you create indexes to manage it, and you create different types of indexes to manage it. You distribute the index, you distribute the data, and you have, like we were discussing, ACID semantics on top of it. When you put all these things together, it's a tough proposition, but we have solved some really tough problems, good computer science problems that we had to solve to bring this to bear, to bring this to the market. >>So you predicted the trend around multimodal and converged databases. You kind of led Couchbase through that. I always ask this question, because it's clearly a trend in the industry and it definitely makes sense from a simplification standpoint, so that I don't have to keep switching databases. Or, the flip side of that though, Ravi, and I wonder if you could give me your opinion on this, is kind of the right tool for the right job. So I often say, isn't that the Swiss army knife approach, where we have a little teeny scissors and a knife that's not that sharp? How do you respond to that? >>A great one. My answer is always to use another analogy to tackle that: have you ever accused a smartphone of being a Swiss army knife? No. Nobody does that, because it's actually 40 functions in one, is what a smartphone becomes. You never call your iPhone or your Android phone a Swiss army knife, and here's the reason: you can use that same device in its full capacity. That's what optionality is. It's not like your good old one where there's a keyboard hiding half the screen, and you can do everything only through the keyboard without touching and stuff like that. The whole device is available to you to do one type of processing when you want it, and when you're done with that, it can do a completely different type of processing. As in, one moment it could be a TomTom, telling you all the directions; the next, it's your PDA. >>Third, it's a fantastic phone. Four, it's a beautiful camera, which can do your f-stop management and give you a nice SLR-quality picture. Right? The next moment, it's a video camera. People are shooting movies with this thing in Hollywood these days, for God's sake.
So it gives you the full power of what you want to do when you want it. And now, if you just thought that the iPhone is a great device, or any smartphone is a great device, because you can do five things in one or 50 things in one, at a certain level you've missed the point, because what that device really enabled is not just these five things in one place becoming easy to consume and easy to operate. It actually started the app economy. That's the brilliance of bringing so many things in one place, because in the morning, you know, I get the alert saying that today you've got to leave home at 8:15 for your nine o'clock meeting. >>And the next day it might actually say 8:45 is good enough, because it knows where the phone is sitting, the geo-position of it. It knows from my calendar where the meeting is actually happening. It can do a traffic calculation because it's got my map and all of the routes. And then there's the notification system, which eventually pops up on my phone to say, hey, you've got to leave at this time. Now, five different systems have to come together, and they can, because the data is in one place. Without that, you couldn't even do this simple function in a sort of predictable manner, in a manner that's useful to you. So I believe a database which gives you this optionality of doing multiple types of data processing on the same set of data will allow you to build a class of products which you have so far been struggling to build, because half the time you're running sideline to sideline, just, you know, integrating data from one system to the other. >>So I love the analogy with the smartphone. I want to continue it and double click on it. So I use this camera. You know, when my kid had a game, I would bring the big camera, the 35 millimeter. I don't use that anymore, no way, but my wife does, she still uses the DSLR. So is there a similar analogy here? And by the way, the camera shop in my town went out of business, you know? So is that fair? In other words, for those specialized databases, is there still a place for them, but they're getting... >>Absolutely, absolutely. Great analogy and a great extension to the question. That's the contrarian side of it, in one sense: hey, if everything can just be done in one, do you have a need for the other things? I mean, you gave a camera example where it is sort of a slippery slope. Let me give you another one, which is actually a little more straight to the point. Just because I listen to half of my music on the iPhone doesn't stop me from having my full digital receiver and, you know, my Harman Kardon speakers at home, because they produce a kind of sound, an immersive experience, that this teeny little speaker was never in its lifetime intended to produce, right? It's the convenience. Yes, it's the convenience of convergence, that I can put my earphones on and listen to all the great music. >>Yes, it's 90% there, or 80% there, depending on how much of an audiophile you are. The super-specialized ones do not go away. There are places where the specialized use cases will demand a separate system to exist, but even there, it has got to be very closely connected, how do you say, close binding or late binding.
I should be able to stream that song from my phone to that receiver so I can get it from those speakers. You can't say that, oh, there's a digital divide between these two things, done, and I can only play CDs on that one. That's not how it's going to work going forward. This is the connected world, right? As in, if I'm listening to a song in my car and then step out of the car and walk into my living room, that same song should continue and play on my living room speakers. Then it's a connected world, because it knows my preference and what I'm doing, and that all happens only because of this data flowing between all these systems. >>I love that example too. When I was a kid, we used to go to Tweeter, etc., and we'd play around with the big four-foot speakers and take them home. Those stores are out of business too. Absolutely. So, is the debate between relational and non-relational databases over, Ravi? >>I believe so, because I think what happened was the relational systems had been the norm, they ruled the roost, if you will, for the last 40-odd years, and then came this NoSQL movement, which was almost a rebellion against the relational world we all inhabited, because it was very restrictive. It had the schema definition and the schema evolution, as we call it, all those things. They required a committee, they required your DBA and your data architect, and you had to call them just to add one column and stuff like that. And the world had moved on. This was the world of blogs and tweets and, you know, mashups, and a different generation of digital behavior, digital-native people now, operating in these consumer-facing applications. >>We are living in this world, and yet the enterprise systems were still living on the other side of the divide. So out came this solution to say that we don't need SQL. Actually, the problem was never SQL. NoSQL was, you know, a best approximation, a good marketing name, but from a technologist's perspective, the problem was never the query language. NoSQL was not the problem; it was the schema limitations and the inability of these systems to scale. The relational systems were built like airplanes, which is, if the San Francisco to Boston flight route is so popular that you want to add 50 more seats to it, the only way you can do that is to go back to Boeing and ask them to move you up from a 737 to a 777, or whatever it is. And they'll stick you with a billion-dollar bill, on the assumption that you'll somehow pay it off by, you know, either flying more people or raising the rates or whatever you have to do. >>These are called vertically scaling systems. So relational systems are vertically scaling. They are expensive. Versus what we have done in this modern world is make the system horizontally scaling, which is more like a train: if it's a train that is going from San Francisco to Boston and you need to carry 50 more people, be my guest, I'll add one more coach to it, one more car to it. And the better part of the way we have done this is that we have super-specialized it: this route actually requires three dining cars and only 10 sleeper cars, or whatever; then just pick those and attach them. On the next route, you can choose to have only one dining car. That's good enough.
So the way you scale the train can also be customized based on the route: along one route, more dining capability; on a shorter route, not as much of that capability. >>You can attach the kinds of coaches you need. We call this multi-dimensional scaling: not only do we scale horizontally, we can scale to different types of workloads by adding different types of coaches, right? So that's the beauty of this architecture. Now, why is that important? Because where we land eventually is the ability to do operational and analytical work in the same place. This is another thing which didn't happen in the past, because you would say, I cannot run this analytical query, because then my operational workload will suffer, then my front end will slow down, and millions of customers will be impacted. That's the problem we have solved: the same data on which you can run an analytical query and an operational query, because they're separated by these cars, right? As in, we fence the resources so that one doesn't impede the other. So you can, at the same time, have microsecond-level, 10 million ops per second key-value queries happening, >>and yet you can run an analytical query, which will take a couple of minutes, one not impeding the other. So that's, in one sense, part of the problem we have solved here, the relational versus NoSQL portion of it. These are the kinds of problems we had to solve, and we solved those, and then we put the same query language back on top. Why? It's like Tesla, in one sense. Underneath the surface is where all the stuff that had to be changed, changed, like the gasoline, the internal combustion engine; those were the issues we really wanted to solve. So solve that, change the engine out, but you don't need to change the steering wheel or the gas pedal or, you know, the paddle shifters or whatever else you have there, your gear shifters. >>Those need to remain in the same place. Otherwise, people won't buy it; otherwise, it does not even look like a car to people. So even when you give people the most advanced technology, it's got to be accessible to them in a manner that they can consume. Only in software do we forget this first design principle, and we go and say, well, I've got a car here, you've got to blow harder for it to go fast and lean back for it to, you know, apply the brake. That's how we seem to design software. Instead, we should be designing it in a manner that is easiest for our audience, which is developers, to consume. And they've been using SQL for 40 years, or 30 years. And so we give them the steering wheel and the gas pedal and the gear shifters: by putting SQL back on top, underneath the surface we have completely solved the relational limitations of schema, as well as scalability. >>So in that way, and by bringing back the classic ACID capabilities, which is what relational systems were counted on for, and being able to do that with the SQL programming language, we call it multi-statement SQL transactions, so to say, which is the classic way all the enterprise software was built, we have put that back.
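[Editor's note: the multi-statement SQL transactions mentioned here follow the same begin/commit/rollback pattern that classic enterprise SQL code has always used. Below is that pattern in a generic, runnable form using Python's built-in sqlite3 module rather than Couchbase itself, with hypothetical account rows, purely to illustrate the shape.]

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                     [("alice", 100), ("bob", 50)])

    # Multi-statement ACID transaction: both updates commit, or neither does.
    try:
        with conn:  # opens a transaction; commits on success, rolls back on error
            conn.execute("UPDATE accounts SET balance = balance - 30 WHERE id = 'alice'")
            conn.execute("UPDATE accounts SET balance = balance + 30 WHERE id = 'bob'")
    except sqlite3.Error:
        pass  # on failure the rollback has already restored both rows

    print(dict(conn.execute("SELECT id, balance FROM accounts")))
    # {'alice': 70, 'bob': 80}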
Now I can say that the debate between relational and non-relational is over, because this has truly extended the database to solve the problems that the relational systems had to grow up to solve in modern times. Rather than getting pedantic about whether we have NoSQL or SQL or NewSQL, or, you know, any of that jargon-oriented debate, these are the debates of computer science that actually had to be settled, and we have solved them with the latest release, 7.0, which we released a few months ago. >>Right, right. Last July. Ravi, we've got to leave it there. I love the examples and the analogies. I can't wait to be face to face with you. I want to hang with you at the cocktail party, because I've learned so much, and I really appreciate your time. Thanks for coming to theCUBE. >>Fantastic. Thanks for the time and the opportunity, Dave. I mean, very insightful questions, really appreciate it. Thank you. >>Okay, this is Dave Volante. We're covering Couchbase Connect online. Keep it right there for more great content on theCUBE.

Published Date : Oct 26 2021

SUMMARY :

Dave Volante talks with Ravi Mayuram, Senior Vice President of Engineering and CTO at Couchbase, about modernizing data infrastructure. Mayuram argues that data should be treated as a fluid raw material, so the same data can serve key-value lookups, SQL queries, full-text search, custom eventing logic, and analytics without being recast into five different systems. He explains Couchbase's schema-on-read approach to JSON documents, its distributed platform that moves data to the edge the way a CDN moves static content, and its multi-dimensional scaling model, which he contrasts with vertically scaled relational systems using an airplane-versus-train analogy. With full SQL support and multi-statement ACID transactions in release 7.0, he argues, the debate between relational and non-relational databases is over.

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Ravi Mayuram | PERSON | 0.99+
Ravi | PERSON | 0.99+
Boston | LOCATION | 0.99+
Dave Volante | PERSON | 0.99+
$7 | QUANTITY | 0.99+
second question | QUANTITY | 0.99+
San Francisco | LOCATION | 0.99+
90% | QUANTITY | 0.99+
80% | QUANTITY | 0.99+
40 years | QUANTITY | 0.99+
today | DATE | 0.99+
30 years | QUANTITY | 0.99+
iPhone | COMMERCIAL_ITEM | 0.99+
three | QUANTITY | 0.99+
40 functions | QUANTITY | 0.99+
35 millimeter | QUANTITY | 0.99+
five things | QUANTITY | 0.99+
nine o'clock | DATE | 0.99+
40 odd years | QUANTITY | 0.99+
50 things | QUANTITY | 0.99+
Last July | DATE | 0.99+
Boeing | ORGANIZATION | 0.99+
two issues | QUANTITY | 0.99+
Tesla | ORGANIZATION | 0.99+
50 more seats | QUANTITY | 0.99+
one sense | QUANTITY | 0.99+
one place | QUANTITY | 0.99+
one | QUANTITY | 0.99+
one more car | QUANTITY | 0.99+
San Francisco Boston | LOCATION | 0.99+
one more coach | QUANTITY | 0.99+
50 more people | QUANTITY | 0.99+
first | QUANTITY | 0.99+
two things | QUANTITY | 0.98+
five different systems | QUANTITY | 0.98+
Canada | LOCATION | 0.98+
Java | TITLE | 0.98+
Harman Kardon | ORGANIZATION | 0.98+
five different ways | QUANTITY | 0.98+
more than five minutes | QUANTITY | 0.98+
first part | QUANTITY | 0.98+
Google | ORGANIZATION | 0.98+
first problem | QUANTITY | 0.98+
Couchbase | ORGANIZATION | 0.98+
first thing | QUANTITY | 0.98+
Jason | PERSON | 0.97+
Tom | PERSON | 0.97+
SQL | TITLE | 0.97+
next day | DATE | 0.97+
Sonos | ORGANIZATION | 0.97+
Android | TITLE | 0.97+
Twitter | ORGANIZATION | 0.97+
December 1st week | DATE | 0.96+
one dining car | QUANTITY | 0.96+
seven schema | QUANTITY | 0.96+
this year | DATE | 0.96+
Third one | QUANTITY | 0.96+
three dining cars | QUANTITY | 0.95+
SIS Plex | TITLE | 0.95+
one column | QUANTITY | 0.95+
10 sort of sleeper cars | QUANTITY | 0.95+
English | OTHER | 0.95+
one system | QUANTITY | 0.94+
eight 15 | DATE | 0.94+
millions of customers | QUANTITY | 0.94+
single byte | QUANTITY | 0.93+
one problem | QUANTITY | 0.93+
five | QUANTITY | 0.93+
2021 | DATE | 0.93+
four foot | QUANTITY | 0.92+
billion dollar | QUANTITY | 0.92+
8 45 | OTHER | 0.91+
Forrest | ORGANIZATION | 0.9+
one type | QUANTITY | 0.88+
billions of user IDs | QUANTITY | 0.88+
10 million ops | QUANTITY | 0.88+

Ravi Mayuram, Senior Vice President of Engineering and CTO, Couchbase


 

>> Welcome back to theCUBE's coverage of Couchbase Connect online, where the theme of the event is modernize now. Yes, let's talk about that. And with me is Ravi Mayuram, who's the senior vice president of engineering and the CTO at Couchbase. Ravi, welcome. Great to see you. >> Thank you so much. I'm so glad to be here with you. >> I want to ask you what the new requirements are around modern applications. I've seen some of your comments: you've got to be flexible, distributed, multimodal, mobile, edge. Those are all the very cool sort of buzzwords, smart applications. What does that all mean? And how do you put that into a product and make it real? >> Yeah, I think what has basically happened is that so far it's been a transition of sorts, and now we have come to a tipping point, and that tipping point has come largely because of COVID. COVID has pushed us to a world where we are living in a sort of occasionally connected manner, where our digital interactions precede our physical interactions, in one sense. It's a world where we do a lot more at a distance, in a digital manner, as opposed to making more direct human contact. That has really been the accelerant to this modernize-now theme. In this process, what has happened is that all the databases and all the data infrastructure that we have built historically are very centralized. They're all sitting behind; they used to be in mainframes, from where they came to your own data centers, where we used to run hundreds of servers, to where they're going now, which is that computing has marvelously changed to consumption-based computing, all cloud-oriented now. But they are all centralized still, while our engagement with the data happens at the edge, at your point of convenience, at your point of consumption, not where the data is actually sitting. So this has led to, you know, all those buzzwords, as you said, which is like, oh, we need a distributed data infrastructure, where is the edge? But it just basically comes down to the fact that the data needs to be there if you are engaging with it. And that means if you are doing it on your mobile phone, or while you're traveling, whether you're in a subway, whether you're in a plane or a ship, wherever, the data needs to come to you and be available, as opposed to you going every time to the data, which is sitting centrally in some place. And that is the fundamental shift in terms of how modern architecture needs to think when it comes to digital transformation and transitioning old applications to the modern infrastructure, because that's what's going to define your customer experiences and your personalized experiences. Otherwise, people are basically waiting for that circle of death that we all know, and blaming the networks and other pieces. The problem is actually that the data is not where you are engaging with it. It's got to be fetched, you know, seven seas away. And that is the problem that we are basically solving in this modernization of the data infrastructure. >> I love this conversation, and I love the fact that there's a technical person that can kind of educate us on this, because data by its very nature is distributed.
It's always been distributed, but a distributed database has always been incredibly challenging, whether it was a global Sysplex or the eventual consistency of getting recovery right for a distributed architecture. You know, I hate that this is a terrible term, lots of ways to skin a cat, but you've been the visionary behind this notion of optionality, how to solve technical problems in different ways. So how do you solve that problem of a super rock-solid database that can handle, you know, distributed data? >> Yes. So there are two issues that you alluded to there. The first is the optionality piece of it, which is that the same data that you have requires different types of processing on it. It's almost like fractional distillation, like crude flowing through the system. You start at one end with petrol and you can end up with Vaseline and rayon at the other end, but the raw material, that's our data, in one sense. So far, we never treated the data that way. That's part of the problem. It has always been very purpose-built, cast for the first problem, and so you basically have to recast it every time you want to look at the data. The first thing that we have done is make data that fluid. So when you have the data, you can first look at it to perform, let's say, a simple operation that we call a key-value store operation. Given my ID, get my password, those kinds of scenarios, which is like, you know, there are customers of ours who have billions of user IDs under their management. So things get slower. How do you make it fast and easily available? Log-in should not take more than five milliseconds. This is a class of problem that we solve. That same data, now, without you ever having to cast it into a different database, you can query with solid, classic SQL queries, which is our next magic. We are a NoSQL database, but we have a fully functional SQL. SQL has been the language that has talked to data for 40-odd years, successfully. Every other database has come and tried to implement its own QL query language, but they've all failed. Only SQL has stood the test of time of 40-odd years. Why? Because there's solid mathematics behind it. It's called relational calculus. And what that helps you do is basically look at the data in any combination, any which way you look at the data, it will all come back in a format that you can consume. That's the guarantee it gives you, in one sense. And because of that, you can now do some really complex things inside the database, what we call predicate logic, on top of that. And that gives you the ability to do the classic relational-type queries, select star from ... where, that kind of stuff, and because it's at an English level, it becomes easy. So with the same data, you didn't have to go move it to another database and do your transformation of the data and all the stuff. The same data does this. Now, that's where the optionality comes in. Now you can do another piece of logic on top of this, which we call search. This is built on the concept of an inverted index and TF-IDF, the classic Google approach in very simple terms, tokenized search, and you can do that on the same data without ever having to move the data to a different format.
And then on top of it, you can do what is known as eventing, your own custom logic, which we do in the programming language JavaScript. And finally analytics, and analytics is your ability to query the operational data in a different way, ad hoc querying: what were my sales of this widget, year over year, in the first week of December? That's a very complex question to ask, and it takes a lot of different types of processing. So that's optionality: different types of processing on the same data, without you having to go to five different systems, without you having to recast the data in five different ways and apply different application logic. You put them in one place. Now to your second question: this has got to be distributed and made available in multiple clouds, in your data center, all the way to the edge, which is the operational side of the database management system. And that's where the distributed platform that we have built enables us to get the data to where you need it to be. In a classic way, we call it CDN'ing the data, as in content delivery networks, which so far do static movement of static content to the edges. Now we can actually dynamically move the data. Now imagine the richness of applications you can develop. >> And on the first part of the answer to my question, are you saying you could do this without a schema, with schema on read, right? And then you can apply those techniques. >> Fantastic question. Yes. That's the brilliance of this database, that so far, classically, databases have always demanded that you first define a schema before you can write a single byte of data. Couchbase is one of the rare databases, I for one don't know any other, but there could be, let's give the benefit of the doubt, which writes data first and then late-binds to schema, as we call it. It's a schema-on-read thing. So because there is no schema, it is just a JSON document that is sitting inside, and JSON is the lingua franca of the web, as you very well know by now. So it's just JSON that we manage. You can do key-value lookups of the JSON. You can do full query capability, like a classic relational database. We even have cost-based optimizers and other sophisticated pieces of technology behind it. You can do searching on it, using the full textual analysis pipeline. You can do ad hoc querying on the analytics side, and you can write your own custom logic on it using our eventing capabilities. That's what it allows, because we keep the data in the native form of JSON. It's not a data structure or a data schema imposed by a database. It is how the data is produced. And on top of it, we bring different types of logic, five different types of it. The philosophy is bringing logic to data, as opposed to moving data to logic. This is what we have been doing in the last 40 years, because we developed various database systems and data processing systems at various points in time in our history. We had key-value stores, we had relational systems, we had search systems, we had analytical systems, we had queuing systems, all these systems. If you want to use any one of them, the answer has always been: just move the data to that system. Versus we are saying, do not move the data. As we get bigger and bigger, just moving this data is going to be a humongous problem.
If you're going to be moving petabytes of data for this, it's not going to fly. Instead, bring the logic to the data, right? So you can now apply different types of logic to the data. That's, in one sense, the optionality piece of this. >> But as you know, there's plenty of schema-less data stores. They're just, they're called data swamps. I mean, that's what they became, right? So this is some interesting magic that you're applying here. >> Yes. I mean, the one problem with the data swamps, as you call them, is that that was a little too open-ended, because the data format itself could change, and then everything became a game of data recasting, because at the end of the day it required you to have the data in a certain schema for certain types of processing. So there were a lot of gaps there. It was fluid, but it did not really, how do you say, keep the promise it was actually meant to deliver. That's why it was a swamp, I mean, because it was fundamentally not managing the data. The data was sitting in some file system, and then you were doing something with it. This is a classic database, where the data is managed and you create indexes to manage it, and you create different types of indexes to manage it. You distribute the index, you distribute the data, and you have, like we were discussing, ACID semantics on top of it. When you put all these things together, it's a tough proposition, but we have solved some really tough problems, good computer science problems that we had to solve to bring this to bear, to bring this to the market. >> So you predicted the trend around multimodal and converged databases. You kind of led Couchbase through that. I always ask this question, because it's clearly a trend in the industry and it definitely makes sense from a simplification standpoint, so that I don't have to keep switching databases. Or, the flip side of that though, Ravi, and I wonder if you could give me your opinion on this, is kind of the right tool for the right job. So I often say, isn't that the Swiss army knife approach, where you have a little teeny scissors and a knife that's not that sharp? How do you respond to that? >> A great one. My answer is always to use another analogy to tackle that: have you ever accused a smartphone of being a Swiss army knife? - No. No. >> Nobody does, because it's actually 40 functions in one, is what a smartphone becomes. You never call your iPhone or your Android phone a Swiss army knife, and here's the reason: you can use that same device in its full capacity. That's what optionality is. It's not like your good old one where there's a keyboard hiding half the screen, and you can do everything only through the keyboard without touching and stuff like that. The whole device is available to you to do one type of processing when you want it, and when you're done with that, it can do a completely different type of processing. Right? As in, one moment it could be a TomTom, telling you all the directions; the next, it's your PDA. Third, it's a fantastic phone. Four, it's a beautiful camera which can do your f-stop management and give you a nice SLR-quality picture. Right? The next moment, it's a video camera. People are shooting movies with this thing in Hollywood these days, for God's sake.
So it gives you the full power of what you want to do when you want it. And now, if you just thought that the iPhone is a great device, or any smartphone is a great device, because you can do five things in one or 50 things in one, at a certain level you've missed the point, because what that device really enabled is not just these five things in one place becoming easy to consume and easy to operate. It actually started the app-based economy. That's the brilliance of bringing so many things in one place, because in the morning, you know, I get an alert saying that today you've got to leave home at 8:15 for your nine o'clock meeting. And the next day it might actually say 8:45 is good enough, because it knows where the phone is sitting, the geo-position of it. It knows from my calendar where the meeting is actually happening. It can do a traffic calculation because it's got my map and all of the routes. And then it's got this notification system, which eventually pops up on my phone to say, hey, you've got to leave at this time. Now, five different systems have to come together, and they can, because the data is in one place. Without that, you couldn't even do this simple function in a sort of predictable manner, in a manner that's useful to you. So I believe a database which gives you this optionality of doing multiple types of data processing on the same set of data will allow you to build a class of products which you have so far been struggling to build, because half the time you're running sideline to sideline, just, you know, integrating data from one system to the other. >> So I love the analogy with the smartphone. I want to continue it and double click on it. So I use this camera. You know, when my kid had a game, I would bring the big camera, the 35 millimeter. I don't use that anymore, no way, but my wife does, she still uses the DSLR. So is there a similar analogy here? And by the way, the camera shop in my town went out of business, you know? So is that fair? In other words, for those specialized databases, is there still a place for them, but they're getting... >> Absolutely, absolutely. Great analogy and a great extension to the question. That's the contrarian side of it, in one sense: hey, if everything can just be done in one, do you have a need for the other things? I mean, you gave a camera example where it is sort of a slippery slope. Let me give you another one, which is actually a little more straight to the point. Just because I listen to half of my music on the iPhone doesn't stop me from having my full digital receiver and, you know, my Harman Kardon speakers at home, because they produce a kind of sound, an immersive experience, that this teeny little speaker was never in its lifetime intended to produce, right? It's the convenience. Yes, it's the convenience of convergence, that I can put my earphones on and listen to all the great music. Yes, it's 90% there, or 80% there, depending on how much of an audiophile you are. I mean, the super-specialized ones do not go away. There are places where the specialized use cases will demand a separate system to exist, but even there, it has got to be very closely connected, how do you say, close binding or late binding. I should be able to stream that song from my phone to that receiver so I can get it from those speakers.
You can't say that, oh, there's a digital divide between these two things, done, and I can only play CDs on that one. That's not how it's going to work going forward. This is the connected world, right? As in, if I'm listening to a song in my car and then step out of the car and walk into my living room, that same song should continue and play on my living room speakers. Then it's a connected world, because it knows my preference and what I'm doing, and that all happens only because of this data flowing between all these systems. >> I love that example too. When I was a kid, we used to go to Tweeter, etc., and we used to play around with the big four-foot speakers and take them home. Those stores are out of business too. Absolutely. And now we just plug into Sonos. So, is the debate between relational and non-relational databases over, Ravi? >> I believe so, because I think what happened was the relational systems had been the norm, they ruled the roost, if you will, for the last 40-odd years, and then came this NoSQL movement, which was almost a rebellion against the relational world we all inhabited, because it was very restrictive. It had the schema definition and the schema evolution, as we call it, all those things. They required a committee, they required your DBA and your data architect, and you had to call them just to add one column and stuff like that. And the world had moved on. This was a world of blogs and tweets and, you know, mashups, and a different generation of digital behavior, digital-native people now, operating in these consumer-facing applications. We are living in this world, and yet the enterprise systems were still living on the other side of the divide. So out came this solution to say that we don't need SQL. Actually, the problem was never SQL. NoSQL was, you know, a best approximation, a good marketing name, but from a technologist's perspective, the problem was never the query language. NoSQL was not the problem; it was the schema limitations and the inability of these systems to scale. The relational systems were built like airplanes, which is, if the San Francisco to Boston flight route is so popular that you want to add 50 more seats to it, the only way you can do that is to go back to Boeing and ask them to move you up from a 737 to a 777, or whatever it is. And they'll stick you with a billion-dollar bill, on the assumption that you'll somehow pay it off by, you know, either flying more people or raising the rates or whatever you have to do. These are all vertically scaling systems. So relational systems are vertically scaling. They are expensive. Versus what we have done in this modern world is make the system horizontally scaling, which is more like a train: if it's a train that is going from San Francisco to Boston and you need to carry 50 more people, be my guest, I'll add one more coach to it, one more car to it. And the better part of the way we have done this is that we have super-specialized it: this route actually requires three dining cars and only 10 sleeper cars, or whatever; then just pick those and attach them. On the next route, you can choose to have only one dining car. That's good enough. So the way you scale the train can also be customized based on the route: along one route, more dining capability; on a shorter route, not as much of that capability.
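[Editor's note: the horizontal scaling model in the train analogy comes down to partitioning data across interchangeable nodes, so capacity grows by adding cars rather than buying a bigger plane. Here is a deliberately tiny Python sketch of that idea, naive hash-based sharding across a growable list of nodes. Real systems add rebalancing, replication, and consistent hashing, none of which is shown, and all names here are hypothetical.]

    import hashlib

    class Cluster:
        """Toy shard router: documents spread across whatever nodes exist."""
        def __init__(self, nodes):
            self.nodes = {name: {} for name in nodes}

        def _owner(self, key):
            h = int(hashlib.sha256(key.encode()).hexdigest(), 16)
            names = sorted(self.nodes)
            return names[h % len(names)]  # naive modulo placement

        def put(self, key, doc):
            self.nodes[self._owner(key)][key] = doc

        def add_node(self, name):
            # "Attach one more car": capacity grows without replacing the system.
            self.nodes[name] = {}  # (a real system would now rebalance shards)

    cluster = Cluster(["node1", "node2"])
    cluster.put("user::1", {"name": "Maya"})
    cluster.add_node("node3")  # scale out, not up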
You can attach the kinds of coaches you need. We call this multidimensional scaling: not only do we scale horizontally, we can scale to different types of workloads by adding different types of coaches, right? So that's the beauty of this architecture. Now, why is that architecture important? Because where we land eventually is the ability to do operational and analytical work in the same place. This is another thing which didn't happen in the past, because you would say, I cannot run this analytical query, because then my operational workload will suffer, then my front end will slow down, and millions of customers will be impacted. We have solved that problem: on the same data you can run an analytical query and an operational query at once, because they're separated by these cars, right? As in, we fence the resources so that one doesn't impede the other. So you can, at the same time, have microsecond-level, 10 million ops per second key-value queries happening, and yet you can run an analytical query, which will take a couple of minutes, one not impeding the other. So that's, in one sense, part of the problem we have solved here, the relational versus NoSQL portion of it. These are the kinds of problems we had to solve. We solved those, and then we put the same query language back on top. Why? It's like Tesla, in one sense. Underneath the surface is where all the stuff that had to be changed, changed, like the gasoline, the internal combustion engine; those were the issues we really wanted to solve. So solve that, change the engine out, but you don't need to change the steering wheel or the gas pedal or, you know, the paddle shifters or whatever else you have over there, your gear shifters. Those need to remain in the same place. Otherwise, people won't buy it; otherwise, it does not even look like a car to people. So even when you give people the most advanced technology, it's got to be accessible to them in a manner that they can consume. Only in software do we forget this first design principle, and we go and say, well, I've got a car here, you've got to blow harder for it to go fast and lean back for it to, you know, apply the brake. That's how we seem to design software. Instead, we shouldn't be designing it that way; we should design it in a manner that is easiest for our audience, which is developers, to consume. And they've been using SQL for 40 years, or 30 years. And so we give them the steering wheel and the gas pedal and the gear shifters: by putting SQL back on top, underneath the surface we have completely solved the relational limitations of schema, as well as scalability. So in that way, and by bringing back the classic ACID capabilities, which relational systems were counted on for, and being able to do that with the SQL programming language, we call it multi-statement SQL transactions, so to say, which is the classic way all the enterprise software was built, by putting that back, now I can say that the debate between relational and non-relational is over, because this has truly extended the database to solve the problems that the relational systems had to grow up to solve in modern times, rather than getting pedantic about whether we have NoSQL or SQL or NewSQL or, you know, any of that sort of jargon-oriented debate.
These are debates of computer science that actually had to be settled, and we have solved them with the latest release, 7.0, which we released a few months ago. >> Right, right. Last July. Ravi, we've got to leave it there. I love the examples and the analogies. I can't wait to be face-to-face with you. I want to hang with you at the cocktail party, because I've learned so much, and I really appreciate your time. Thanks for coming to theCUBE. >> Fantastic. Thanks for the time and the opportunity, Dave. I mean, very insightful questions, really appreciate it. - Thank you. >> Okay, this is Dave Volante. We're covering Couchbase Connect online. Keep it right there for more great content on theCUBE.

Published Date : Oct 1 2021

SUMMARY :

In this Couchbase Connect online session, Ravi Mayuram, Senior Vice President of Engineering and CTO at Couchbase, explains why modern applications need a distributed, multi-model database. He covers the optionality of running key-value, SQL, search, eventing, and analytics workloads on the same JSON data; schema-on-read late binding; bringing logic to data instead of moving data to logic; and multidimensional scaling that isolates operational and analytical workloads from each other. Using smartphone, audio, and transportation analogies, he argues that Couchbase 7.0, with multi-statement SQL transactions and ACID guarantees, ends the relational versus non-relational debate.

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
San Francisco | LOCATION | 0.99+
Boston | LOCATION | 0.99+
90% | QUANTITY | 0.99+
Dave Volante | PERSON | 0.99+
Ravi Mayuram | PERSON | 0.99+
40 years | QUANTITY | 0.99+
80% | QUANTITY | 0.99+
second question | QUANTITY | 0.99+
iPhone | COMMERCIAL_ITEM | 0.99+
five things | QUANTITY | 0.99+
Ravi | PERSON | 0.99+
today | DATE | 0.99+
40 odd years | QUANTITY | 0.99+
30 years | QUANTITY | 0.99+
one | QUANTITY | 0.99+
Last July | DATE | 0.99+
50 more seats | QUANTITY | 0.99+
35 millimeter | QUANTITY | 0.99+
three | QUANTITY | 0.99+
five things | QUANTITY | 0.99+
Harman Kardon | ORGANIZATION | 0.99+
SQL | TITLE | 0.99+
two issues | QUANTITY | 0.99+
nine o'clock | DATE | 0.99+
40 functions | QUANTITY | 0.99+
five different systems | QUANTITY | 0.99+
Sonos | ORGANIZATION | 0.99+
Java | TITLE | 0.99+
Tesla | ORGANIZATION | 0.99+
50 more people | QUANTITY | 0.99+
millions | QUANTITY | 0.99+
50 things | QUANTITY | 0.99+
one more car | QUANTITY | 0.99+
one place | QUANTITY | 0.99+
one more coach | QUANTITY | 0.99+
one place | QUANTITY | 0.99+
Google | ORGANIZATION | 0.99+
two things | QUANTITY | 0.98+
first | QUANTITY | 0.98+
Couchbase | ORGANIZATION | 0.98+
one sense | QUANTITY | 0.98+
December 1st week | DATE | 0.98+
five different systems | QUANTITY | 0.98+
first part | QUANTITY | 0.98+
Android | TITLE | 0.98+
Third one | QUANTITY | 0.97+
Four | QUANTITY | 0.97+
next day | DATE | 0.96+
first thing | QUANTITY | 0.96+
Json | TITLE | 0.96+
8 45 | OTHER | 0.95+
SIS Plex | TITLE | 0.95+
Boeing | ORGANIZATION | 0.95+
one problem | QUANTITY | 0.95+
one column | QUANTITY | 0.94+
more than five milliseconds | QUANTITY | 0.94+
three dining cars | QUANTITY | 0.94+
One | QUANTITY | 0.94+
one system | QUANTITY | 0.94+
10 sort of sleeper cars | QUANTITY | 0.93+
8: 15 | DATE | 0.93+
billion dollar | QUANTITY | 0.92+
one dining car | QUANTITY | 0.92+
first problem | QUANTITY | 0.92+
English | OTHER | 0.92+

Frank Keynote with Disclaimer


 

>>Hi, I'm Frank Slootman, CEO of Snowflake, and welcome to the Snowflake Data Cloud Summit. I'd like to take the next few minutes to introduce you to
Not only can snowflake customer spin up much capacity for as long as they deem necessary. Three. Utility model in church, they only get charged for what they consumed by the machine. Second, highly granular measurement of utilization. Ah, lot of the economic impact of snowflake comes from the fact that customers no longer manage capacity. What they do now is focused on consumption. In snowflake is managing the capacity. Performance and economics now go hand in hand because faster is now also cheaper. Snowflake contracts with the public cloud vendors for capacity at considerable scale, which then translates to a good economic value at the retail level is, well, third ease of use and simplicity. Snowflake is a platform that scales from the smallest workloads to the largest data estates in the world. It is unusual in this offer industry to have a platform that controversy the entire spectrum of scale, a database technology snowflake is dramatically simple fire. To compare to previous generations, our founders were bent on making snowflake, a self managing platform that didn't require expert knowledge to run. The role of the Deba has evolved into snowflake world, more focused on data model insights and business value, not tuning and keeping the infrastructure up and running. This has expanded the marketplace to nearly any scale. No job too small or too large. Fourth, multi cloud and Cross Cloud or snowflake was first available on AWS. It now also runs very successfully on mark yourself. Azure and Google Cloud Snowflake is a cloud agnostic platform, meaning that it doesn't know what it's running on. Snowflake completely abstracts the underlying cloud platform. The user doesn't need to see or touch it directly and also does not receive a separate bill from the cloud vendor for capacity consumed by snowflake. Being multi cloud capable customers have a choice and also the flexibility to change over time snowflakes. Relationships with Amazon and Microsoft also allow customers to transact through their marketplaces and burned down their cloud commit with their snowflakes. Spend Snowflake is also capable of replicating across cloud regions and cloud platforms. It's not unusual to see >>the same snowflake data on more than one public cloud at the time. Also, for disaster recovery purposes, it is desirable to have access to snowflake on a completely different public cloud >>platform. Fifth, data Security and privacy, security and privacy are commonly grouped under the moniker of data governance. As a highly managed cloud data platform, snowflake designed and deploys a comprehensive and coherent security model. While privacy requirements are newer and still emerging in many areas, snowflake as a platform is evolving to help customers steer clear from costly violations. Our data sharing model has already enabled many customers to exchange data without surrendering custody of data. Key privacy concerns There's no doubt that the strong governance and compliance framework is critical to extracting you analytical value of data directly following the session. Police Stay tuned to hear from Anita Lynch at Disney Streaming services about how >>to date a cloud enables data governance at Disney. The world beat a >>path to our door snowflake unleashed to move from UN promised data centers to the public cloud platforms, notably AWS, Azure and Google Cloud. 
Snowflake now has thousands of enterprise customers averaging over 500 million queries >>today across all customer accounts, and it's one of the fastest growing enterprise software companies in a generation. Our recent listing on the New York Stock Exchange was built is the largest software AIPO in history. But the data cloth conversation is bigger. There is another frontier workload. Execution is a huge part of it, but it's not the entire story. There is another elephant in the room, and that is that The world's data is incredibly fragmented in siloed, across clouds of old sorts and data centers all over the place. Basically, data lives in a million places, and it's incredibly hard to analyze data across the silos. Most intelligence analytics and learning models deploy on single data sets because it has been next to impossible to analyze data across sources. Until now, Snowflake Data Cloud is a data platform shared by all snowflake users. If you are on snowflake, you are already plugged into it. It's like being part of a Global Data Federation data orbit, if you will, where all other data can now be part of your scope. Historically, technology limitations led us to build systems and services that siloed the data behind systems, software and network perimeters. To analyze data across silos, we resorted to building special purpose data warehouses force fed by multiple data sources empowered by expensive proprietary hardware. The scale limitations lead to even more silos. The onslaught of the public cloud opened the gateway to unleashing the world's data for access for sharing a monetization. But it didn't happen. Pretty soon they were new silos, different public clouds, regions within the and a huge collection of SAS applications hoarding their data all in their own formats on the East NC ations whole industries exist just to move data from A to B customer behavior precipitated the silo ing of data with what we call a war clothes at a time mentality. Customers focused on the applications in isolation of one another and then deploy data platforms for their workload characteristics and not much else, thereby throwing up new rules between data. Pretty soon, we don't just have our old Silas, but new wants to content with as well. Meanwhile, the promise of data science remains elusive. With all this silo ing and bunkering of data workload performance is necessary but not sufficient to enable the promise of data science. We must think about unfettered data access with ease, zero agency and zero friction. There's no doubt that the needs of data science and data engineering should be leading, not an afterthought. And those needs air centered on accessing and analyzing data across sources. It is now more the norm than the exception that data patterns transcend data sources. Data silos have no meaning to data science. They are just remnants of legacy computing. Architectures doesn't make sense to evaluate strictly on the basis of existing workloads. The world changes, and it changes quickly. So how does the data cloud enabled unfettered data access? It's not just a function of being in the public cloud. Public Cloud is an enabler, no doubt about it. But it introduces new silos recommendation by cloud, platform by cloud region by Data Lake and by data format, it once again triggered technical grandstands and a lot of programming to bring a single analytical perspective to a diversity of data. 
Data was not analytics-ready, not optimized for performance or efficiency, and clearly lacking on data governance. Snowflake addresses these limitations, thereby combining great execution with great data access. With Snowflake, we can have the best of both. So how does it all work? When you join Snowflake and have your Snowflake account, you don't just avail yourself of unlimited storage and compute resources along with a world-class execution platform. You also plug into the Snowflake data cloud, meaning that all Snowflake accounts across clouds, regions and geographies are part of a single Snowflake data universe. That is the data cloud. It is based on our global data sharing architecture. Any Snowflake data can be exposed and accessed by any other Snowflake user. It's seamless and frictionless: data is generally not copied or moved, but accessed in place, subject to the same Snowflake governance model. Accessing the data cloud can be a tactical one-on-one sharing relationship. For example, imagine how a retailer would share data with a consumer packaged goods company. But then it easily proliferates from one-to-one, to one-to-many, to many-to-many. The data cloud has become a beehive of data supply and demand. It has attracted hundreds of professional data listings to the Snowflake Data Marketplace, which fuels the data cloud with a rich supply of options. For example, our partner Starschema listed a very detailed COVID-19 incident and fatality data set on the Snowflake Data Marketplace. It became an instant hit with Snowflake customers. The Starschema listing is not just raw data; it is also platform-optimized, meaning that it was analytics-ready for all Snowflake accounts. Snowflake users were accessing, joining and overlaying this new data within a short time of it becoming available. That is the power of platform. In financial services, it's common to see Snowflake users access data from Snowflake Marketplace listings like FactSet and Standard and Poor's, and then mash it up against, for example, Salesforce data. There are now over 100 suppliers of data listings on the Snowflake Marketplace. That is in addition to thousands of enterprise and institutional Snowflake users with their own data sets. The best part of the Snowflake data cloud is this: you don't need to do or buy anything different. If you're on Snowflake, you're already plugged into the data cloud. A whole world of data access options awaits you, and data silos become a thing of the past. Enjoy today's presentations. By the end of it, you should have a better sense and a bigger context for your choices of data platforms. Thank you for joining us.
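As an illustration of the sharing pattern described in the keynote, here is a minimal Python sketch that queries a marketplace data set alongside an account's own table. The database, schema, table and column names are hypothetical; the takeaway is that shared data is queried in place like any other table, using the standard Snowflake Python connector:

```python
# A minimal sketch, assuming hypothetical object names and credentials.
import snowflake.connector  # pip install snowflake-connector-python

conn = snowflake.connector.connect(
    user="ANALYST", password="...", account="my_account",
    warehouse="ANALYTICS_WH", database="MY_DB", schema="PUBLIC",
)

sql = """
SELECT s.region, s.fatality_count, o.store_sales
FROM COVID19_SHARE.PUBLIC.CASES AS s      -- mounted marketplace share (hypothetical)
JOIN MY_DB.PUBLIC.DAILY_SALES AS o        -- the account's own data (hypothetical)
  ON s.region = o.region AND s.report_date = o.sales_date
"""
# The cursor is iterable; rows stream back without copying the shared data.
for row in conn.cursor().execute(sql):
    print(row)
conn.close()
```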

Published Date : Nov 19 2020



Paula D'Amico, Webster Bank | Io Tahoe | Enterprise Data Automation


 

>> Narrator: From around the Globe, it's theCube with digital coverage of Enterprise Data Automation, an event series brought to you by Io-Tahoe. >> Everybody, we're back. And this is Dave Vellante, and we're covering the whole notion of Automated Data in the Enterprise. And I'm really excited to have Paula D'Amico here. Senior Vice President of Enterprise Data Architecture at Webster Bank. Paula, good to see you. Thanks for coming on. >> Hi, nice to see you, too. >> Let's start with Webster bank. You guys are kind of a regional, I think New York, New England, I believe it's headquartered out of Connecticut. But tell us a little bit about the bank. >> Webster bank is regional: Boston, Connecticut, and New York, very focused on Westchester and Fairfield County. They are a really highly rated regional bank for this area. They hold quite a few awards for the area for being supportive of the community, and are really moving forward technology-wise. They really want to be a data driven bank, and they want to move into a more robust group. >> We got a lot to talk about. So data driven is an interesting topic, and your role as Data Architecture is really Senior Vice President of Data Architecture. So you got a big responsibility as it relates to kind of transitioning to this digital, data driven bank, but tell us a little bit about your role in your organization. >> Currently, today, we have a small group that is just working toward moving into a more futuristic, more data driven data warehousing. That's our first item. And then the other item is to drive new revenue by anticipating what customers do when they go to the bank, or when they log in to their account, to be able to give them the best offer. And the only way to do that is if you have timely, accurate, complete data on the customer, and what's really of great value is to offer them a new product, or to help them continue to grow their savings, or grow their investments. >> Okay, and I really want to get into that. But before we do, and I know you're sort of partway through your journey, you got a lot to do. But I want to ask you about Covid, how are you guys handling that? You had the government coming down with small business loans and PPP, and a huge volume of business, and sort of data was at the heart of that. How did you manage through that? >> We were extremely successful, because we have a big, dedicated team that understands where their data is, and was able to switch much faster than a larger bank to be able to offer the PPP loans out to our customers with lightning speed. And part of that was we adapted Salesforce very fast; we've had Salesforce in house for over 15 years. Pretty much that was the driving vehicle to get our PPP loans in, and then developing logic quickly. But it was a 24/7 development role to get the data moving and help our customers fill out the forms. And a lot of that was manual, but it was a large community effort. >> Think about that too. The volume was probably much higher than the volume of loans to small businesses that you're used to granting, and then also the initial guidelines were very opaque. You really didn't know what the rules were, but you were expected to enforce them. And then finally, you got more clarity. So you had to essentially code that logic into the system in real time. >> I wasn't directly involved, but part of my data movement team was, and we had to change the logic overnight.
So it was on a Friday night it was released, we pushed our first set of loans through, and then the logic changed. Coming from the government, it changed, and we had to redevelop our data movement pieces again, and we designed them and sent them back through. So it was definitely kind of scary, but we were completely successful. We hit a very high peak. Again, I don't know the exact number, but it was in the thousands of loans, from little loans to very large loans, and not one customer who applied and followed the right process and filled out the right amounts failed to get what they needed. >> Well, that is an amazing story and really great support for the region, your Connecticut, the Boston area. So that's fantastic. I want to get into the rest of your story now. Let's start with some of the business drivers in banking. I mean, obviously online. A lot of people have sort of joked that many of the older people, who kind of shunned online banking and would love to go into the branch and see their friendly teller, had no choice during this pandemic but to go online. So that's obviously a big trend. You mentioned the data driven data warehouse, I want to understand that, but at the top level, what are some of the key business drivers that are catalyzing your desire for change? >> The ability to give a customer what they need at the time when they need it. And what I mean by that is that we have customer interactions in multiple ways. And I want the customer to be able to walk into a bank or go online and see the same format, and to have the same feel, the same look, and also to be able to offer them the next best offer for them. Whether they're looking for a new mortgage or looking to refinance, or whatever it is, they have that data, we have the data, and they feel comfortable using it. And that's an untethered banker attitude: whatever my banker is holding and whatever the person is holding in their phone, that is the same, and it's comfortable. So they don't feel that they've walked into the bank and they have to fill out different paperwork compared to filling out paperwork on their phone. >> You actually do want the experience to be better. And it is in many cases. Now you weren't able to do this with your existing, I guess mainframe based, Enterprise Data Warehouses. Is that right? Maybe talk about that a little bit? >> Yeah, we were definitely able to do it with what we have today, the technology we're using. But one of the issues is that it's not timely. And you need a timely process to be able to get the customers to understand what's happening. You need a timely process so we can enhance our risk management, we can apply for fraud issues and things like that. >> Yeah, so you're trying to get more real time. The traditional EDW, it's sort of a science project. There's a few experts that know how to get it. You get in line, the demand is tremendous. And then oftentimes by the time you get the answer, it's outdated. So you're trying to address that problem. So part of it is really the cycle time, the end to end cycle time, that you're compressing. And then there's, if I understand it, residual benefits that are pretty substantial from a revenue opportunity, other offers that you can make to the right customer, that you maybe know through your data. Is that right? >> Exactly. It's to drive new customers to new opportunities. It's to enhance the risk, and it's to optimize the banking process, and then obviously, to create new business.
And the only way we're going to be able to do that is if we have the ability to look at the data right when the customer walks in the door, or right when they open up their app. And by creating more near real-time data for the data warehouse team, that's giving the lines of business the ability to work on the next best offer for that customer as well. >> But Paula, we're inundated with data sources these days. Are there other data sources that maybe you had access to before, but perhaps the backlog of ingesting and cleaning and cataloging and analyzing, maybe the backlog was so great that you couldn't perhaps tap some of those data sources? Do you see the potential to increase the data sources, and hence the quality of the data, or is that sort of premature? >> Oh, no. Exactly right. So right now, we ingest a lot of flat files from our mainframe type of front end system that we've had for quite a few years. But now we're moving to the cloud, moving off-prem into, like, an S3 bucket, where we can process that data and get that data faster by using real time tools to move it into a place where, like, Snowflake could utilize that data, or we can give it out to our market. Right now, though, we still work in batch mode. So we're doing 24 hours. >> Okay. So when I think about the data pipeline and the people involved, maybe you could talk a little bit about the organization. You've got, I don't know if you have, data scientists or statisticians, I'm sure you do. You got data architects, data engineers, quality engineers, developers, etc. And oftentimes, practitioners like yourself will stress about, hey, the data is in silos, the data quality is not where we want it to be, we have to manually categorize the data. These are all sort of common data pipeline problems, if you will. Sometimes we use the term data ops, which is sort of a play on DevOps applied to the data pipeline. Can you just sort of describe your situation in that context? >> Yeah, so we have a very large data ops team. And everyone who is working on the data part of Webster's Bank has been there 13 to 14 years. So they get the data, they understand it, they understand the lines of business. We have data quality issues, just like everybody else does, but we have places in the pipeline where that gets cleansed. And there was very much siloed data. The data scientists are out in the lines of business right now, which is great, because I think that's where data science belongs. And that's what we're working towards now: giving them more self service, giving them the ability to access the data in a more robust way. And it's a single source of truth. So they're not pulling the data down into their own, like, Tableau dashboards, and then pushing the data back out. They're going to more of, I don't want to say a central repository, but more of a robust repository that's controlled across multiple avenues, where multiple lines of business can access that data. Does that help? >> Got it, yes. And I think that one of the key things that I'm taking away from your last comment is the cultural aspects of this. By having the data scientists in the lines of business, the lines of business will feel ownership of that data, as opposed to pointing fingers, criticizing the data quality. They really own that problem, as opposed to saying, well, it's Paula's problem.
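For context on the S3-to-Snowflake batch flow Paula describes above, here is a hedged Python sketch of landing files in a bucket and loading them on a schedule. All object names, the bucket, and the credentials handling are hypothetical assumptions; the stage and COPY statements use standard Snowflake SQL:

```python
# A sketch of a nightly batch load, assuming hypothetical names and a
# target table with a single VARIANT column for the raw JSON.
import snowflake.connector

conn = snowflake.connector.connect(user="ETL", password="...", account="my_account",
                                   warehouse="LOAD_WH", database="RAW", schema="PUBLIC")
cur = conn.cursor()

# One-time setup: point an external stage at the landing bucket.
cur.execute("""
CREATE STAGE IF NOT EXISTS landing_stage
  URL = 's3://my-landing-bucket/daily/'
  CREDENTIALS = (AWS_KEY_ID='...' AWS_SECRET_KEY='...')
  FILE_FORMAT = (TYPE = JSON)
""")

# The nightly batch: COPY only loads files it hasn't loaded before.
cur.execute("""
COPY INTO customer_events
FROM @landing_stage
ON_ERROR = 'SKIP_FILE'
""")
conn.close()
```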
>> Well, my problem is I have data engineers, data architects, database administrators, traditional data reporting people. And some customers that I have, that are business customers in the lines of business, want to just subscribe to a report. They don't want to go out and do any data science work, and we still have to provide that. So we still want to provide them some kind of regimen where they wake up in the morning, they open up their email, and there's the report that they subscribe to, which is great, and it works out really well. And one of the reasons why we purchased Io-Tahoe was so I would have the ability to give the lines of business the ability to do search within the data. And we'll read the data flows and data redundancy and things like that, and help me clean up the data. And also, to give it to the data analysts who say, all right, they just asked me, they want this certain report. And it used to take, okay, four weeks, we're going to go and look at the data, and then we'll come back and tell you what we can do. But now with Io-Tahoe, they're able to look at the data, and then in one or two days, they'll be able to go back and say, yes, we have the data, this is where it is. This is where we found it. These are the data flows that we found also, which is what I call the birth of a column: it's where the column was created, and where it went to live as a teenager, (laughs) and then where it went to die, where we archive it. And, yeah, it's this cycle of life for a column. And Io-Tahoe helps us do that. And we do data lineage all the time, and it just takes a very long time, and that's why we're using something that has AI and machine learning in it. It's accurate, it does it the same way over and over again. If an analyst leaves, you're able to utilize something like Io-Tahoe to be able to do that work for you. Does that help? >> Yeah, got it. So a couple things there. In researching Io-Tahoe, it seems like one of the strengths of their platform is the ability to visualize data, the data structure, and actually dig into it, but also see it. And that speeds things up and gives everybody additional confidence. And then the other piece is essentially infusing AI or machine intelligence into the data pipeline, which is really how you're attacking automation. And you're saying it's repeatable, and then that helps the data quality, and you have this virtuous cycle. Maybe you could sort of affirm that and add some color, perhaps. >> Exactly. So let's say that I have seven core lines of business that are asking me questions, and one of the questions they'll ask me is, we want to know if this customer is okay to contact. And there's different avenues, so you can go online to do-not-contact me, you can go to the bank and you can say, I don't want email, but I'll take texts, and I want no phone calls. All that information. So seven different lines of business ask me that question in different ways. One says "okay to contact," the other one says "customer 123." All these. Each project before I got there used to be siloed. So one customer would be 100 hours for them to do that analytical work, and then another analyst would do another 100 hours on the other project. Well, now I can do that all at once. And I can do those types of searches and say, yes, we already have that documentation. Here it is, and this is where you can find where the customer has said, no, I don't want to get access from you by email, or I've subscribed to get emails from you.
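To make the "okay to contact" problem concrete, here is a toy Python sketch of consolidating differently encoded contact preferences into one canonical answer per customer. The encodings below are invented examples, not Webster Bank's or Io-Tahoe's actual formats:

```python
# Toy stand-ins: three lines of business encode the same preference differently.
RAW_PREFERENCES = [
    {"customer": "123", "source": "retail",  "value": "no okay to contact"},
    {"customer": "123", "source": "lending", "value": "email=N, text=Y"},
    {"customer": "123", "source": "wealth",  "value": "DNC"},
]

def normalize(value: str) -> bool:
    """Map each line of business's encoding to a single contactable flag."""
    blocked = ("no okay", "dnc", "do not contact", "email=n")
    return not any(tok in value.lower() for tok in blocked)

canonical = {}
for rec in RAW_PREFERENCES:
    # A customer is contactable only if no source says otherwise.
    prev = canonical.get(rec["customer"], True)
    canonical[rec["customer"]] = prev and normalize(rec["value"])

print(canonical)  # {'123': False} -- one answer instead of seven siloed ones
```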
Here it is, and this is where you can find where the customer has said, "No, I don't want to get access from you by email or I've subscribed to get emails from you." >> Got it. Okay. Yeah Okay. And then I want to go back to the cloud a little bit. So you mentioned S3 Buckets. So you're moving to the Amazon cloud, at least, I'm sure you're going to get a hybrid situation there. You mentioned snowflake. What was sort of the decision to move to the cloud? Obviously, snowflake is cloud only. There's not an on-prem, version there. So what precipitated that? >> Alright, so from I've been in the data IT information field for the last 35 years. I started in the US Air Force, and have moved on from since then. And my experience with Bob Graham, was with snowflake with working with GE Capital. And that's where I met up with the team from Io-Tahoe as well. And so it's a proven so there's a couple of things one is Informatica, is worldwide known to move data. They have two products, they have the on-prem and the off-prem. I've used the on-prem and off-prem, they're both great. And it's very stable, and I'm comfortable with it. Other people are very comfortable with it. So we picked that as our batch data movement. We're moving toward probably HVR. It's not a total decision yet. But we're moving to HVR for real time data, which is changed capture data, moves it into the cloud. And then, so you're envisioning this right now. In which is you're in the S3, and you have all the data that you could possibly want. And that's JSON, all that everything is sitting in the S3 to be able to move it through into snowflake. And snowflake has proven to have a stability. You only need to learn and train your team with one thing. AWS as is completely stable at this point too. So all these avenues if you think about it, is going through from, this is your data lake, which is I would consider your S3. And even though it's not a traditional data lake like, you can touch it like a Progressive or Hadoop. And then into snowflake and then from snowflake into sandbox and so your lines of business and your data scientists just dive right in. That makes a big win. And then using Io-Tahoe with the data automation, and also their search engine. I have the ability to give the data scientists and data analysts the way of they don't need to talk to IT to get accurate information or completely accurate information from the structure. And we'll be right back. >> Yeah, so talking about snowflake and getting up to speed quickly. I know from talking to customers you can get from zero to snowflake very fast and then it sounds like the Io-Tahoe is sort of the automation cloud for your data pipeline within the cloud. Is that the right way to think about it? >> I think so. Right now I have Io-Tahoe attached to my on-prem. And I want to attach it to my off-prem eventually. So I'm using Io-Tahoe data automation right now, to bring in the data, and to start analyzing the data flows to make sure that I'm not missing anything, and that I'm not bringing over redundant data. The data warehouse that I'm working of, it's an on-prem. It's an Oracle Database, and it's 15 years old. So it has extra data in it. It has things that we don't need anymore, and Io-Tahoe's helping me shake out that extra data that does not need to be moved into my S3. So it's saving me money, when I'm moving from off-prem to on-prem. >> And so that was a challenge prior, because you couldn't get the lines of business to agree what to delete, or what was the issue there? 
>> Oh, it was more than that. Each line of business had their own structure within the warehouse. And then they were copying data between each other, and duplicating the data and using that. So there could be possibly three tables that have the same data in them, but used for different lines of business. Using Io-Tahoe, we have identified over seven terabytes in the last two months of data that is just repetitive. It's the same exact data, just sitting in a different schema. And that's not easy to find, if you only understand one schema that's reporting for that line of business. >> More bad news for the storage companies out there. (both laugh) >> It's cheap. That's what we were telling people. >> And it's true, but you still would rather not waste it, you'd like to apply it to drive more revenue. And so, I guess, let's close on where you see this thing going. Again, I know you're sort of partway through the journey, maybe you could sort of describe where you see the phases going and really what you want to get out of this thing, down the road, mid-term, longer term. What's your vision for your data driven organization? >> I want the bankers to be able to walk around with an iPad in their hand, and be able to access data for that customer really fast, and be able to give them the best deal that they can get. I want Webster to be right there on top, being able to add new customers, and to be able to serve our existing customers, who have had bank accounts since they were 12 years old, and now are multi-whatever. I want them to be able to have the best experience with our bankers. >> That's awesome. That's really what I want as a banking customer. I want my bank to know who I am, anticipate my needs, and create a great experience for me. And then let me go on with my life. So that's a great story. Love your experience, your background and your knowledge. I can't thank you enough for coming on theCube. >> Now, thank you very much. And you guys have a great day. >> All right, take care. And thank you for watching, everybody. Keep right there. We'll take a short break and be right back. (gentle music)
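As an aside, here is a rough Python sketch of the kind of cross-schema redundancy detection Paula credits Io-Tahoe with automating. This is illustrative only, not Io-Tahoe's actual algorithm; the tables and rows are toy stand-ins:

```python
# Flag tables in different schemas that hold the same data, by comparing
# an order-insensitive fingerprint of each table's contents.
import hashlib
from collections import defaultdict

def table_fingerprint(rows) -> str:
    """Order-insensitive hash of a table's contents."""
    digest = hashlib.sha256()
    for row in sorted(repr(r) for r in rows):
        digest.update(row.encode())
    return digest.hexdigest()

# Toy stand-ins for three tables in different schemas.
tables = {
    "retail.customers":  [("123", "Smith"), ("124", "Jones")],
    "lending.borrowers": [("124", "Jones"), ("123", "Smith")],
    "wealth.clients":    [("123", "Smith")],
}

by_print = defaultdict(list)
for name, rows in tables.items():
    by_print[table_fingerprint(rows)].append(name)

for names in by_print.values():
    if len(names) > 1:
        print("likely duplicates:", names)  # retail.customers, lending.borrowers
```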

Published Date : Jun 23 2020



Paula D'Amico, Webster Bank


 

>> Narrator: From around the Globe, it's theCube with digital coverage of Enterprise Data Automation, and event series brought to you by Io-Tahoe. >> Everybody, we're back. And this is Dave Vellante, and we're covering the whole notion of Automated Data in the Enterprise. And I'm really excited to have Paula D'Amico here. Senior Vice President of Enterprise Data Architecture at Webster Bank. Paula, good to see you. Thanks for coming on. >> Hi, nice to see you, too. >> Let's start with Webster bank. You guys are kind of a regional I think New York, New England, believe it's headquartered out of Connecticut. But tell us a little bit about the bank. >> Webster bank is regional Boston, Connecticut, and New York. Very focused on in Westchester and Fairfield County. They are a really highly rated regional bank for this area. They hold quite a few awards for the area for being supportive for the community, and are really moving forward technology wise, they really want to be a data driven bank, and they want to move into a more robust group. >> We got a lot to talk about. So data driven is an interesting topic and your role as Data Architecture, is really Senior Vice President Data Architecture. So you got a big responsibility as it relates to kind of transitioning to this digital data driven bank but tell us a little bit about your role in your Organization. >> Currently, today, we have a small group that is just working toward moving into a more futuristic, more data driven data warehousing. That's our first item. And then the other item is to drive new revenue by anticipating what customers do, when they go to the bank or when they log in to their account, to be able to give them the best offer. And the only way to do that is you have timely, accurate, complete data on the customer and what's really a great value on offer something to offer that, or a new product, or to help them continue to grow their savings, or do and grow their investments. >> Okay, and I really want to get into that. But before we do, and I know you're, sort of partway through your journey, you got a lot to do. But I want to ask you about Covid, how you guys handling that? You had the government coming down and small business loans and PPP, and huge volume of business and sort of data was at the heart of that. How did you manage through that? >> We were extremely successful, because we have a big, dedicated team that understands where their data is and was able to switch much faster than a larger bank, to be able to offer the PPP Long's out to our customers within lightning speed. And part of that was is we adapted to Salesforce very for we've had Salesforce in house for over 15 years. Pretty much that was the driving vehicle to get our PPP loans in, and then developing logic quickly, but it was a 24 seven development role and get the data moving on helping our customers fill out the forms. And a lot of that was manual, but it was a large community effort. >> Think about that too. The volume was probably much higher than the volume of loans to small businesses that you're used to granting and then also the initial guidelines were very opaque. You really didn't know what the rules were, but you were expected to enforce them. And then finally, you got more clarity. So you had to essentially code that logic into the system in real time. >> I wasn't directly involved, but part of my data movement team was, and we had to change the logic overnight. 
So it was on a Friday night it was released, we pushed our first set of loans through, and then the logic changed from coming from the government, it changed and we had to redevelop our data movement pieces again, and we design them and send them back through. So it was definitely kind of scary, but we were completely successful. We hit a very high peak. Again, I don't know the exact number but it was in the thousands of loans, from little loans to very large loans and not one customer who applied did not get what they needed for, that was the right process and filled out the right amount. >> Well, that is an amazing story and really great support for the region, your Connecticut, the Boston area. So that's fantastic. I want to get into the rest of your story now. Let's start with some of the business drivers in banking. I mean, obviously online. A lot of people have sort of joked that many of the older people, who kind of shunned online banking would love to go into the branch and see their friendly teller had no choice, during this pandemic, to go to online. So that's obviously a big trend you mentioned, the data driven data warehouse, I want to understand that, but what at the top level, what are some of the key business drivers that are catalyzing your desire for change? >> The ability to give a customer, what they need at the time when they need it. And what I mean by that is that we have customer interactions in multiple ways. And I want to be able for the customer to walk into a bank or online and see the same format, and being able to have the same feel the same love, and also to be able to offer them the next best offer for them. But they're if they want looking for a new mortgage or looking to refinance, or whatever it is that they have that data, we have the data and that they feel comfortable using it. And that's an untethered banker. Attitude is, whatever my banker is holding and whatever the person is holding in their phone, that is the same and it's comfortable. So they don't feel that they've walked into the bank and they have to do fill out different paperwork compared to filling out paperwork on just doing it on their phone. >> You actually do want the experience to be better. And it is in many cases. Now you weren't able to do this with your existing I guess mainframe based Enterprise Data Warehouses. Is that right? Maybe talk about that a little bit? >> Yeah, we were definitely able to do it with what we have today the technology we're using. But one of the issues is that it's not timely. And you need a timely process to be able to get the customers to understand what's happening. You need a timely process so we can enhance our risk management. We can apply for fraud issues and things like that. >> Yeah, so you're trying to get more real time. The traditional EDW. It's sort of a science project. There's a few experts that know how to get it. You can so line up, the demand is tremendous. And then oftentimes by the time you get the answer, it's outdated. So you're trying to address that problem. So part of it is really the cycle time the end to end cycle time that you're progressing. And then there's, if I understand it residual benefits that are pretty substantial from a revenue opportunity, other offers that you can make to the right customer, that you maybe know, through your data, is that right? >> Exactly. It's drive new customers to new opportunities. It's enhanced the risk, and it's to optimize the banking process, and then obviously, to create new business. 
And the only way we're going to be able to do that is if we have the ability to look at the data right when the customer walks in the door or right when they open up their app. And by creating more near real-time data, the data warehouse team gives the lines of business the ability to work on the next best offer for that customer as well. >> But Paula, we're inundated with data sources these days. Are there other data sources that maybe you had access to before, but perhaps the backlog of ingesting, cleaning, cataloging, and analyzing was so great that you couldn't tap some of them? Do you see the potential to increase the data sources, and hence the quality of the data, or is that sort of premature? >> Oh no, exactly right. So right now we ingest a lot of flat files from our mainframe-type front-end system, which we've had for quite a few years. But now we're moving to the cloud — moving off-prem into something like an S3 bucket, where we can process that data and get it faster by using real-time tools to move it into a place where Snowflake can utilize it, or we can give it out to our market. Right now we still work in batch mode, so we're doing 24-hour cycles. >> Okay. So when I think about the data pipeline and the people involved, maybe you could talk a little bit about the organization. You've got data scientists or statisticians — I'm sure you do — data architects, data engineers, quality engineers, developers, etc. And oftentimes practitioners like yourself will stress about: hey, the data is in silos, the data quality is not where we want it to be, we have to manually categorize the data. These are all sort of common data pipeline problems, if you will. Sometimes we use the term DataOps, which is sort of a play on DevOps applied to the data pipeline. Can you describe your situation in that context? >> Yeah, so we have a very large data ops team, and everyone who is working on the data part of Webster Bank has been there 13 to 14 years. So they get the data, they understand it, they understand the lines of business. We have data quality issues, just like everybody else does, but we have places where that gets cleansed. And the data was very much siloed, and we're moving away from that. The data scientists are out in the lines of business right now, which is great, because I think that's where data science belongs. What we're working towards now is giving them more self-service, giving them the ability to access the data in a more robust way. And it's a single source of truth, so they're not pulling the data down into their own Tableau dashboards and then pushing the data back out. They're going to — I don't want to say a central repository, but more of a robust repository that's controlled across multiple avenues, where multiple lines of business can access that data. Does that help? >> Got it, yes. And I think one of the key things I'm taking away from your last comment is the cultural aspect of this: by having the data scientists in the lines of business, the lines of business will feel ownership of that data as opposed to pointing fingers and criticizing the data quality. They really own that problem, as opposed to saying, well, it's Paula's problem.
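(An aside for readers who want to make the pipeline Paula describes concrete — flat-file extracts landing in an S3 bucket on a 24-hour batch cycle, then loaded into Snowflake — a minimal sketch might look like the Python below. Every name in it — bucket, stage, table, credentials — is a hypothetical stand-in, not Webster's actual setup.)

```python
# A minimal sketch (not Webster's actual pipeline): land a nightly flat-file
# extract in S3, then load it into Snowflake in batch with COPY INTO.
# The bucket, stage, table, and credential values are hypothetical stand-ins.
import boto3
import snowflake.connector

BUCKET = "bank-landing-zone"                  # hypothetical bucket name
KEY = "extracts/customers_20200604.csv"       # hypothetical object key

# 1. Land the mainframe extract in the S3 "data lake".
s3 = boto3.client("s3")
s3.upload_file("customers_20200604.csv", BUCKET, KEY)

# 2. Load it into Snowflake. Assumes an external stage (@landing_stage)
#    pointing at the bucket was created ahead of time by an administrator.
conn = snowflake.connector.connect(
    account="example_account",                # hypothetical credentials
    user="etl_user",
    password="********",
    warehouse="LOAD_WH",
    database="EDW",
    schema="STAGING",
)
try:
    conn.cursor().execute(
        "COPY INTO staging.customers "
        "FROM @landing_stage/extracts/customers_20200604.csv "
        "FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)"
    )
finally:
    conn.close()
```

In this sketch the load is still a batch step — the 24-hour cycle Paula describes; moving to change data capture, the HVR option she mentions later, is what would shrink that toward near real time.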
>> Well, my problem is I have data engineers, data architects, database administrators, traditional data reporting people. And some customers that I have — business customers, lines of business — just want to subscribe to a report; they don't want to go out and do any data science work. And we still have to provide that. So we still want to give them some kind of regimen where they wake up in the morning, open up their email, and there's the report they subscribed to — which is great, and it works out really well. And one of the reasons we purchased Io-Tahoe was so I would have the ability to give the lines of business the ability to do search within the data. It reads the data flows and finds data redundancy and things like that, and helps me clean up the data. And also to give it to the data analysts — say they're asked for a certain report. It used to take, okay, four weeks: we're going to go look at the data, and then we'll come back and tell you what we can do. But now with Io-Tahoe, they're able to look at the data, and within one or two days they'll be able to go back and say: yes, we have the data, this is where it is, this is where we found it, and these are the data flows we found. Also — what I call it — the birth of a column: it's where the column was created, where it went to live as a teenager (laughs), and then where it went to die, where we archive it. Yeah, it's this cycle of life for a column, and Io-Tahoe helps us do that. Data lineage is done all the time, and it just takes a very long time — that's why we're using something that has AI and machine learning in it. It's accurate, and it does it the same way over and over again. If an analyst leaves, you're able to use something like Io-Tahoe to do that work for you. Does that help? >> Yeah, got it. So, a couple of things there. In researching Io-Tahoe, it seems like one of the strengths of their platform is the ability to visualize data — the data structure — and actually dig into it, but also see it. And that speeds things up and gives everybody additional confidence. And then the other piece is essentially infusing AI or machine intelligence into the data pipeline — that's really how you're attacking automation. And you're saying it's repeatable, and that helps the data quality, and you have this virtuous cycle. Maybe you could affirm that and add some color, perhaps. >> Exactly. So, let's say that I have seven lines of business that are asking me questions, and one of the questions they'll ask me is: we want to know if this customer is okay to contact. And there are different avenues — you can go online and say "do not contact me," or you can go to the bank and say, "I don't want email, but I'll take texts, and I want no phone calls." All that information. So seven different lines of business ask me that question in different ways. One says "okay to contact," another says "Customer 123" — all of these. Before I got there, each project used to be siloed, so one of them would spend 100 hours doing that analytical work, and then another analyst would do another 100 hours on the other project. Well, now I can do that all at once. I can do those types of searches and say: yes, we already have that documentation.
Here it is, and this is where you can find where the customer has said, "No, I don't want to be contacted by email," or "I've subscribed to get emails from you." >> Got it, okay. And then I want to go back to the cloud a little bit. So you mentioned S3 buckets, so you're moving to the Amazon cloud — at least, I'm sure you've got a hybrid situation there. You mentioned Snowflake. What was the decision to move to the cloud? Obviously, Snowflake is cloud-only; there's no on-prem version. So what precipitated that? >> Alright. So, I've been in the data and IT information field for the last 35 years. I started in the US Air Force and have moved on since then. And my experience with Snowflake was with Bob Graham, working with GE Capital — and that's where I met up with the team from Io-Tahoe as well. So it's proven. There are a couple of things: one is Informatica, which is known worldwide for moving data. They have two products, the on-prem and the off-prem. I've used the on-prem and the off-prem; they're both great, very stable, and I'm comfortable with them. Other people are very comfortable with them, too. So we picked that as our batch data movement. We're moving toward, probably, HVR — it's not a total decision yet, but we're moving to HVR for real-time data, which is change data capture; it moves the data into the cloud. And then — so envision this: you're in S3, and you have all the data that you could possibly want. JSON, everything, is sitting in S3, ready to move through into Snowflake. And Snowflake has proven stability; you only need to learn one thing and train your team on it. AWS is completely stable at this point, too. So think about all these avenues the data goes through: this is your data lake, which I would consider your S3 — even though it's not a traditional data lake like Hadoop that you can touch — and then into Snowflake, and from Snowflake into sandboxes, so your lines of business and your data scientists can just dive right in. That's a big win. And then, using Io-Tahoe with the data automation and also their search engine, I have the ability to give the data scientists and data analysts a way to get accurate — completely accurate — information about the structure without needing to talk to IT. >> Yeah, so talking about Snowflake and getting up to speed quickly — I know from talking to customers you can get from zero to Snowflake very fast — it sounds like Io-Tahoe is sort of the automation cloud for your data pipeline within the cloud. Is that the right way to think about it? >> I think so. Right now I have Io-Tahoe attached to my on-prem, and I want to attach it to my off-prem eventually. So I'm using Io-Tahoe data automation right now to bring in the data and to start analyzing the data flows, to make sure that I'm not missing anything and that I'm not bringing over redundant data. The data warehouse that I'm working off of is on-prem — it's an Oracle database, and it's 15 years old — so it has extra data in it, things that we don't need anymore, and Io-Tahoe is helping me shake out the extra data that does not need to be moved into my S3. So it's saving me money as I move from on-prem to off-prem. >> And so that was a challenge prior, because you couldn't get the lines of business to agree what to delete — or what was the issue there?
>> Oh, it was more than that. Each line of business had their own structure within the warehouse, and then they were copying data between each other, duplicating that data and using it. So there could be three tables that have the same data in them, but each is used for a different line of business. Using Io-Tahoe, we have identified over seven terabytes of data in the last two months that is just repetitive — the exact same data, just sitting in a different schema. And that's not easy to find if you only understand the one schema that's reporting for that line of business. >> More bad news for the storage companies out there. (both laugh) So far. >> It's cheap. That's what we were telling people. >> And it's true, but you'd still rather not waste it — you'd like to apply it to drive more revenue. And so, I guess, let's close on where you see this thing going. Again, I know you're sort of partway through the journey. Maybe you could describe where you see the phases going and really what you want to get out of this thing down the road — mid-term, longer term. What's your vision for your data-driven organization? >> I want the bankers to be able to walk around with an iPad in their hand and be able to access data for that customer really fast, and be able to give them the best deal that they can get. I want Webster to be right there on top, able to add new customers and to serve our existing customers — who have had bank accounts there since they were 12 years old and are now multi-whatever — and I want them to have the best experience with our bankers. >> That's awesome. That's really what I want as a banking customer: I want my bank to know who I am, anticipate my needs, and create a great experience for me — and then let me go on with my life. Great story. Love your experience, your background, and your knowledge. I can't thank you enough for coming on theCube. >> No, thank you very much. And you guys have a great day. >> All right, take care. And thank you for watching, everybody. Keep right there. We'll take a short break and be right back. (gentle music)
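(Another aside: the seven terabytes of repetitive data Paula mentions — the same table copied into different line-of-business schemas — is the kind of thing a profiling tool surfaces by fingerprinting table contents. Io-Tahoe's actual method isn't public; the toy Python sketch below, with made-up table names, only illustrates the general idea.)

```python
# Toy illustration of duplicate-data detection across schemas: fingerprint
# each table by hashing its sorted contents, then group tables whose
# fingerprints collide. Real profiling tools are far more sophisticated;
# this only conveys the idea. All table names are made up.
import hashlib
from collections import defaultdict

def table_fingerprint(rows):
    """Hash a table's contents in a row-order-insensitive way."""
    digest = hashlib.sha256()
    for row in sorted(map(tuple, rows)):
        digest.update(repr(row).encode())
    return digest.hexdigest()

# Hypothetical copies of the same customer data in three LOB schemas.
tables = {
    "retail.customers":   [("123", "ok_to_email"), ("456", "no_contact")],
    "mortgage.cust_copy": [("456", "no_contact"), ("123", "ok_to_email")],
    "wealth.clients":     [("123", "ok_to_email"), ("789", "texts_only")],
}

by_fingerprint = defaultdict(list)
for name, rows in tables.items():
    by_fingerprint[table_fingerprint(rows)].append(name)

for names in by_fingerprint.values():
    if len(names) > 1:
        print("candidate duplicates:", ", ".join(names))
# prints: candidate duplicates: retail.customers, mortgage.cust_copy
```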

Published Date : Jun 4 2020

Frank Slootman, Snowflake | CUBE Conversation, April 2020


 

(upbeat music) >> Narrator: From theCUBE Studios in Palo Alto and Boston, connecting with thought leaders all around the world, this is a CUBE Conversation. >> All right everybody, this is Dave Vellante, and welcome to this special CUBE Conversation. I first met Frank Slootman in 2007 when he was the CEO of Data Domain. Back then he was the CEO of a disruptive company, and he still is. Data Domain, believe it or not, was back then actually replacing tape drives as the primary mechanism for backup. Yes, believe it or not, it used to be tape. Fast forward several years later, I met Frank again at VMworld when he had become the CEO of ServiceNow. At the time ServiceNow was a small company, about 100-plus million dollars. Frank and his team took that company to 1.2 billion. And Gartner, at the time of the IPO, said, "You know, this doesn't make sense. It's a small market, a very narrow help desk market — maybe a couple billion dollars." The vision of Slootman and his team was to really expand the total available market and execute like a laser, which they did, and today ServiceNow is a very, very successful company. Snowflake first came into my line of sight in 2015 when SiliconANGLE wrote an article, "Why Snowflake Is Better Than Amazon Redshift, Re-imagining Data." Well, last year Frank Slootman joined Snowflake, another disruptive company. And he's here today to talk about how Snowflake is participating in this COVID-19 crisis. I really want to share some of Frank's insights and leadership principles. Frank, great to see you, thanks for coming on. >> Yeah, thanks for having us, Dave. >> So when I first reported earlier this year on Snowflake and shared some data with the community, you reached back out to me and said, "Dave, I want to just share with you: I am not a playbook CEO, I am a situational CEO. This is what I learned in the military." So Frank, this COVID-19 situation was thrown at you — it's a black swan. What was your first move as a leader? >> Well, my first move is: let's not overreact. Take a deep breath. Let's really examine what we know. Let's not jump to conclusions, let's not try to project things that we're not capable of projecting. That's hard, because we tend to have levels of certainty about what's going to happen in the next week, the next month, and so on — and all of a sudden that's out the window. It creates enormous anxiety with people. So in other words, you've got to reset to: okay, what do we know, what can we do, what do we control? And not let our minds go out of control. So I talk to our people all the time about maintaining a sense of normalcy — focus on the work, stay in the moment, and by the way, turn the newsfeed off, right? Because the hysteria you get fed through the media is really not helpful, right? So just cool down and focus on what we still can do. And then everybody takes a deep breath and we just go back to work. I mean, we've been in this mode now for three weeks, and I can tell you, I'm on teleconferencing calls, whatever, eight, nine hours a day. Prospects, customers, all over the world. Pretty much what I was doing before, except I'm not traveling right now. So it's not— >> Yeah, so it sounds clear— >> Not that different than what it was before. (laughs) >> It sounds very Bill Belichickian, you know? >> Yeah. >> Focus on those things which you can control. When you were running ServiceNow, I really learned from you, and of course from Mike Scarpelli, your then and current CFO, about the importance of transparency.
And I'm interested in how you're communicating. It sounds like you're doing some very similar things, but have you changed the way in which you communicate to your team, your internal employees, at all? >> We're communicating much more, because we can no longer rely on running into people here, there, and everywhere. So we have to be much more purposeful about communications. For example, I send an email out to the entire company on Monday morning, and it's kind of a bunch of anecdotes, just to bring the connection back, the normalcy. It just helps people get connected back to the mothership — like, well, things are still going on, we're still talking the way we always used to. And that really helps. I also check in with people a lot more, and I ask all of our leadership to constantly check in with people, because you can't assume that everybody is okay; you can't be out of sight, out of mind. So we need to be more purposeful in reaching out and communicating with people than we were previously. >> And a lot of people are obviously concerned about their jobs. What have you communicated to employees about layoffs? I mean, you guys just did a large raise just before all this; your timing was kind of impeccable. But what have you communicated in that regard? >> I've said there are no layoffs on our radar, number one. Number two, we are hiring. And number three, we have a higher level of scrutiny on the hires that we're making. And I am very transparent. In other words, I tell people: look, I prioritize the roles that are closest to the drivetrain of the business. Right, it's kind of common sense, but I wanted to make sure that this is how we're thinking about it. There are some roles that are more postponable than others. I'm hiring in engineering without any reservation, because that is in the long-term strategic interest of the company. On the sales side, I want to know that sales leaders know how to convert to yield — that we're not just bringing capacity online where the leadership isn't convinced or confident that it can convert to yield. So there's a finer level of scrutiny on the hiring, but by and large, it's not that different. There's this saying out there that we should suspend all non-essential spending and hiring; I'm like, you should always do that, right? I mean, what's different today? (both laugh) If it's non-essential, why do it, right? So all of this comes back to: this is probably how we should operate anyway. Yep. >> I want to talk a little bit about the tech behind Snowflake. I'm very sensitive when CEOs come on my program to make sure that I'm not trying to bait them into ambulance chasing — that's not what this is about. But I do want to share with our community what's new, what's changed, and how companies like Snowflake are participating in this crisis. In particular, we've been reporting for a while — if you guys bring up that first slide — that the innovation in the industry is really no longer about Moore's Law. It's really shifted. There's a new, what we call an innovation cocktail, in the business, and we've collected all this data over the last 10 years: with Hadoop and other distributed data, and now edge data, et cetera, there's this huge trove of data. And now AI is becoming real; it's becoming much more economical. So you apply machine intelligence to this data, and then the Cloud allows us to do this at scale. It allows us to bring in more data sources.
It brings in agility. So I wonder if you could talk about this premise and how you guys fit. >> Yeah, I would start off by reordering the sequence and saying Cloud's number one. That is foundational. That helps us bring scale to data that we never had; number two, it helps us bring computational power to data at levels we've never had before. And that just means that queries and workloads can complete orders of magnitude faster than they ever could before. And that introduces concepts like the time value of data, right? The faster you get it, the more impactful and powerful it is. I do agree — I view AI as sort of the next generation of analytics. Instead of using data to inform people, we're using data to drive processes and businesses directly, right? So I'm obviously agreeing with these trends, because we're the principal beneficiaries and drivers of these platforms. >> Well, when we talked earlier this year about Snowflake, we really brought up the notion that you guys were one of the first, if not the first — and guys, bring back Frank, I've got to see him (Frank chuckles) — one of the first to really separate compute from storage, to be able to scale them independently. And that brought not only economics but flexibility. So you've got this Cloud-native database. Again, what caught my attention in that Redshift article we wrote — essentially, for our audience, Redshift was based on ParAccel. Amazon did a great job of making that a Cloud database, but it really wasn't born in the Cloud, and that's sort of the advantage of Snowflake. So that architectural approach is starting to really take hold. I want to give an example. Guys, if you bring up the next chart: this is an example of a system that I've been using since early January, when I saw this COVID come out. Somebody texted me this. It's the Johns Hopkins dataset, and it's awesome. You can go around the map and follow it; it's pretty close to real time, and it's quite good. But the problem is — all right, thank you, guys — the problem is that when I started to look at it, I wanted to get into a more granular view, down to the counties, and I couldn't do that. So guys, bring up the next slide if you would. What I did was I searched around and found a New York Times GitHub data instance — you can see it in the top left here. Basically it was a CSV. And notice what it says: it says we can't make this file beautiful and searchable because it's essentially too big. And then I ran into what you guys are doing with Star Schema — Star Schema's a data company. And essentially you guys made the point that, look, the Johns Hopkins dataset, as great as it is, isn't ready for analytics — it's got to be cleaned, et cetera. And so I want you to talk about that a little bit. Guys, if you could bring Frank back. And share with us what you guys have done with Star Schema and how that's helping understand COVID-19 and its progression. >> Yeah, one of the really cool concepts about Snowflake is what we call the data sharing architecture. And what that really means is that if you and I both have Snowflake accounts, even though we work for different institutions, we can share data objects — tables, schemas, whatever they are — with each other. And you can process against that in place, as if it were residing local to your own platform. We have taken that concept from private also to public,
so that data providers like Star Schema can list their datasets — because they're a data company, it's obviously in their business interest to allow this data to be profiled and to be accessible by the Snowflake community. And this data is what we call analytics-ready. It is instantly accessible. It is also continually updated — you have to do nothing. It's augmented with incremental data, and then our Snowflake users can just combine this data with supply chain data, with economic data, with internal operating data, and so on. And we got a very strong reaction from our customer base, because they're like, "Man, you're saving us weeks if not months just getting prepared to start an analysis, let alone doing one." Right? Because the data is analytics-ready and they have to do literally nothing. In other words, if they ask us for it in the morning, in the afternoon they'll be running workloads against it — right, and then combining it with their own data. >> Yeah, so I should point out that that New York Times GitHub dataset I showed you is a couple of days behind. We're talking here about near realtime, or as close to realtime as you can get — is that right? >> Yep. Yeah, it gets updated every day. >> So the other thing — one of the things we've been reporting on, and Frank, I wonder if you could comment on this — is these new emerging workloads in the Cloud. We've been reporting on this for a couple of years. The first generation of Cloud was IaaS — really about compute, storage, some database infrastructure. But what we're seeing now is these analytic data stores where the valuable data is sitting — much of it in the Cloud — and bringing machine intelligence and data science capabilities to that data, to allow for this realtime or near-realtime analysis. And that is a new, emerging workload that is really gaining a lot of steam as these companies pursue this so-called digital transformation. Your comments on that. >> Yeah, we refer to that as the emergence, or the rise, of the data Cloud. If you look at the Cloud landscape, we're all very familiar with the infrastructure clouds — AWS and Azure and GCP and so on — just massive storage and servers. And obviously there's data locked in those infrastructure clouds as well. We've been familiar for 10, 20 years now with application clouds — notably Salesforce, but obviously Workday, ServiceNow, SAP, and so on — and they also have data in them, right? But now you're seeing that people are unsiloing the data. This is super important, because as long as the data is locked in these infrastructure clouds and these application clouds, we can't do the things that we need to do with it, right? We have to unsilo it to allow querying and execution against that data at scale. And you don't see that any more clearly than you do right now, during this meltdown that we're experiencing. >> Okay, so I learned long ago, Frank, not to argue with you, but I want to push you on something. (Frank laughs) I'm not trying to be argumentative, but one of those silos is on-prem. I've heard you talk about "look, we're a Cloud company. We're Cloud-first, we're Cloud-only. We're not going to do an on-prem version." But some of that data lives on-prem. There are companies out there saying, "Hey, we separate compute and storage too, we run in the Cloud. But we also run on-prem — that's our big differentiator." Your thoughts on that. >> Yeah, we burnt the ship behind us.
Okay, we're not doing this endless hedging that people have done for 20 years, keeping a leg in both worlds. Forget it — this will only work in the public Cloud, because this is how the utility model works, right? I think everybody is coming to this realization, right? I mean, excuses are running out at this point. We think people will come to the public Cloud a lot sooner than we will ever come to the private cloud. It's not that we can't run on a private cloud; it just diminishes the potential and the value that we bring. >> So as I sort of mentioned in my intro, you have always been at the forefront of disruption. And think about digital transformation: you know, Frank, we go to all of these events — it used to be physical, and now we're doing theCUBE digital. And so everybody talks about digital transformation. CEOs get up, they talk about how they're helping their customers move to digital. But the reality is, when you actually talk to businesses, there was a lot of complacency — "hey, this isn't really going to happen in my lifetime," or "we're doing pretty well." Or maybe the CEO might be committed, but it doesn't necessarily trickle down to the P&L managers. One of the things we've been talking about is that COVID-19 is going to accelerate that digital transformation and make it a mandate. You're seeing it play out obviously in retail and a number of other industries — this has wreaked havoc on supply chains. And so there's going to be a rethinking. What are your thoughts on the acceleration of digital transformation? >> Well, the crisis that we're experiencing is obviously an enormous catalyst for digital transformation and everything that that entails. And I think as an industry we're just victims of inertia. I haven't understood for 20 years why education — both K through 12 and higher ed — is so brick-and-mortar bound in the way they do things, right? We could massively scale and drop the cost of education by going digital. Now we're forced into it, and everybody's like, "Wow, this is not bad." You're right, it isn't — but the economic imperative hadn't really set in before, and it is now. So these are all great things. Having said that, there are also limits to digital transformation. And I'm experiencing that right now, being on video calls all day, oftentimes with people I've never met before, right? There's still a barrier there, right? It's not like digital can replace absolutely everything — that is just not true, right? I mean, there's some level of filter that just doesn't happen when you're digital. So there's still a need for people to be in the same place. I don't want to over-rotate on this concept that, okay, from here on out we're all going to be on the wires — that's not the way it will be. >> Yeah, be balanced. So earlier you made a comment that "we should never be spending on non-essential items." And you've seen — (Frank laughs) back in 2008 you saw the "Rest in Peace, Good Times" memo; you've seen the black swan memos that go out. And you're a very successful investor as well; you've done a couple of stints in the VC community. What are you seeing in the Valley in regard to investments? Will investments continue, will we continue to feed innovation — what's your sense of that? >> Well, this is another wake-up call, because in Silicon Valley there's way too much money.
There are certainly a lot of ideas, but there aren't a lot of people who can execute on them. So what happens is a lot of things get funded, and the execution is either no good or it's just not a valid opportunity. And when you go through a downturn like this, you find out that those businesses are not going to make it. When the tide runs out, only the strongest players are going to survive. It's almost a natural selection process that happens from time to time. It's not necessarily a bad thing, because people get reallocated. I mean, Silicon Valley is basically one giant beehive, right? We're constantly repurposing money and people and talent and so on. And that's actually good, because if an idea is not worth investing in, let's not do it — let's repurpose those resources in places where they have merit, where they have viability. >> Well, Frank, I want to thank you for coming on. Look, you don't have to do this. You could've retired long, long ago, but we're fortunate to have leaders like you in place — in these times of crisis, but even in good times — to lead companies and inspire people. And we really appreciate what you do for companies, for your employees, for your customers, and certainly for our community, so thanks again, I really appreciate it. >> Happy to do it, thanks Dave. >> All right, and thank you for watching, everybody. Dave Vellante for theCUBE — we will see you next time. (upbeat music)
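(Aside: on the consumer side, the data sharing pattern Frank describes — querying a provider's analytics-ready dataset in place and joining it with your own tables — reduces to ordinary SQL once the share is mounted as a database. In the sketch below, the database, table, and column names are illustrative assumptions, not Star Schema's actual published schema.)

```python
# Consumer-side sketch of Snowflake data sharing: once a provider's share is
# mounted as a database, you query it like local data and join it directly
# with your own tables -- no copies, no ETL. All names are hypothetical.
import snowflake.connector

conn = snowflake.connector.connect(
    account="example_account",   # hypothetical credentials
    user="analyst",
    password="********",
    warehouse="ANALYTICS_WH",
)

sql = """
SELECT c.report_date,
       c.state,
       c.confirmed_cases,
       s.shipments_delayed
FROM covid19_share.public.case_counts AS c   -- provider's shared table
JOIN internal_db.ops.supply_chain AS s       -- consumer's own table
  ON s.state = c.state
 AND s.ship_date = c.report_date
ORDER BY c.report_date
"""
for row in conn.cursor().execute(sql):
    print(row)
conn.close()
```

The point of the pattern is that nothing is copied: the provider's updates are visible to every consumer as soon as they land, which is what makes the dataset "continually updated" with no work on the consumer's side.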

Published Date : Apr 1 2020
