Michael Stonebraker, TAMR | MIT CDOIQ 2019
>> Announcer: From Cambridge, Massachusetts, it's theCUBE, covering the MIT Chief Data Officer and Information Quality Symposium 2019. Brought to you by SiliconANGLE Media. >> Welcome back to Cambridge, Massachusetts, everybody. You're watching theCUBE, the leader in live tech coverage, and we're covering the MIT CDOIQ conference. My name is Dave Vellante, and I'm here with my co-host, Paul Gillin. Mike Stonebraker is here. The legend is founder and CTO of Tamr, as well as many other companies, and an inventor. Michael, thanks for coming back on theCUBE. Good to see you again. >> Nice to be here. >> So this is kind of a repeat pattern for all of us: we gather here in August at the CDOIQ conference, and you're always the highlight of the show. You gave a talk this week on the top 10 big data mistakes. You and I are among the few people who still use the term big data. I happen to like it. It's sad that it's out of vogue already, but people associate it with Hadoop, which is kind of waning. But regardless, welcome. How'd the talk go? What were you talking about? >> So I talk to a lot of people who are doing analytics or doing operational data at scale, and most of them make a collection of bad mistakes. And so the talk was a litany of the blunders that I've seen people make, and the audience could relate to the blunders, because most of the enterprises represented make a bunch of them. I think the number one blunder is not planning on moving most everything to the cloud. >> So that's interesting, because a lot of people would love to debate that. And I imagine you probably could have given this talk 10 years ago, and a lot of the blunders would be the same, but that's one that wouldn't have been there. I tend to agree, though. I was one of the two hands that went up this morning when the speaker asked, is the cloud cheaper? For us it is, anyway. But why should everybody move everything? Aren't there laws of physics, laws of economics, laws of the land that suggest maybe you shouldn't? >> Well, I guess two things and then a comment. The first thing is, James Hamilton, who's a techie's techie, works for Amazon. We all know James. He claims that he can stand up a server for 25% of your cost, and I have no reason to disbelieve him. That number has been pretty constant for a few years. So his cost is a quarter of your cost, and sooner or later prices are gonna reflect costs, as there's a race to the bottom on cloud servers. >> So can I just stop you there for a second? Because there's some other data on that: all you have to do is look at AWS's operating margin and you'll see how profitable they are. They have software-like economics in deploying servers. Sorry to interrupt, but carry on. >> So anyway, sooner or later they're gonna be wildly cheaper than you are. The second data point is from Dave DeWitt, who's a database wizard, and here's the current technology that Microsoft Azure is using, as of 18 months ago: shipping containers in parking lots, chilled water in, power in, Internet in, otherwise sealed; roof and walls optional. So if you're doing raised flooring in Cambridge versus shipping containers in the Columbia River Valley, who's gonna be a lot cheaper? And so, you know, the economies of scale... the big cloud guys are building data centers as fast as they can, using the cheapest technology around.
You put up a data center every 10 years, and you do it on raised flooring in Cambridge. So sooner or later, the cloud guys are gonna be a lot cheaper, and the only thing that will change that equation is externalities. For example, my lab is up the street in the Frank Gehry building, and we have an IT department who runs servers in Cambridge, and they claim they're cheaper than the cloud. But they don't pay rent for square footage and they don't pay for electricity. So if there are no externalities, the cloud is assuredly going to be cheaper. And then the other thing is that most everybody I talk to, including me, has very skewed resource demands. So in the cloud, I need three servers, except on the last day of the month, when I need 20 servers, and I just do it. If I'm doing it on-prem, I've got to provision for peak load, and so again, I'm just way more expensive. So I think sooner or later this combination of effects is going to send everybody to the cloud for most everything. >> And my point about the operating margins is the difference between price and cost. I think James Hamilton is right on it. If you look at the actual cost of deploying, it's even lower than the price the market allows them to charge. They're growing at 40-plus percent a year at a $35 to $40 billion run rate. Sooner or later, it's gonna be a race to the bottom. >> And the only guys who are gonna win are the guys who have the best cost structure. >> A couple other highlights from your talk? >> Sure. The second thing I'd like to stress is that machine learning is going to be a game changer for essentially everybody. Not only is it going to be autonomous vehicles; it's gonna be automated checkout, it's gonna be drone delivery of most everything. It's gonna affect essentially everybody, and you can sort of say, categorically, any job that is easy to understand is going to get automated. I think that's gonna be majorly impactful to most everybody. So if you're an enterprise, you have two choices: you can be a disruptor or you can be a disruptee. You can either be a taxi company or you can be Uber, and it's gonna be AI and machine learning that determine which side of that equation you're on. So a big blunder that I see is people not taking ML incredibly seriously. >> Do you see that, in fact? Everyone I talk to seems to be bought in, that we've got to get on the bandwagon. >> Yeah, I'm just pointing out the obvious. But one that's not quite so obvious is that a lot of people I talk to say, I'm on top of data science; I've hired a group of 10 data scientists, and they're doing great. One vignette that's kind of fun: I talked to a data scientist from iRobot, which is the guys that have the vacuum cleaner that runs around your living room. She said, I spend 90% of my time locating the data I want to analyze, getting my hands on it, and cleaning it, leaving 10% to do the data science job for which I was hired. And of that 10%, I spend 90% fixing errors in my data so that my models work. So she spends 99% of her time on what you'd call data preparation and 1% of her time doing the job for which she was hired. So data science is not about data science. It's about data integration, data cleaning, data discovery.
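To make that 90% concrete, here is a minimal sketch, in Python with pandas, of the kind of preparation work she's describing. The column names, values, and normalization rules are hypothetical, invented purely for illustration; real pipelines, and tools like Tamr, go far beyond this.

```python
import pandas as pd

# Hypothetical raw extract: the same customer appears under
# inconsistent spellings, casing, and country codes across sources.
raw = pd.DataFrame({
    "name":    ["ACME GmbH", "Acme GmbH ", "acme gmbh", "Tamr Inc."],
    "country": ["DE", "de", "Germany", "US"],
})

# Normalize before you can even ask whether two rows are duplicates.
clean = raw.assign(
    name=raw["name"].str.strip().str.lower(),
    country=raw["country"].str.strip().str.upper().replace({"GERMANY": "DE"}),
)

# Drop exact duplicates on the normalized keys.
deduped = clean.drop_duplicates(subset=["name", "country"])
print(deduped)  # two distinct entities remain
```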
>> Which is what your latest venture does. >> So Tamr does that sort of stuff, and that's the real data science problem. A lot of people don't realize that yet; you know, they will. >> I want to ask you, because you've been involved, by my count, in starting up at least a dozen companies. >> Nine. >> Okay, nine. It's a lot; maybe I estimated high. How do you decide what challenge to move on to? Because you're not solving the same problems; you're moving on to new problems. How do you decide what's the next thing that interests you enough to actually start a company? >> Okay, that's really easy. You know, I'm on the faculty of MIT. My job is to think of new stuff and investigate it; I'm paid to come up with new ideas, some of which have commercial value and some of which don't, and the ones that have commercial value, I commercialize. So it's whatever I'm doing at the time, and that's why all the things I've commercialized are different. >> So going back to Tamr, the data integration platform: a lot of companies out there claim to do data integration right now. What did you see that was the deficit in the market that you could address? >> Okay, great question. So traditional data integration is extract, transform and load (ETL) systems and so-called master data management (MDM) systems, brought to you by IBM, Informatica, Talend, that class of folks. A dirty little secret is that that technology does not scale, in the following sense. ETL and MDM each fail to scale, for different reasons. ETL doesn't scale because ETL is based on the premise that somebody really smart comes up with a global data model for all the data sources you want to put together. You then send a human out to interview each business unit to figure out exactly what data they've got, then how to transform it into the global data model and how to load it into your data warehouse. That's very human-intensive, and it doesn't scale because it's so human-intensive. I've never talked to a data warehouse operator who integrates a large number of sources: the average one I talk to integrates less than 10 data sources; some people, 20; if you twist my arm hard, I'll give you 50. So here's a real-world problem: Toyota Motor Europe. Right now they have a distributor in Spain, another distributor in France; they have country-by-country distributors, sometimes canton-by-canton distribution. So if you buy a Toyota in Spain and move to France, Toyota develops amnesia: the French guys know nothing about you. They've got 250 separate customer databases with 40,000,000 total records in 50 languages, and they're in the process of integrating that into a single customer database, so that they can do the customer service we expect when you cross a country boundary. I've never seen an ETL system capable of dealing with that kind of scale. ETL doesn't scale to this level of problem. >> So how do you solve that problem? >> They're a Tamr customer; I'll tell you all about it. But let me first tell you why MDM doesn't scale. >> Okay, great. >> So ETL says, I now have all your data in one place in the same format, but now you've got the following problems. You've got to deduplicate it, because if I bought a Toyota in Spain and I bought another Toyota in France, I'm in both databases.
So if you want to avoid double-counting customers, you've got to dedupe, you know, dedupe 30,000,000 records. And so MDM says, okay, you write some rules; it's a rule-based technology. So you write a rule. My favorite example of a rule: I don't know if you guys like downhill skiing; I love downhill skiing. Ski areas are in all kinds of public databases. Assemble those all together, and now you've got to figure out which ones are the same ski area, because they're called different names, have different addresses, and so forth. However, if the vertical drop from the bottom to the top is the same, chances are they're the same ski area. So that's a rule that says how to put data together in clusters. And so now I have a cluster for a given mountain, and I have a problem, which is that one address says one thing and another address says something else. Which one is right, or are both right? So now you have what's called the golden record problem: to decide which data elements, among a variety that may all be associated with the same entity, are in fact correct. So again, MDM is a rule-based technology, and rule systems don't scale. The best example I can give you for why rule systems don't scale: Tamr has another customer, General Electric; you've probably heard of them. GE wanted to do spend analytics, and they had 20,000,000 spend transactions the year before last. A spend transaction is: I paid $12 to take a cab from here to the airport, and I charged it to cost center XYZ. Twenty million of those. Now, GE has a pre-built classification hierarchy for spend: they have parts, and underneath parts, computers; underneath computers, memory; and so forth. So, a pre-existing classification for spend, and they wanted to simply classify the 20,000,000 spend transactions into this pre-existing hierarchy. The traditional technology is, well, let's write some rules. So GE wrote 500 rules, which is about the most any single human can get their arms around, and that classified 2,000,000 of the 20,000,000 transactions. You've now got 18,000,000 to go, and another 500 rules is not going to give you 2,000,000 more; it's going to give you diminishing returns. You'd have to write a huge number of rules, more than anyone can possibly understand, so the technology simply doesn't scale. So in the case of GE, they had Tamr help solve this classification problem. Tamr used the 2,000,000 rule-tagged records as training data, trained an ML model on that training data, and classified the remaining 18,000,000. So the answer is machine learning. If you don't use machine learning, you're absolutely toast. The answer to "MDM doesn't scale" is that you've got to use ML. The answer to "ETL doesn't scale" when you're putting together disparate records is ML too. You've got to replace humans with machine learning. And that, at least at this conference, seems to be resonating: people are understanding that at scale, traditional data integration technologies just don't work.
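As a rough sketch of the pattern he describes at GE, where rule-tagged records become training data for a model that classifies the rest, here is what that shape looks like in Python with scikit-learn. The transactions, categories, and features are invented for illustration, and Tamr's production models are certainly more sophisticated than this toy pipeline.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Records the 500 hand-written rules managed to tag (the 2M of 20M).
rule_tagged = [
    "cab from office to airport",
    "hotel two nights frankfurt",
    "16GB DDR4 memory module",
    "laptop docking station",
]
labels = ["travel", "travel", "it_parts", "it_parts"]

# Records the rules could not classify (the remaining 18M).
untagged = ["taxi to client site", "32GB server RAM"]

# Train a text classifier on the rule-tagged records, then let the
# model generalize to everything the rules missed.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(rule_tagged, labels)
print(model.predict(untagged))  # toy-sized data, so treat the output as illustrative
```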
>> Well, and you got a great shout-out yesterday from Mark Ramsay, the former GSK data leader, on how they solved their problem. He basically laid it out: the enterprise data warehouse didn't work, MDM didn't work, kicking the can with top-down data modeling didn't work, and kicking the can to data governance wasn't going to solve the problem. But Tamr did, along with some other tooling, of course. >> Well, the other thing is, no one technology... there's no silver bullet here. It's going to be a bunch of technologies working together. Mark Ramsay is a great example: he used StreamSets and a bunch of other startup technologies operating together, rather than the traditional guys. The traditional vendors are, by and large, 10 years behind the times, and if you want cutting-edge stuff, you've got to go to startups. >> One more question, if we have time. I want to jump to a different topic. I know that you in the past were a critic of the NoSQL movement, and NoSQL isn't going away; it seems to be actually gaining steam right now. What are the flaws in NoSQL? Has your opinion changed? >> No. So NoSQL originally meant "no SQL": don't use it. Then the marketing message changed to "not only SQL": SQL is fine, but NoSQL does other things too. >> Now it's all SQL, right? >> And my point of view now is that NoSQL means "not yet SQL," because high-level data languages are good. Mongo is inventing one, Cassandra is inventing one, and those, if you squint, look like SQL. So I think the NoSQL guys are drifting towards SQL. Meanwhile, JSON is a great idea if you've got irregular data, and the SQL guys are saying, sure, let's have JSON as a data type. I think the only place where there's a fair amount of argument is schema-later versus schema-first, and I pretty much think schema-later is a bad idea, because schema-later really means you're creating a data swamp. >> Exactly. >> Say you have a field for salary, and you're storing employees and salaries. Paul's salary is recorded as dollars per month; Dave's salary is in euros per week with a lunch allowance. If you don't deal with irregularities up front on data that you care about, you're gonna create a mess. >> Schema on read, right? It was convenient to store a lot of data cheaply, but then it's hard to get value out of what you created. >> So I'm not opposed to schema-later, as long as you realize that you're kicking the can down the road and you're just going to give your successor a big mess. >> Yeah, right. Michael, we've gotta jump, but thank you so much; we sure appreciate it. All right, keep it right there, everybody. We'll be back with our next guest right after this short break. You're watching theCUBE from MIT CDOIQ. We're right back.
Jim Walker, Cockroach Labs & Christian Hüning, finleap connect | Kubecon + Cloudnativecon EU 2022
(bright music) >> Narrator: theCUBE presents Kubecon and Cloudnativecon Europe 2022, brought to you by Red Hat, the Cloud Native Computing Foundation, and its ecosystem partners. >> Welcome to Valencia, Spain, and Kubecon + Cloudnativecon Europe 2022. I'm Keith Townsend, along with my host, Paul Gillin, who is the senior editor for architecture at SiliconANGLE. Paul? >> Keith, you've been asking me questions all these last two days. Let me ask you one. You're a traveling man; you go to a lot of conferences. What's different about this one? >> You know what, we were just talking about that pre-conference. Open source conferences are usually pretty intimate. This is big: 7,500 people talking about complex topics, all in one big area. And I've got to say, it's overwhelming. It's not focused on a single company's product or messaging; it is about a whole ecosystem. Very different show. >> And certainly some of the best t-shirts I've ever seen, and our first guest, Jim, has one of the better ones. >> I mean, a big cockroach? Come on, right? >> Jim Walker, principal product evangelist at Cockroach Labs, and Christian Hüning, tech director of cloud technologies at finleap connect, a financial services company that's based out of Germany, now offering services in four countries. >> Basically all over Europe. >> Okay. >> But we are in three countries with offices. >> So you're a CockroachDB customer, and I've got to ask the obvious question: databases are hard, and they started the company in 2015. You've been a customer since 2019, I understand. Why take the risk on a four-year-old database? That just sounds like a world of risk and trouble. >> So it was in 2018 when we joined the company, and we did this cloud native transformation; that was our task, basically. We had a very limited amount of time, and we were faced with a legacy infrastructure, and we needed something that would run in a cloud native way and just blend in with everything else we had. The idea was to go all in with Kubernetes, though in those early days a lot of things were alpha or beta, and we were running on MySQL back then. >> Yeah. >> On a VM, kind of a small setup. And then we were looking for something that we could just deploy in Kubernetes alongside everything else. We had a stack, and we had to duplicate it many times, so to maintain all of that the same way, with GitOps and everything, Cockroach delivered that proposition: something that's truly cloud native and blends in with everything else we do in the same way. So we evaluated the risk of adopting that solution relatively early, and then we took the leap of faith... >> The finleap of faith. >> The finleap of faith, exactly. And we were not dissatisfied. >> So talk to me a little bit about the challenges, because when we think of MySQL, MySQL scales to amazing sizes; it is the de facto database for many cloud-based architectures. What problems were you running into with MySQL? >> We were running into the problem that, as a fintech company, we are regulated, and we have customers that really value running things on-prem... private cloud, on-prem is a bit of a bad word, maybe, so it's private cloud, hybrid cloud, in our own data centers in Frankfurt. And we needed to run it in there.
So we wanted to somehow manage that, and all of the managed solutions were off the table, so we couldn't use them. We needed something that ran in Kubernetes, because we only wanted to maintain Kubernetes; we're a small team and didn't want to run a full-blown VM solution as well. So that was that. The other thing was, we needed something that was HA and distributable somehow. So we also looked into other solutions at the time, like Vitess, which is also prominent for having a MySQL-compliant interface, and a great solution. We also got it to work, but we figured that, given the scale and the sheer amount of maintenance it would need, we couldn't deliver that; we were too small for that. So that's where Cockroach just fitted in nicely, by being able to distribute, be HA and resilient against failure, but also scale out, because we had this problem with a single MySQL deployment: as the data amounts grew, we had trouble operationally keeping it under control. >> So Jim, every time someone comes to me and says, I have a new database, I think: we don't need yet another database. >> Right. >> What problem... how does CockroachDB go about solving the types of problems that Christian had? >> Yeah. I mean, Christian laid out why it exists. Look, guys, building a database isn't easy. If it was easy, we'd have a database for every application. You know, Michael Stonebraker, kind of the godfather of all databases, says it himself: it takes seven, eight years for a database to fully gestate, to be something that's enterprise-ready and can be relied upon. We've been building for about seven, eight years. I mean, I'm thankful for people like Christian joining us early on to help us troubleshoot and go through some things. We're building a database; it's not easy, you're right. But building a distributed system is also not easy. And so for us, if you look at what's going on in infrastructure in general, what's happening in Kubernetes... this whole space is Kubernetes. It's all about automation: how do I automate scale, how do I automate resilience out of the entire equation of what we're actually doing? I don't want to have to think about active-passive systems. I don't want to think about sharding a database. Sure, you can scale MySQL, but you know how many people it takes to run three or four shards of a MySQL database. That's not automation. And I tell you what, in this world right now, with the advances in data, it is hard to find people who actually understand infrastructure, and to hire them. This is why this automation is happening: our systems are more complex. So we started from the very beginning to be something that was very different. This is a cloud native database, built with the same exact principles that are in Kubernetes. In fact, like Kubernetes, it's kind of a spawn of Borg, the back end of Google. We are inspired by Spanner. I mean, this started with three engineers who worked at Google and were frustrated they didn't have the tools they had at Google, so they built something like that outside of Google: how do we give that kind of Google-like infrastructure to everybody? And that's the advent of Cockroach and kind of why we're doing what we're doing. >> As your database has matured, you're now in a transition to a serverless version. How are you doing that without disrupting the experience for existing customers? And why go serverless at all?
>> Yeah, it's interesting. So, you know, serverless was kind of an R&D project for us when we first started down the path. Because ultimately, what we would love to do for the database is: let's not even think about the database, Keith. Like, I don't want to think about the database. What we're building towards is a SQL API in the cloud. That's it. I don't want to think about scale. I don't want to think about upgrades. Literally, that stuff should just go away. That's what we need, right? As developers, I don't want to think about isolation levels; just give me DML and I want to be able to communicate. And for us, the realization of that vision is: if we're going to put a database on the planet for everybody to actually use, we have to be really, really efficient. And serverless, which I believe really should be called infrastructureless, because I don't think we should be thinking just about servers... we've got to think about how do I take the context of regions out of this thing, how do I take the context of cloud providers out of what we're talking about. Let's just not think about that; let's just code against something. Serverless was the answer. Now, we've been building for about a year and a half. We launched a serverless version of Cockroach last October, and we did it so that everybody in the public could have a free version of a database. That's what serverless allows us to do: it's all consumption-based up to certain limits, and then you pay. But ultimately, and we spoke a little bit about this at the very beginning, I think for ISVs, people who are building software today, the serverless vision gets really interesting, because what's on the mind of the CTO is: how do I drive down my cost to the cloud provider? And if we can drive down costs, by making things multi-tenant and super efficient and then optimizing how much compute we use, spinning things down to zero and back up, and auto-scaling these sorts of things in our software, we can start to change the way people think about spend with the cloud provider. And ultimately we did that so we could do things for free.
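For a feel of what "a SQL API in the cloud" means from the application side, here is a minimal sketch. CockroachDB speaks the PostgreSQL wire protocol, so a standard Postgres driver such as psycopg2 works; the connection string below is a made-up placeholder, not a real cluster, and a serverless cluster would hand you its own DSN.

```python
import psycopg2

# Placeholder DSN: a serverless cluster gives you a string of roughly this shape.
conn = psycopg2.connect(
    "postgresql://app_user:secret@example-cluster.cockroachlabs.cloud:26257/"
    "defaultdb?sslmode=verify-full"
)

with conn:
    with conn.cursor() as cur:
        # Plain SQL; no sharding, regions, or upgrade logic in sight.
        cur.execute(
            "CREATE TABLE IF NOT EXISTS accounts (id INT PRIMARY KEY, balance INT)"
        )
        cur.execute("UPSERT INTO accounts (id, balance) VALUES (1, 100)")
        cur.execute("SELECT id, balance FROM accounts")
        print(cur.fetchall())

conn.close()
```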
>> So, Jim, I think I disagree... Christian, I'm sorry. Jim, I think I disagree with you just a little bit. Christian, I think the biggest challenge facing CTOs is people. >> True. >> Getting the people to worry about cost and spend and implementation. So as you hear the concept of CockroachDB moving to a serverless model, as a large customer, how does that make you think or react on the people side of your resources? >> Well, I can say that from the people side of resources, luckily, Cockroach is our least problem. We always said it's an operator's dream, because that was the part that just worked for us. >> And it's worked as you have scaled it, without you having... >> Yeah. I mean, we use it in a bit of a... we do not really scale Cockroach out really large. It's more that we use it with the enterprise features, like encryption in the stack, which our customers then demand. If they do so, we have the SaaS offering, and we also do dedicated stacks. So by having a fully cloud native solution on top of Kubernetes as the foundational layer, we can just use that, stamp it out, and deploy it. >> How does that translate into services you can provide your customers? Are there services you can provide customers that you couldn't have if you were running, say, MySQL? >> What we do is, we run this... so the SaaS offering runs in our hybrid private cloud. And the other thing that we offer is that we run the entire stack at a cloud provider of their choosing. So if they are on AWS, they give us an AWS account and we put it in there. Theoretically, we could then also talk about using the serverless variant, if they like, but it's not strictly required for us. >> So Christian, talk to me about that provisioning process, because if I had a MySQL deployment before, I can imagine that putting it into a cloud native type of repeatable CI/CD pipeline or Ansible script could be difficult. Talk to me about how CockroachDB enables you to create new onboarding experiences for your customers. >> So what we do is, we use Helm charts all over the place, as probably everybody else does. Each application team has their own set of services; they've packaged them into Helm charts and wrapped those in a super-chart, which gets wrapped into the super-super-chart for the entire stack. And at the right place, somewhere in between, Cockroach is added as a dependency. And since they just offer a Helm chart, that's as easy as it gets. Then the teams have an init job: once you deploy all that, it spins up, and as soon as Cockroach is ready, it's just the same reconcile loop as everything else. It will provision users, set up the database schema, and load any initial data sets that might be required for a new setup. So with that setup, we can spin up a new cluster and then deploy that stack chart in there. It takes some time, and then it's done. >> So talk to me about lifecycle management, because when I have one database, I have one schema; when I have a lot of databases, I have a lot of different schemas. How do you keep your stack consistent across customers? >> That is basically part of the same story. We have GitOps all over the place. So we have this repository where we keep the super-chart versions, and we maintain, like, minus-three versions and ensure that we update the customers and keep them up to date. It's part of the contract, sometimes down to the schedule of the customer. And Cockroach nicely supports these updates, with the schema migrations running in the background. In our case we use SQLAlchemy in that integration, which is also nicely supported; that was also part of the story of moving from MySQL to Postgres, which was supported by the ORM, these kinds of things. So that approach, together with the ease of Helm charts and the background schema migrations, makes for very seamless upgrade operations. Before that, we had to have downtime. >> That's right, you can have online schema changes, and upgrading the database uses the same concept of rolling upgrades that you have in Kubernetes. It's just cloud native; it just fits that same context, I think. >> Christian: It became a no-brainer. >> Yeah. >> Jim, you mentioned the idea of a SQL API in the cloud. That's really interesting. Why does such a thing not exist? >> Because it's really difficult to build. You know, a SQL API... what does that mean? Where does that endpoint live? Is there one in California, one on the East Coast, one in Europe, one in Asia? And I'm asking that endpoint for data: where does that data live? Can you control where data lives on the planet? Because ultimately, what we're fighting in software today, in a lot of these situations, is the speed of light.
And so how do you intelligently place data on this planet, so that, you know, when you're asking for data from home, it's a different latency than when you're here in Valencia? Does that data follow and move with you? These are really, really difficult problems to solve. And I think we're at a moment in time in software engineering where we're solving some really interesting things, because we are butting up against this speed-of-light problem, and ultimately that's one of the biggest challenges. But underneath, it has to have all this automation: the ease with which we can scale this database, the always-on resilience, the way that we can upgrade the entire thing with just rolling upgrades. The cloud native concepts are really what's enabling us to do things at global scale. It's automation. >> Let's talk about that speed of light and global scale. There's no better conference for speed of light, for scale, than Kubecon. Any predictions coming out of the show? >> It's less a prediction for me and more of an observation, you guys. Look at two years ago, when we were here in Barcelona at Kubecon EU: it was a lot of hype, a lot of people walking around, curious, fascinated. This is reality now. In the conversations that I'm having with people today, there's a reality: people are really doing it, they're becoming cloud native. And to me, I think what we're going to see over the next two to three years is people starting to adopt this kind of distributed mindset, and it permeates not just within infrastructure but up into the stack. We'll start to see many more developers using Go and these kinds of threaded languages, because that distributed mindset starts at the chip and goes all the way to the fingertip of the person clicking, and you're distributed everywhere in between. It is extremely powerful, and I think that's exactly what the finleap team is doing. There's a lot of value and a lot of power in that. >> Jim, Christian, thank you so much for coming on theCUBE and sharing your story. You know what, we're past the hype cycle of Kubernetes; I agree. I was a non-believer in Kubernetes two, three years ago, when it was mostly hype. Now we're looking at customers, from Microsoft to finleap and competitors, doing amazing things with this platform and cloud native in general. Stay tuned for more coverage of Kubecon from Valencia, Spain. I'm Keith Townsend, along with Paul Gillin, and you're watching theCUBE, the leader in high-tech coverage. (bright music)
Mary Roth, Couchbase | Couchbase ConnectONLINE 2021
(upbeat music) >> Welcome to theCUBE's coverage of Couchbase ConnectONLINE. Mary Roth, VP of Engineering Operations with Couchbase, is here for Couchbase ConnectONLINE. Mary, great to see you. Thanks for coming on remotely for this segment. >> Thank you very much. It's great to be here. >> Love the fire in the background; a little fireside chat here, kind of happening. But I want to get into it because, with the pandemic, Engineering and Operations have really kind of shown that engineers and developers have been good working remotely for a while, but for the most part it's impacted companies in general, across organizations. How did the Couchbase engineering team adapt to remote work? >> Great question. And I actually think the Couchbase team responded very well to this new model of working imposed by the pandemic. And I have a unique perspective on the Couchbase journey. I joined in February 2020, after 20-plus years at IBM, which had embraced a hybrid in-office/remote work model many years earlier. In my IBM career, I lived four minutes away from my research lab in Almaden Valley, but IBM is a global company with headquarters on the East Coast, and so throughout my career I often found myself on phone calls with people around the globe at 5:00 AM in the morning. I quickly learned and quickly adapted to a hybrid model: I'd go into the office to collaborate and have in-person meetings when needed, but if I was on the phone at 5:00 AM in the morning, I didn't feel the need to get up at 4:30 AM to go in. I just worked from home, and I discovered I could be more productive there, doing think-time work; I really only needed the in-person time for collaboration. This hybrid model allowed me to have a great career at IBM and raise my two daughters at the same time. So when I joined Couchbase, I joined a company that was all about being in-person, and instead of a four-minute commute, it was going to be an hour or more commute for me each way. This was going to be a really big transition for me, but I was excited enough by Couchbase and what it offered that I decided to give it a try. Well, that was February 2020. I showed up early in the morning on March 10th, 2020 for an early morning in-person meeting, only to learn that I was one of the few people who didn't get the memo: we were switching to a remote working model. And so over the last year, I have had the ability to watch Couchbase and other companies pivot to make this remote working model possible, and not only possible, but effective. And I'm really happy to see the results. A remote work model does have its challenges, that's for sure, but it also has its benefits: better work-life balance, more time to interact with family members during the day, and more quiet time just to think. We just did a retrospective on a major product release, Couchbase Server 7.0, that we delivered over the past 18 months, and one of the major insights from the leadership team is that working from home actually made people more effective. I don't think a fully remote model is the right approach going forward, but a hybrid model, which IBM adopted many years ago and which I was able to participate in for most of my career, I believe is a healthier and more productive approach. >> Well, great story. I love the comeback, and now you can leverage all the best practices from the IBM days. But how did your team and the Couchbase engineering team react, and were there any best practices or key learnings that you pulled out of that?
The initial reaction was not good. I mean, as I mentioned, it was a culture based on being in-person; people had to be in in-person meetings. So it took a while to get used to it, but there was a forcing function, right? We had to work remotely; that was the only option. And so people made it work. I think the advancement of virtual meeting technology really helped a lot. In earlier days of my career, when I had just bad phone connections, it was very difficult. But with the virtual meetings that you have now, where you can actually see people and interact, I think it's really quite helpful, and probably the key. >> What's the DNA of the company there? I mean, every company's got its DNA; Intel's was Moore's Law. What's the engineering culture at Couchbase like, if you could describe it? >> The engineering culture at Couchbase is very familiar to me. We are, at our heart, a database company, and I grew up in the database world, which has a very unique culture based on two values: merit and mentorship. We also focus on something that I like to call growing the next generation. Now, database technology started in the late sixties and early seventies, with a few key players and institutions. These key players were extremely bright, and they tackled and solved really hard problems with elegant solutions, long before anybody knew those solutions were going to be necessary. Those original key players, people like Jim Gray, Bruce Lindsay, Don Chamberlin, Pat Selinger, David DeWitt, and Michael Stonebraker, just loved solving hard problems, and they wanted to share that elegance with a new generation. So they really focused on growing the next generation of leaders, which became the Mike Careys and the Mohans and the Laura Haases of the world. And that culture grew over multiple generations, with the previous generation cultivating, challenging, and advocating for the next. I was really lucky to grow up in that culture, and I've advanced my career as a result of being part of it. The reason I joined Couchbase is that I see that culture alive and well here. Our two fundamental values on the engineering side are merit and mentorship. >> One of the things I want to get your thoughts on, on the database question: I remember, back in the old glory days, you mentioned some of those luminaries; there weren't many database geeks out there. It was kind of a small community. Now databases are everywhere. There's no one database that rules the world, but you're starting to see patterns of databases emerging: more databases than ever before, on the internet, in the cloud, on the edge. Essentially, we're living in a large distributed computing environment. So now it's cool to be in databases, because they're everywhere. (laughing) I mean, this is kind of where we're at. What's your reaction to that? >> You're absolutely right. There used to be a few small vendors and a few key technologies, and it's grown over the years, but the fundamental problems are the same: data integrity, performance, and scalability in the face of distributed systems. Those were the hard problems that those key leaders solved back in the sixties and seventies. They're not new problems; they're still there. And they did a lot of the fundamental work that you can apply and reapply in different scenarios and situations. >> That's pretty exciting. I love that.
I love the different architectures that are emerging, which allow for more creativity for application developers. And this becomes the key thing we're seeing right now driving the business, and a big conversation here at the event is the powering of these modern applications that need low latency. There aren't many spinning disks anymore; it's all in RAM, all these kinds of different memory, and you've got all kinds of new constructs. How do you make sense of it all? How do you talk to customers? What's the main core thing happening right now, if you had to describe it? >> Yeah, it depends on the type of customer you're talking to. We have focused primarily on the enterprise market, and in that market there are really fundamental issues. Information, for these enterprises, is key; it's the core asset that they have, and they understand very well that they need to protect it and make it available more quickly. I started as a DBA at Morgan Stanley, right out of college, and at the time I think it was, and probably still is, the best-run IT shop that I'd ever seen in my life. The fundamental problems that we had to solve to get information from one stock exchange to another, or to get it to the SEC, are the same problems that we're solving today. Back then we were working on mainframes and over high-speed Datacom links; today it's the same kind of problem, it's just the underlying infrastructure that has changed. >> Yeah. Hey, theCUBE has been a big supporter of women in tech; we've done thousands of interviews. And while I've got you, I want to ask you, if you don't mind: what career advice do you give women who are starting out in the field of engineering and computer science? What do you wish you knew when you started your career? And if you could talk to that person now, what would you say? >> Yeah, well, there are a lot of things I wish I knew then that I know now, but I think there are two key aspects to a successful career in engineering. I actually got started as a math major, and the reason I became a math major is a little convoluted: as a girl, I was told we were bad at math, and so for some reason I decided that I had to major in it. That's actually how I got my start, but I've had a great career, and I think there are really two key aspects. First, it is a discipline in which respect is gained through merit. As I mentioned earlier, engineers are notoriously detail-oriented and most are perfectionists. They love elegant, well-thought-out solutions and give respect when they see one. So understanding this can be a very important advantage: if you're always prepared and you always bring your A-game to every debate, every presentation, every conversation, you build up respect among your team simply through merit. While that may mean that you need to be prepared to defend every point early on, say in your graduate career or when you're starting out, over time others will learn to trust your judgment and begin to intuitively follow your lead just by reputation. The reverse is also true: if you don't bring your A-game and you don't come prepared to debate, you will quickly lose respect, and that's particularly true if you're a woman. So if you don't know your stuff, don't engage in the debate until you do. >> That's awesome advice. >> That's... >> All right, continue. >> Thank you.
So my second piece of advice, which I wish I could give my younger self, is to understand the roles of leaders and influencers in your career, and the importance of choosing and purposely working with each. I like to break it down into three types of influencers: managers, mentors, and advocates. The first group are the people in your management chain: your first-line manager, your director, your VP, et cetera. Their role in your career is to help you measure short-term success, particularly in how that success aligns with their goals and the company's goals. But it's important to understand that they are not your mentors, and they may not have a direct interest in your long-term career success. I like to think of them as, say, your sixth-grade math teacher: you getting an A in the class and advancing to seventh grade, they own that. But whether you get that basketball scholarship to college, or get into Harvard, or become a CEO, they have very little influence over that. A mentor is someone who does have a shared interest in your long-term success, maybe through your relationship with him or her, or because by helping you shape your career and achieve your own success, you help advance their goals, whether that's the company's success, or helping more women achieve leadership positions, or getting more kids into college on basketball scholarships. Whatever it is, they have some long-term goal that aligns with helping you with your career, and they give great advice. But a mentor is not enough, because they're often outside the sphere of influence of your current position, and while they can offer great advice and coaching, they may not be able to help you directly advance. That's the role of the third type of influencer, somebody I call an advocate. An advocate is someone who is in a position to directly influence your advancement and champion you and your capabilities to others. They are in influential positions, and others place great value in their opinions. Advocates stay with you throughout your career, and they'll continue to support you and promote you wherever you are and wherever they are, whether that's in the same organization or not. They're the ones who, when a leadership position opens up, will say, I think Mary's the right person to take on that challenge, or, we need to move in a new direction, and I think Mary's the right person to lead that effort. Now, advocates are the most important people to identify early on and often in your career, and they're often the most overlooked. People early on often pay too much attention to, and rely on, their management chain for advancement. Managers change on a dime, but mentors and advocates are there for you for the long haul. And that's one of the unique things about the database culture: that set of advocates was just there already, because they had focused on building the next generation. So I consider, you know, Mike Carey as my father, Mike Stonebraker as my grandfather, and Jim Gray as my great-grandfather, and they're always there to advocate for me. >> That's like a schema in a database; you've got to have it all right there, kind of teed up. Beautiful. (laughing) Great advice. >> Exactly. >> Thank you for that. That was really a masterclass, and that's going to be great advice for folks trying to figure out how to play the cards they have in their situation, and whether to double down or move and find other opportunities. So great stuff there.
I do have to ask you, Mary, about the technical side and the product side. Couchbase Capella was launched in conjunction with the event. What is the bottom line for that, from the Operations and Engineering team that built the product and rolled it out? What's the main top-line message about that product? >> Yeah. Well, we're very excited about the release of Capella, and what it brings to the table is a fully managed and automated database cloud offering, so that customers can focus on development and on building and improving their applications, reducing their time to market, without having to worry about the hard problems underneath and the operational database management efforts that come with them. As I mentioned earlier, I started my career as a DBA, and it was one of the most sought-after and highly paid positions in IT, because operating a database required so much work. So with Capella, what we're seeing is that job going away; I'm not going to be able to apply for a DBA job tomorrow. >> That's great stuff. Well, thanks for coming on; I really appreciate it. Congratulations on the company and the public offering this past summer, in July, and thanks for that great commentary and insight here on theCUBE. Thank you. >> Thank you very much. >> Okay. Mary Roth, VP of Engineering Operations at Couchbase, part of Couchbase ConnectONLINE. I'm John Furrier, host of theCUBE. Thanks for watching. (upbeat music)
Marc Linster, EDB | Postgres Vision 2021
(upbeat music) >> Narrator: From around the globe, it's theCUBE, with digital coverage of Postgres Vision 2021, brought to you by EDB. >> Well, good day, everybody. John Walls here on theCUBE, continuing our CUBE conversation as part of Postgres Vision 2021, sponsored by EDB, with EDB Chief Technology Officer, Mr. Marc Linster. Marc, good morning to you. How are you doing today? >> I'm doing very fine, very good, sir. >> Excellent. Glad you could join us, and we appreciate the time and the chance to look at what's going on in this world of data, which, as you know, continues to evolve quite rapidly. So let's take the 30,000-foot perspective to begin with, and talk about data, and management, and what Postgres is doing to accelerate all the innovative techniques, solutions, and services we're seeing these days. >> Yeah, so I think it's really a fantastic confluence of factors that we're seeing in Postgres today. Postgres has really matured over the last couple of years: things like high availability, parallel processing, and use of very high core counts have come together with the drive towards digital transformation and the enormous amounts of data that businesses are dealing with today. And then the third factor is really the embracing of open source, right? I mean, Linux has shown the way, and has shown that this is really possible. And now we're seeing Postgres as, I think, the next big open source innovation after Linux, achieving the same type of transformation. So it's a maturing, it's an acceptance, and the big drive towards dealing with a lot more data as part of digital transformation. >> You know, part of that acceptance you talk about is accepting the fact that you have a legacy system that, if you're not going to completely overhaul it, you still have to integrate, right? You've got to complement it and start this kind of migration. So from your perspective, what kind of progress is Postgres enabling in the mindset of CTOs among your client base, that their legacy systems can function in this new environment, that all is not lost, and that while there is some catching up to do, or some patching here and there, it's not as arduous or as complex as it might appear on its face? >> Well, it's the maturing of Postgres that has really opened this up, right? We're seeing that Postgres can handle these workloads, and at the same time there's a growing number of success cases where companies across all industries (financial services, insurance, manufacturing, retail) are using Postgres. So you're no longer the first mover taking on a higher risk. Five or 10 years ago, Postgres knowledge was not readily available, so if you wanted Postgres, it was really hard to find somebody who could support you, or to find an employee you could hire who would be the Postgres expert. That's no longer the case. There are plenty of books about Postgres, there are lots of conferences about Postgres, and it's a big meetup topic. So getting know-how, and getting acceptance amongst your team to use Postgres, has become a lot easier. At the same time, over 90% of all enterprises today use open source in one way or the other, which basically means they have open source policies.
They have ways to bring open source into the development stream. So that makes it possible, right? Whereas before, you had to have an individual who would evangelize open source and push to get it adopted, now open source is something that almost everybody is using. From government to financial services, open source is used all over the place, right? So now you have something that has really matured, you have a lot of references out there, and you have the policies that make adoption possible. You have the success stories, and now all the pieces have come together to deal with this onslaught of data. And then maybe the last thing that really plays a big role is the cloud. Postgres runs everywhere, right? I mean, it runs from an Arduino to Amazon. Everywhere. Which basically means that if you want to drive agile business transformation, you call Postgres, because you don't have to decide today where it's going to run. You're not locking into a vendor, and you're not locking into a limited support system. You can run this thing anywhere: it'll run on your laptop, and it'll run on every cloud in the world. You can have it managed, you can have it hosted, you can have every flavor you want, and there are lots of good Postgres support companies out there. So all of these factors together are really what makes it so interesting, right? >> What about Kubernetes, and this complementary relationship with Kubernetes right now? What has that done, do you think, in terms of providing additional services, or at least a new approach, new philosophies, new concepts in database management? >> Well, it's maybe the most surprising thing, at least from the outside; probably not from the inside. You'd think that Postgres, this now 25-year-old open source database project, would be completely incompatible with Kubernetes, with containers. But what's really happened is that Postgres in containers today is the number one database; after NGINX, it is the number two software being deployed in containers. So it's really become the workhorse of the whole microservices transformation, right? A 25-year-old piece of software, but it has a very small footprint and a lot of interesting features: GIS, document processing, now graph capabilities, common table expressions, all the things that are really cool for developers. And that's probably what leads it to be the number one database in containers. So it's absolutely compatible with Kubernetes, and for the whole transformation towards microservices, there's nothing better out there. It runs everywhere and has the most innovative technologies in it. And that's what we're seeing. Go to the annual Stack Overflow developer survey, right? Postgres has consistently been the number one or number two most loved and most used database. So what's amazing is that this relatively old technology is beating everybody else in digital transformation and in adoption by developers. >> Like an old dog with new tricks, right? It's still winning. >> Yeah, and, you know, the elephant is the symbol, and this elephant does dance.
And, and, and from the size of the marketplace it is certainly leading RA leader. In your opinion, you know, what, what is this confluence of factors that have influenced this, this market position if you will, of Postgres or market acceptance of Postgres? >> It's, I mean, it's the, it's a maturing of the core. As I said before, that the transaction rates et cetera, Postgres can handle, are growing every year and are growing dramatic, right? So that's one thing. And then you have it, that Postgres is really, I think, the most reliable and relational database out there as what is my opinion, I'm biased, I guess. And, and it's, it's super quality code but then you add to that the innovation drive. I mean, it was the first one out there with good JSONB support, right? And now it's brought in JSON Path as as part of the new SQL standard. So now you can address JSON data inside your database and the same way you do it inside your browser. And that's pretty cool for developers. Then you combine that with PostGIS, right, which is, I think the most advanced GIS system out there in database. Now, now you got relations, asset compliant, GIS and document. You may say what's so cool about that. Well, what's cool about it is I can do absolutely reliable asset compliant transactions. I can have a fantastic personalization engine through JSONB, and then all my applications need to know where is the transaction? Where is the next store? How far away I'm a form of the parking spot? Right? So now I got a really really nice recipe to put the applications of the future together. You add onto that movements toward supporting graph and supporting other capabilities inside the database. So now you got, you got capability, you've got reliability and you got fantastic innovation. I mean, there's nothing better out there. >> Let's hit the security angle here, 'cause you talked about the asset test, and certainly, you know, those, that criteria is being met. No question about that, whether it's isolation, durability, consistency, whatever, but, but security, I don't have to tell you what a growing concern this is. It's already paramount, but we're seeing every day write stories about, about intrusions and and invasions, if you will. So in terms of providing that layer of security that everybody's looking for right now, you know, this this ultra impenetrable force, if you will, what in your mind, what's Postgres allowing for, in that respect in terms of security, peace of mind, and maybe a little additional comfort that everybody in your space is looking for these? >> So, so look at, look at security with a database like, like multiple layers, right? There's not just, you don't do security only one place. It's like when you go into a bank branch, right? I mean, they do lock the door, they have a camera, there is a gate in front of the safe, there's a safe door. And inside the safe, there is still, again safety deposit boxes with individual locks. The same applies to Postgres, right? Where let's say we start at the heart of it where we can secure and protect tables and data. We're using access control lists and groups and usernames, et cetera. Right? So that's, that's at the heart of it. But then outside of that, we can encrypt the data when on disk or when it's in transit on disk. Most people use the Linux disc encryption systems but there's also good partners out there, like like more metric or others that we work with, that that provide security on disk. 
Then you go out from there, and you have the securing of the database itself, again through the logins and the groups. You go out from there, and now you have the securing of the host that the database is sitting on. Then you look at securing the data on the network, through SSL and certificates, et cetera. So basically there's a multi-layer security model that positions Postgres extremely well. And then maybe the last thing to say is that it integrates very well with LDAP, Active Directory, Kerberos, all the usual suspects that you would use to secure technology inside the enterprise, or in an open network where people work from home, et cetera. >> You talked about the history of this technology, founded back at Cal Berkeley some 30 years ago, and it certainly has evolved. As you've pointed out, it's now a very mature technology, but what do you see in terms of growth from here? Where does it go over the next 18 to 24 months? What do you think is the next barrier, the challenge that the technology and this open source community want to take on? >> Well, there's the continuous effort of making it faster, right? That always happens. Every database wants to be faster, do more transactions per second, et cetera, and a lot of work has been done there. I mean, just in the last couple of years, Postgres performance has increased by over 50%. So transactions per second, and that kind of scalability, are going to continue to be a focus. And then the other one is leading the implementation of the SQL standards, being the most advanced, most innovative database. Remember, for many years now, Postgres has come out with a new release on an annual basis; other database vendors are only now catching up to that. So innovation has always been at the heart of it. We started with JSONB; key-value pairs came even before that; PostGIS has been around for a long time; graph extensions are going to be the next thing; and ingestion of time-series data is going to happen. So there's going to be an ongoing stream of innovations. But one thing I can say is that because Postgres is a pure open source project, there's no hard roadmap for where it's going to go; where it goes is always driven by what people want to have, right? There is no product management department, and there's no great visionary who says, "Oh, this is where we're going to go." No. What's going to happen is what people want to have. If companies or contributors want a certain feature because they need it, well, that's how it's going to happen. And that's really been at the heart of this since Mike Stonebraker, who's an advisor to EDB today, invented it, and the open source project got created. The movement has always focused only on things that people actually want, because if nobody wants something, we're just not going to build it. So when you ask me for the roadmap, I believe it's going to be, you know, faster, obviously, always faster, because everybody wants faster, and then innovation features, like making the document store even better, graph, and ingestion of large time series, et cetera.
That's really what I believe is going to drive it forward. >> Wow. Yeah, the market has spoken, and as you point out, the market will continue to speak and drive that bus. So Marc, thank you for the time today, we certainly appreciate that, and we wish EDB continued success at Postgres Vision 2021. And thanks for the time. >> Thanks John, it was a pleasure. >> You bet. Marc Linster, joining us, the CTO at EDB. I'm John Walls; you've been watching theCUBE. (upbeat music)
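To ground the recipe Linster describes above (relational ACID transactions, JSONB documents addressable via SQL/JSON path, and PostGIS for location), here is a minimal sketch. The schema, values, and coordinates are hypothetical, and it assumes Postgres 12 or later with the PostGIS extension installed:

    -- Hypothetical table mixing relational, document, and spatial data
    -- (assumes the PostGIS extension: CREATE EXTENSION postgis;).
    CREATE TABLE stores (
        store_id  bigint PRIMARY KEY,
        name      text NOT NULL,
        prefs     jsonb,                   -- per-store document data
        location  geography(Point, 4326)   -- spatial type from PostGIS
    );

    -- SQL/JSON path (Postgres 12+): address JSONB much as a browser would.
    SELECT jsonb_path_query(prefs, '$.promotions[*].sku') FROM stores;

    -- A common table expression feeding a spatial filter: stores with a
    -- matching document attribute within 5 km of a given point.
    WITH loyal AS (
        SELECT name, location
        FROM stores
        WHERE prefs @> '{"loyalty": true}'  -- JSONB containment
    )
    SELECT name
    FROM loyal
    WHERE ST_DWithin(location,
                     ST_SetSRID(ST_MakePoint(-71.06, 42.36), 4326)::geography,
                     5000);  -- distance in meters

The point of the sketch is the combination: one ACID store serving relational, document, and spatial queries at once.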
Keynote Analysis | Virtual Vertica BDC 2020
(upbeat music) >> Narrator: It's theCUBE, covering the Virtual Vertica Big Data Conference 2020. Brought to you by Vertica. >> Dave Vellante: Hello everyone, and welcome to theCUBE's exclusive coverage of the Vertica Virtual Big Data Conference. You're watching theCUBE, the leader in digital event tech coverage, and we're broadcasting remotely from our studios in Palo Alto and Boston. We're pleased to be covering this digital event wall-to-wall. Now, as you know, BDC was originally scheduled this week at the new Encore Hotel and Casino in Boston. Their theme was "Win big with big data". Oh sorry, "Win big with data". That's right, got it. And I know the community was really looking forward to that meetup, but look, we're making the best of it, given these uncertain times. We wish you and your families good health and safety, and this is the way that we're going to broadcast for the next several months. Now, we want to unpack Colin Mahony's keynote, but before we do that, I want to give a little context on the market. First, theCUBE has covered every BDC since its inception, since the BDC's inception that is. It's a very intimate event, with a heavy emphasis on user content. Now, historically, the data engineers and DBAs in the Vertica community have comprised the majority of the content at this event, and that's going to be the same for this virtual, or digital, production. Now, theCUBE is going to be broadcasting for two days, concurrent with the Virtual BDC. We've got practitioners coming on the show, DBAs, data engineers, database gurus, we've got security experts coming on, and really a great lineup. And, of course, we'll also be hearing from Vertica execs, Colin Mahony himself right off the keynote, folks from product marketing, partners, and a number of experts, including some from Micro Focus, which is, of course, the owner of Vertica. But I want to take a moment to share a little bit about the history of Vertica. The company, as you know, was founded by Michael Stonebraker, and Vertica started out as a SQL platform for analytics. It was the first, or at least one of the first, to really nail the MPP column store trend. Not only did Vertica have an early-mover advantage in MPP, but the efficiency and scale of its software, relative to traditional DBMSs and other MPP players, is underscored by the fact that Vertica, and the Vertica brand, really thrive to this day. But I have to tell you, it wasn't without some pain, and I'll talk a little bit about that, and about how we got here today. So first, think about traditional transaction databases, like Oracle or IBM Db2, or even enterprise data warehouse platforms like Teradata. They were simply not purpose-built for big data. Vertica was, along with a whole bunch of other players: Netezza, which was bought by IBM; Aster Data, which is now Teradata; ParAccel, now part of Actian, which was the basis for Amazon's Redshift; and Greenplum, which was bought in the early days by EMC. These companies were really designed to run as massively parallel systems that smoked traditional RDBMS and EDW platforms for particular analytic applications. You know, back in the big data days, I often joked that, like an NFL draft, there was a run on MPP players, like when you see a run on pulling guards. You know, once one goes, they all start to fall. And that's what you saw with the MPP columnar stores: IBM, EMC, and then HP getting into the game.
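As an aside, for readers new to column stores, here is a hedged sketch of what the MPP column store model looks like in Vertica's own terms. The table and projection names are hypothetical, and the DDL follows Vertica's documented projection syntax:

    -- In Vertica, data physically lives in "projections": sorted, compressed
    -- column sets segmented (hash-distributed) across the nodes of a cluster,
    -- which is what makes massively parallel scans cheap.
    CREATE TABLE sales (
        sale_id   INT,
        sale_date DATE,
        region    VARCHAR(32),
        amount    NUMERIC(12, 2)
    );

    CREATE PROJECTION sales_by_date AS
    SELECT sale_date, region, amount, sale_id
    FROM sales
    ORDER BY sale_date, region             -- sort order drives compression and pruning
    SEGMENTED BY HASH(sale_id) ALL NODES;  -- spread row segments across the cluster

The sort order gives the columnar compression and scan pruning; the segmentation clause is what makes the "massively parallel" part work.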
So, it was like 2011, and Leo Apotheker was the new CEO of HP. Frankly, he had no clue, in my opinion, what to do with Vertica, and totally missed one of the biggest trends of the last decade: the data trend, the big data trend. HP picked up Vertica for a song; it wasn't disclosed, but my guess is that it was around 200 million. So, rather than build a bunch of smart tooling around Vertica, which I always called the diamond in the rough, Apotheker basically permanently altered HP. He kind of ruined HP, in my view, with a 12 billion dollar purchase of Autonomy, which turned out to be one of the biggest disasters in recent M&A history. HP was forced to spin-merge, and ended up selling most of its software to Micro Focus. (laughs) Luckily, during Vertica's time at HP, CEO Meg Whitman was largely distracted with what to do with the mess that she inherited from Apotheker, so Vertica was left alone. Now, the upshot is Colin Mahony, who was then the GM of Vertica, and still is. By the way, he's really the CEO; he just doesn't have the title, and I actually think they should give it to him. But anyway, he's been at the helm the whole time. And Colin, as you'll see in our interview, is a rock star: he's got technical and business chops, and people love him in the community. Vertica's culture is really engineering-driven, and they're all about data. Despite the fact that Vertica is a 15-year-old company, they've really kept pace and not been polluted by legacy baggage. Vertica, early on, embraced Hadoop and the whole open-source movement, and that helped give it tailwinds. It leaned heavily into cloud, as we're going to talk about further this week, and they've got a good story around machine intelligence and AI. So, whereas many traditional database players are really getting hurt, and some are getting killed, by cloud database providers, Vertica is actually doing a pretty good job of servicing its install base, and is in a reasonable position to compete for new workloads. On its last earnings call, the Micro Focus CEO, Stephen Murdoch, said they're investing 70 to 80 million dollars in two key growth areas: security and Vertica. Now, Micro Focus is running its SUSE play on these two parts of its business. What I mean by that is they're investing in them and allowing them to be semi-autonomous, spending on R&D and go-to-market. And they have no hardware agenda, unlike when Vertica was part of HP, or HPE, I guess HP, before the spin-out. Now, let me come back to the big trend in the market today, because there's something going on around analytic databases in the cloud. You've got companies like Snowflake and AWS with Redshift, as we've reported numerous times, and they're doing quite well; they're gaining share, especially of the new workloads that are emerging, particularly in the cloud-native space. They combine scalable compute, storage, and machine learning, and, importantly, they allow customers to scale compute and storage independently of each other. Why is that important? Because you don't have to buy storage in chunks every time you buy compute, or vice versa. If you can scale them independently, you've got granularity. Vertica is keeping pace. In talking to customers, Vertica is leaning heavily into the cloud, supporting all the major cloud platforms, as we heard from Colin earlier today, adding Google.
And while my research shows that Vertica has some work to do in cloud and cloud native, to simplify the experience, its more robust and mature stack (it supports many different environments, deep SQL, and ACID properties) and its DNA allow Vertica to compete with these cloud-native database suppliers. Now, Vertica might lose out in some of those native workloads. But I have to say, from my experience in talking with customers, if you're looking for a great MPP column store that scales and runs in the cloud or on-prem, Vertica is in a very strong position. Vertica claims to be the only MPP columnar store to allow customers to scale compute and storage independently, both in the cloud and in hybrid environments on-prem, and across clouds as well. So, while Vertica may be at a disadvantage in a pure cloud-native bake-off, its more robust and mature stack, combined with its multi-cloud strategy, gives Vertica a compelling set of advantages. So, we heard a lot of this from Colin Mahony, who announced Vertica 10.0 in his keynote. He really emphasized Vertica's multi-cloud affinity and its Eon Mode, which allows that separation, or scaling, of compute independent of storage, both in the cloud and on-prem. Vertica 10, according to Mahony, is making big bets on in-database machine learning (he talked about that) and AI, along with some advanced regression techniques. He talked about PMML models and Python integration, which was actually something they talked about doing with Uber and some other customers. Now, Mahony also stressed the trend toward object stores. Vertica now supports, let's see, S3 with Eon, Eon in Google Cloud in addition to AWS, and then Pure Storage and HDFS as well; they all support Eon Mode. Mahony also stressed, as I mentioned earlier, a big commitment to on-prem and the whole cloud-optionality thing. So 10.0, according to Colin Mahony, is all about doubling down on these industry waves: enabling native PMML models, running them in Vertica, and really doing all the work that's required around ML and AI; they also announced support for TensorFlow. Object store optionality is important, as he discussed with Eon Mode and the news of support for Google Cloud as well as HDFS. And finally, a big focus on deployment flexibility and migration tools, with a critical focus on improving ease of use, which you hear about from a lot of customers. So, these are the critical aspects of Vertica 10.0, an announcement that we're going to be unpacking all week with some of the experts that I talked about. So, I'm going to close with this. My long-time co-host, John Furrier, and I have talked for some time about this new cocktail of innovation. No longer is Moore's Law the mainspring of innovation; it's now about taking all these data troves, bringing machine learning and AI into that data to extract insights, and then operationalizing those insights at scale, leveraging cloud. And one of the things I always look for from cloud is this: if you've got a cloud play, you can attract innovation in the form of startups. It's part of the success equation, certainly for AWS, and I think it's one of the challenges for a lot of the legacy on-prem players. Vertica, I think, has done a pretty good job in this regard, and we're going to look this week for evidence of that innovation. One of the interviews that I'm personally excited about this week is with a new-ish company, I would consider them a startup, called Zebrium; more on them right after the quick sketch below.
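First, to ground the in-database ML discussion above, here is a hedged sketch of the PMML import-and-score flow announced with Vertica 10. The function names follow Vertica's documented ML API as best I know it, and the model path, table, and column names are all hypothetical; treat the specifics as assumptions rather than gospel:

    -- Import a PMML model trained elsewhere (say, in Python), then score rows
    -- inside the database, so the data never has to leave Vertica.
    SELECT IMPORT_MODELS('/models/churn_model.pmml'
                         USING PARAMETERS category = 'PMML');

    SELECT customer_id,
           PREDICT_PMML(tenure, monthly_spend
                        USING PARAMETERS model_name = 'churn_model') AS churn_score
    FROM customers;

Vertica also ships native in-database training functions; either way, the design point is the same: the model runs where the data lives, so nothing has to be exported for scoring.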
What they're doing is applying AI to do autonomous log monitoring for IT ops. I'm interviewing Larry Lancaster, who's their founder and CTO, this week, and I'm going to press him on why he chose to run on Vertica and not a cloud database. This guy is a hardcore tech guru, and I want to hear his opinion. Okay, so keep it right there, stay with us. We're all over the Vertica Virtual Big Data Conference, with in-depth interviews, and we're following all the news. theCUBE is going to be interviewing these folks for two days, wall-to-wall coverage, so keep it right there. We're going to be right back with our next guest, right after this short break. This is Dave Vellante, and you're watching theCUBE. (upbeat music)
Autonomous Log Monitoring
>> Sue: Hi everybody, thank you for joining us today for the virtual Vertica BDC 2020. Today's breakout session is entitled "Autonomous Monitoring Using Machine Learning". My name is Sue LeClaire, director of marketing at Vertica, and I'll be your host for this session. Joining me is Larry Lancaster, founder and CTO at Zebrium. Before we begin, I encourage you to submit questions or comments during the virtual session. You don't have to wait: just type your question or comment in the question box below the slides and click submit. There will be a Q&A session at the end of the presentation, and we'll answer as many questions as we're able to during that time. Any questions that we don't address, we'll do our best to answer offline. Alternatively, you can visit the Vertica forums to post your questions after the session; our engineering team is planning to join the forums to keep the conversation going. Also, just a reminder that you can maximize your screen by clicking the double-arrow button in the lower right corner of the slides. And yes, this virtual session is being recorded and will be available for you to view on demand later this week. We'll send you a notification as soon as it's ready. So, let's get started. Larry, over to you. >> Larry: Hey, thanks so much. So hi, my name's Larry Lancaster, and I'm here to talk to you today about something whose time, I think, has come, and that's autonomous monitoring. So, with that, let's get into it. Machine data is my life. I know that's a sad life, but it's true. I've spent most of my career taking telemetry data from products, either products in the field, as we used to call them, or, as we say nowadays, products that have been deployed, and bringing that data back, like log files and stats, and then building stuff on top of it: tools to run the business, or services to sell back to users and customers. And after doing that a few times, it got to the point where I was really sick of building the same kind of thing from scratch every time, so I figured, why not go start a company and do it once, so that we don't have to do it manually ever again? It's interesting to note, and I've put a little sentence here saying "companies where I got to use Vertica", that I've actually been working with Vertica for a long time now, pretty much since they came out of alpha, and I've really been enjoying their technology ever since. So, our vision is basically that I want a system that will characterize incidents before I notice. An incident is what we used to call a support case or a ticket in IT, or a support case in support. Nowadays, you may have a DevOps team, or a set of SREs, monitoring a production deployment, and they'll call it an incident. So I'm looking for something that will notice and characterize an incident before I notice and have to go digging into log files and stats to figure out what happened. That's a pretty heady goal, and I'm going to talk a little bit today about how we do it. So, let's look at logs in particular. Monitoring is the umbrella term we use to talk about how we watch the systems we've shipped into the field, or how we monitor production deployments in a more modern stack. And there are log monitoring tools, but they have a number of drawbacks.
For one thing, they're kind of slow, in the sense that if something breaks and I need to go to a log file (and chances are really good that if you have a new issue, an unknown-unknown problem, you're going to end up in a log file), the problem becomes that you're searching around looking for the root cause of the incident, right? And that's time-consuming. They're also fragile, and this is largely because log data is completely unstructured: there's no formal grammar for a log file. So you have this situation where, if I write a parser today, and that parser is going to execute some automation (it's going to open or update a ticket, maybe restart a service, or whatever it is that I want to happen), then later, upstream, someone writing the code that produces that log message might do something really useful for me or for users, like fix a spelling mistake in that log message. And the next thing you know, all the automation breaks. So it's a very fragile source for automation. And finally, because of that, people will set alerts like, "Oh, well, tell me how many thousands of errors are happening every hour," or some horrible metric like that, and then that becomes the only visibility you have into the data. So because of all this, it's a very human-driven, slow, fragile process, and we've set out to up-level it a bit. I touched on this already, right? The truth is, if you do have an incident, you're going to end up in log files to do root cause; it's almost always the case. And so you have to wonder: if that's the case, why do most people use metrics only for monitoring? The reason is related to the problems I just described: metrics are already structured, right? With logs, you've got this mess of stuff, so you only want to dig in there when you absolutely have to. But ironically, that's where a lot of the information you need actually is. So we have a model today, and this model used to work pretty well. It's called "index and search", and it basically means you treat log files like text documents: you index them, and when there's some issue you have to drill into, you go searching, right? So let's look at that model. Twenty years ago, we had a shrink-wrap software delivery model. With a given incident, maybe you had one customer, a monolithic application, and a handful of log files. So it was perfectly natural; in fact, usually you could just open the log file in vi and search that way, or, if there were a lot of them, you could index them and search them that way. And that all worked very well, because the developer or the support engineer had to be an expert in those few things, in those few log files, and understand what they meant. But today, everything has changed completely. We live in a software-as-a-service world. What that means is, for a given incident, first of all, you're going to be affecting thousands of users. You're going to have, potentially, 100 services deployed in your environment. You're going to have 1,000 log streams to sift through. And yet you're still stuck in a situation where, to find out what's the matter, you're going to have to search through the log files. So this is an unacceptable position to be in today. So for us, the future will not be index and search, and that's simply because it cannot scale.
And the reason I say that it can't scale is because it all kind of is bottlenecked by a person and their eyeball. So, you continue to drive up the amount of data that has to be sifted through, the complexity of the stack that has to be understood, and at the end of the day, for MTTR purposes, you still have the same bottleneck, which is the eyeball. So this model, I believe, is fundamentally broken. And that's why, I believe, in five years you're going to be in a situation where most monitoring of unknown unknown problems is going to be done autonomously. And those issues will be characterized autonomously because there's no other way it can happen. So now I'm going to talk a little bit about autonomous monitoring itself. So, autonomous monitoring basically means, if you can imagine a monitoring platform and you watch the monitoring platform, maybe you watch the alerts coming from it or, more importantly, you kind of watch the dashboards and try to see if something looks weird. So autonomous monitoring is the notion that the platform should do the watching for you, only let you know when something is going wrong, and kind of give you a window into what happened. So look at this example I have on screen, just to take it really slow and absorb the concept of autonomous monitoring. So here in this example, we've stopped the database. And as a result, down below you can see there was a bunch of fallout. This is an Atlassian stack, so you can imagine you've got a Postgres database. And then you've got sort of Bitbucket, and Confluence, and Jira, and these various other components that need the database operating in order to function. So what this is doing is it's calling out, "Hey, the root cause is the database stopped and here are the symptoms." Now, you might be wondering, so what? I mean, I could go write a script to do this sort of thing. Here's what's interesting about this very particular example, and I'll show a couple more examples that are a little more involved. But here's the interesting thing. So, in the software that came up with this incident and opened this incident and put this root cause and symptoms in there, there's no code that knows anything about timestamp formats, severities, Atlassian, Postgres, databases, Bitbucket, Confluence; there are no regexes that talk about starting, stopped, RDBMS, swallowed exception, and so on and so forth. So you might wonder how it's possible then, that something which is completely ignorant of the stack, could come up with this description, which is exactly what a human would have had to do, to figure out what happened. And I'm going to get into how we do that. But that's what autonomous monitoring is about. It's about getting into a set of telemetry from a stack with no prior information, and understanding when something breaks. And I could give you the punchline right now, which is there are fundamental ways that software behaves when it's breaking. And by looking at hundreds of data sets containing incidents that people have generously allowed us to use, we've been able to characterize that and now generalize it to apply to any new data set and stack. So here's an interesting one right here. So there's a fella, David Gill, he's just a genius in the monitoring space. He's been working with us for the last couple of months. So he said, "You know what I'm going to do, is I'm going to run some chaos experiments." So for those of you who don't know what chaos engineering is, here's the idea.
So basically, let's say I'm running a Kubernetes cluster and what I'll do is I'll use sort of a chaos injection test, something like Litmus. And basically it will inject issues, it'll break things in my application randomly to see if my monitoring picks it up. And so this is what chaos engineering is built around. It's built around sort of generating lots of random problems and seeing how the stack responds. So in this particular case, David went in and, using basically one of the tests presented through Litmus, did a pod delete. And so that's going to basically take out some containers that are part of the service layer. And so then you'll see all kinds of things break. And so what you're seeing here, which is interesting, this is why I like to use this example. Because it's actually kind of eye-opening. So the chaos tool itself generates logs. And of course, through Kubernetes, all the log file locations that are on the host, and the container logs, are known. And those are all pulled back to us automatically. So one of the log files we have is actually from the chaos tool that's doing the breaking, right? And so what the tool said here, when it went to determine what the root cause was, was it noticed that there was this process that had these messages happen: initializing deletion lists, selecting a pod to kill, blah blah blah. It's saying that the root cause is the chaos test. And it's absolutely right, that is the root cause. But usually chaos tests don't get picked up themselves. You're supposed to be just kind of picking up the symptoms. But this is what happens when you're able to kind of tease out root cause from symptoms autonomously: you end up getting a much more meaningful answer, right? So here's another example. So essentially, we collect the log files, but we also have a Prometheus scraper. So if you export Prometheus metrics, we'll scrape those and we'll collect those as well. And so we'll use those for our autonomous monitoring as well. So what you're seeing here is an issue where, I believe, this is where we ran something out of disk space. So it opened an incident, but what's also interesting here is, you see that it pulled that metric to say that the spike in this metric was a symptom of this running out of space. So again, there's nothing that knows anything about file system usage, memory, CPU, any of that stuff. There's no actual hard-coded logic anywhere to explain any of this. And so the concept of autonomous monitoring is looking at a stack the way a human being would. If you can imagine how you would walk in and monitor something, how you would think about it. You'd go looking around for rare things. Things that are not normal. And you would look for indicators of breakage, and you would see, do those seem to be correlated in some dimension? That is how the system works. So as I mentioned a moment ago, metrics really do kind of complete the picture for us. We end up in a situation where we have a one-stop shop for incident root cause. So, how does that work? Well, we ingest and we structure the log files. So if we're getting the logs, we'll ingest them and we'll structure them, and I'm going to show a little bit what that structure looks like and how that goes into the database in a moment. And then of course we ingest and structure the Prometheus metrics. But here, structure really should have an asterisk next to it, because metrics are mostly structured already. They have names.
If you have your own scraper, as opposed to going into the time series Prometheus database and pulling metrics from there, you can keep a lot more metadata about those metrics from the exporter's perspective. So we keep all of that too. Then we do our anomaly detection on both of those sets of data. And then we cross-correlate metrics and log anomalies. And then we create incidents. So this is, at a high level, kind of what's happening without any sort of stack-specific logic built in. So we had some exciting recent validation. So MayaData's a pretty big player in the Kubernetes space. Essentially, they do Kubernetes as a managed service. They have tens of thousands of customers whose Kubernetes clusters they manage for them. And then they're also involved, both in the OpenEBS project, as well as in the Litmus project I mentioned a moment ago. That's their tool for chaos engineering. So they're a pretty big player in the Kubernetes space. So essentially, they said, "Oh okay, let's see if this is real." So what they did was they set up our collectors, which took three minutes in Kubernetes. And then they went and, using Litmus, they reproduced eight incidents that their actual, real-world customers had hit. And they were trying to remember the ones that had been the hardest to figure out the root cause of at the time. And we picked up and put in a root cause indicator that was correct in 100% of these incidents, with no training, configuration, or metadata required. So this is kind of what autonomous monitoring is all about. So now I'm going to talk a little bit about how it works. So, like I said, there's no information included or required about the stack. So if you imagine a log file, for example. Now, commonly, over to the left-hand side of every line, there will be some sort of a prefix. And what I mean by that is you'll see like a timestamp, or a severity, and maybe there's a PID, and maybe there's a function name, and maybe there's some other stuff there. So basically those are kind of common data elements for a large portion of the lines in a given log file. But you know, of course, the contents change. So basically today, if you look at a typical log manager, they'll talk about connectors. And what connectors means is, for an application it'll generate a certain prefix format in a log. And that means what's the format of the timestamp, and what else is in the prefix. And this lets the tool pick it up. And so if you have an app that doesn't have a connector, you're out of luck. Well, what we do is we learn those prefixes dynamically with machine learning. You do not have to have a connector, right? And what that means is that if you come in with your own application, the system will just work for it from day one. You don't have to have connectors, you don't have to describe the prefix format. That's so yesterday, right? So really what we want to be doing is up-leveling what the system is doing to the point where it's kind of working like a human would. You look at a log line, you know what's a timestamp. You know what's a PID. You know what's a function name. You know where the prefix ends and where the variable parts begin. You know what's a parameter over there in the variable parts. And sometimes you may need to see a couple examples to know what was a variable, but you'll figure it out as quickly as possible, and that's exactly how the system goes about it. As a result, we kind of embrace free-text logs, right?
So if you look at a typical stack, most of the logs generated in a typical stack are usually free-text. Even structured logging typically will have a message attribute, which then inside of it has the free-text message. For us, that's not a bad thing. That's okay. The purpose of a log is to inform people. And so there's no need to go rewrite the whole logging stack just because you want a machine to handle it. It'll figure it out for itself, right? So, you give us the logs and we'll figure out the grammar, not only for the prefix but also for the variable message part. So I already went into this, but there's more that's usually required for configuring a log manager with alerts. You have to give it keywords. You have to give it application behaviors. You have to give it some prior knowledge. And of course the problem with all of that is that the most important events that you'll ever see in a log file are the rarest. Those are the ones that are one out of a billion. And so you may not know what's going to be the right keyword in advance to pick up the next breakage, right? So we don't want that information from you. We'll figure that out for ourselves. As the data comes in, essentially we parse it and we categorize it, as I've mentioned. And when I say categorize, what I mean is, if you look at a certain given log file, you'll notice that some of the lines are kind of the same thing. So this one will say "X happened five times" and then maybe a few lines below it'll say "X happened six times" but that's basically the same event type. It's just a different instance of that event type. And it has a different value for one of the parameters, right? So when I say categorization, what I mean is figuring out those unique types, and I'll show an example of that next. Anomaly detection, we do on top of that. So anomaly detection on metrics, in a very sort of time-series-by-time-series manner with lots of tunables, is a well-understood problem. So we also do this on the event type occurrences. So you can think of each event type occurring in time as sort of a point process. And then you can develop statistics and distributions on that, and you can do anomaly detection on those. Once we have all of that, we have extracted features, essentially, from metrics and from logs. We do pattern recognition on the correlations across different channels of information, so different event types, different log types, different hosts, different containers, and then of course across to the metrics. Based on all of this cross-correlation, we end up with a root cause identification. So that's essentially, at a high level, how it works. What's interesting, from the perspective of this call particularly, is that incident detection needs relationally structured data. It really does. You need to have all the instances of a certain event type that you've ever seen easily accessible. You need to have the values for a given sort of parameter easily, quickly available so you can figure out what's the distribution of this over time, how often does this event type happen. You can run analytical queries against that information so that you can quickly, in real time, do anomaly detection against new data. So here's an example of what this looks like. And this is kind of part of the work that we've done. At the top you see some examples of log lines, right? So that's kind of a snippet, it's three lines out of a log file. And you see one in the middle there that's kind of highlighted with colors, right?
I mean, it's a little messy, but it's not atypical of the log file that you'll see pretty much anywhere. So there, you've got a timestamp, and a severity, and a function name. And then you've got some other information. And then finally, you have the variable part. And that's going to have sort of this checkpoint for memory scrubbers, probably something that's written in English, just so that the person who's reading the log file can understand. And then there are some parameters that are put in, right? So now, if you look at how we structure that, the way it looks is there are going to be three tables that correspond to the three event types that we see above. And so we're going to look at the one that corresponds to the one in the middle. So if we look at that table, there you'll see a table with columns, one for severity, for function name, for time zone, and so on. And date, and PID. And then you see over to the right, with the colored columns, the parameters that were pulled out from the variable part of that message. And so they're put in, they're typed, and they're in integer columns. So this is the way structuring needs to work with logs to be able to do efficient and effective anomaly detection. And as far as I know, we're the first people to do this inline. All right, so let's talk now about Vertica and why we take those tables and put them in Vertica. So Vertica really is an MPP column store, but it's more than that, because nowadays when you say "column store", people sort of think, for example, Cassandra's a column store, whatever, but it's not. Cassandra's not a column store in the sense that Vertica is. So Vertica was kind of built from the ground up to be... So it's the original column store. So back in the C-Store project at MIT that Stonebraker was involved in, he said let's explore what kind of efficiencies we can get out of a real columnar database. And what he and his grad students, who went on to start Vertica, found was that they could build a database that gives orders of magnitude better query performance for the kinds of analytics I'm talking about here today, with orders of magnitude less data storage underneath. So building on top of machine data, as I mentioned, is hard, because it doesn't have any defined schemas. But we can use an RDBMS like Vertica once we've structured the data to do the analytics that we need to do. So I talked a little bit about this, but if you think about machine data in general, it's perfectly suited for a columnar store. Because, imagine laying out sort of all the attributes of an event type, right? So there may be, say, three or four function names that are going to occur for all the instances of a given event type. And so if you were to sort all of those event instances by function name, what you would find is that you have sort of long, million-row runs of the same function name over and over. So what you have, in general, in machine data, is lots and lots of slowly varying attributes, lots of low-cardinality data that's almost completely compressed out when you use a real column store. So you end up with a massive footprint reduction on disk. And that efficiency also propagates through the analytical pipeline, because Vertica does late materialization, which means it tries to carry that data through memory with that same efficiency, right?
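Before moving on, here is a minimal sketch of the kind of per-event-type table and anomaly-feature query just described. All of the names here are illustrative assumptions rather than Zebrium's actual schema; the point is only the shape: prefix fields become typed columns, message parameters become typed parameter columns, and occurrence counts become the point process that anomaly detection runs against.

```sql
-- Hypothetical table for one learned event type (all names are assumptions).
-- Prefix fields (timestamp, severity, function, PID) become typed columns;
-- the variable parts of the message become typed parameter columns.
CREATE TABLE evt_memory_scrubber_checkpoint (
    ts        TIMESTAMP,
    severity  VARCHAR(16),
    func_name VARCHAR(64),
    pid       INTEGER,
    host      VARCHAR(64),
    param_1   INTEGER,
    param_2   INTEGER
);

-- Hourly occurrence counts for this event type: the kind of analytical
-- query that feeds per-event-type anomaly detection in real time.
SELECT DATE_TRUNC('hour', ts) AS hour_bucket,
       COUNT(*)               AS occurrences
FROM evt_memory_scrubber_checkpoint
GROUP BY DATE_TRUNC('hour', ts)
ORDER BY hour_bucket;
```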
So the scale-out architecture, of course, is really suitable for petascale workloads. Also, I should point out, I was going to mention it in another slide or two, but we use the Vertica Eon architecture, and we have had no problems scaling that in the cloud. It's a beautiful sort of rewrite of the entire data layer of Vertica. The performance and flexibility of Eon is just unbelievable. And so I've really been enjoying using it. I was skeptical that you could get a real column store to run in the cloud effectively, but I was completely wrong. So finally, I should mention that if you look at column stores, to me, Vertica is the one that has the full SQL support, it has the ODBC drivers, it has the ACID compliance. Which means I don't need to worry about these things as an application developer. So I'm laying out the reasons that I like to use Vertica. So I touched on this already, but essentially what's amazing is that Vertica Eon is basically using S3 as an object store. And of course, there are other offerings, like the one that Vertica does with Pure Storage that doesn't use S3. But what I find amazing is how well the system performs using S3 as an object store, and how they manage to keep an actually consistent database. And they do. We've had issues where we've gone and shut down hosts, or hosts have been shut down on us, and we have to restart the database and we don't have any consistency issues. It's unbelievable, the work that they've done. Essentially, another thing that's great about the way it works is you can use S3 as a shared object store. You can have query nodes kind of querying from that set of files largely independently of the nodes that are writing to them. So you avoid this sort of bottleneck issue where you've got contention over who's writing what, and who's reading what, and so on. So I've found the performance using separate subclusters for our UI and for the ingest has been amazing. Another thing they have is a lot of in-database machine learning libraries. There's actually some cool stuff on their GitHub that we've used. One thing that we make a lot of use of is the sequence and time series analytics. For example, in our product, even though we do all of this stuff autonomously, you can also go create alerts for yourself. And one of the kinds of alerts you can do, you can say, "Okay, if this kind of event happens within so much time, and then this kind of an event happens, but not this one," then you can be alerted. So you can have these kinds of sequences that you define of events that would indicate a problem. And we use their sequence analytics for that. So it kind of gives you really good performance on some of these queries where you're wanting to pull out sequences of events from a fact table. And time series analytics is really useful if you want to do analytics on the metrics and you want to do gap filling and interpolation on that. It's actually really fast. And it's easy to use through SQL. So those are a couple of Vertica extensions that we use. So finally, I would like to encourage everybody, hey, come try us out. Should be up and running in a few minutes if you're using Kubernetes. If not, it's however long it takes you to run an installer. So you can just come to our website, pick it up and try out autonomous monitoring. And I want to thank everybody for your time. And we can open it up for Q and A.
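As a rough illustration of the two Vertica extensions Larry calls out above, here are hedged sketches of an event-sequence alert using Vertica's MATCH clause and gap-filled metric interpolation using its TIMESERIES clause. The tables, columns, and event names are invented for the example, not taken from the product.

```sql
-- Sequence-style alert: a service stop followed by one or more
-- connection errors on the same host (event names are assumptions).
SELECT host, ts, event_type, MATCH_ID()
FROM events
MATCH (
    PARTITION BY host ORDER BY ts
    DEFINE
        stop_evt AS event_type = 'service_stop',
        err_evt  AS event_type = 'connection_error'
    PATTERN p AS (stop_evt err_evt+)
    ROWS MATCH FIRST EVENT
);

-- Gap filling and interpolation on a scraped metric: one-minute slices
-- with linear interpolation between observed samples.
SELECT slice_time,
       TS_FIRST_VALUE(metric_value, 'LINEAR') AS interpolated_value
FROM metrics
WHERE metric_name = 'disk_used_bytes'
TIMESERIES slice_time AS '1 minute' OVER (PARTITION BY metric_name ORDER BY ts);
```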
A Technical Overview of Vertica Architecture
>> Paige: Hello, everybody and thank you for joining us today on the Virtual Vertica BDC 2020. Today's breakout session is entitled A Technical Overview of the Vertica Architecture. I'm Paige Roberts, Open Source Relations Manager at Vertica and I'll be your host for this webinar. Now joining me is Ryan Role-kuh? Did I say that right? (laughs) He's a Vertica Senior Software Engineer. >> Ryan: So it's Roelke. (laughs) >> Paige: Roelke, okay, I got it, all right. Ryan Roelke. And before we begin, I want to be sure and encourage you guys to submit your questions or your comments during the virtual session while Ryan is talking, as you think of them as you go along. You don't have to wait to the end, just type in your question or your comment in the question box below the slides and click submit. There'll be a Q and A at the end of the presentation and we'll answer as many questions as we're able to during that time. Any questions that we don't address, we'll do our best to get back to you offline. Now, alternatively, you can visit the Vertica forums to post your question there after the session as well. Our engineering team is planning to join the forums to keep the conversation going, so you can have a chat afterwards with the engineer, just like any other conference. Now also, you can maximize your screen by clicking the double arrow button in the lower right corner of the slides and before you ask, yes, this virtual session is being recorded and it will be available to view on demand this week. We'll send you a notification as soon as it's ready. Now, let's get started. Over to you, Ryan. >> Ryan: Thanks, Paige. Good afternoon, everybody. My name is Ryan and I'm a Senior Software Engineer on Vertica's Development Team. I primarily work on improving Vertica's query execution engine, so usually in the space of making things faster. Today, I'm here to talk about something that's more general than that, so we're going to go through a technical overview of the Vertica architecture. So the intent of this talk, essentially, is to just explain some of the basic aspects of how Vertica works and what makes it such great database software, and to explain what makes a query execute so fast in Vertica. We'll provide some background to explain why other databases don't keep up. And we'll use that as a starting point to discuss an academic database that paved the way for Vertica. And then we'll explain how Vertica's design builds upon that academic database to be the great software that it is today. I want to start by sharing somebody's approximation of an internet minute at some point in 2019. All of the data on this slide is generated by thousands or even millions of users and that's a huge amount of activity. Most of the applications depicted here are backed by one or more databases. Most of this activity will eventually result in changes to those databases. For the most part, we can categorize the way these databases are used into one of two paradigms. First up, we have online transaction processing or OLTP. OLTP workloads usually operate on single entries in a database, so an update to a retail inventory or a change in a bank account balance are both great examples of OLTP operations. Updates to these data sets must be visible immediately and there could be many transactions occurring concurrently from many different users. OLTP queries are usually key value queries. The key uniquely identifies the single entry in a database for reading or writing.
Early databases and applications were probably designed for OLTP workloads. This example on the slide is typical of an OLTP workload. We have a table, accounts, such as for a bank, which tracks information for each of the bank's clients. An update query, like the one depicted here, might be run whenever a user deposits $10 into their bank account. Our second category is online analytical processing, or OLAP, which is more about using your data for decision making. If you have a hardware device which periodically records how it's doing, you could analyze trends of all your devices over time to observe what data patterns are likely to lead to failure, or if you're Google, you might log user search activity to identify which links helped your users find the answer. Analytical processing has always been around but with the advent of the internet, it happened at scales that were unimaginable, even just 20 years ago. This SQL example is something you might see in an OLAP workload. We have a table, searches, logging user activity. We will eventually see one row in this table for each query submitted by users. If we want to find out what time of day our users are most active, then we could write a query like this one on the slide which counts the number of unique users running searches for each hour of the day. So now let's rewind to 2005. We don't have a picture of an internet minute in 2005, we don't have the data for that. We also don't have the data for a lot of other things. The term Big Data is not quite yet on anyone's radar, and The Cloud is also not quite there, or it's just starting to be. So if you have a database serving your application, it's probably optimized for OLTP workloads. OLAP workloads just aren't mainstream yet and database engineers probably don't have them in mind. So let's innovate. It's still 2005 and we want to try something new with our database. Let's take a look at what happens when we do run an analytic workload in 2005. Let's use as a motivating example a table of stock prices over time. In our table, the symbol column identifies the stock that was traded, the price column identifies the new price and the timestamp column indicates when the price changed. We have several other columns; we should know that they're there, but we're not going to use them in any example queries. This table is designed for analytic queries. We're probably not going to make any updates or look at individual rows since we're logging historical data and want to analyze changes in stock price over time. Our database system is built to serve OLTP use cases, so it's probably going to store the table on disk in a single file like this one. Notice that each row contains all of the columns of our data in row major order. There's probably an index somewhere in the memory of the system which will help us do point lookups. Maybe our system expects that we will use the stock symbol and the trade time as lookup keys. So an index will provide quick lookups for those columns to the position of the whole row in the file. If we did have an update to a single row, then this representation would work great. We would seek to the row that we're interested in, finding it would probably be very fast using the in-memory index. And then we would update the file in place with our new value. On the other hand, if we ran an analytic query like we want to, the data access pattern is very different. The index is not helpful because we're looking up a whole range of rows, not just a single row.
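The slide queries themselves aren't reproduced in the transcript, so here is a plausible SQL reconstruction of the two workload examples just described. The table names (accounts, searches) come from the talk; the column names are assumptions.

```sql
-- OLTP: a key lookup that touches a single row, e.g. a $10 deposit.
-- (account_id and balance are assumed column names.)
UPDATE accounts
SET balance = balance + 10
WHERE account_id = 42;

-- OLAP: an aggregate over many rows, counting unique searchers per hour.
-- (search_ts and user_id are assumed column names.)
SELECT EXTRACT(HOUR FROM search_ts) AS hour_of_day,
       COUNT(DISTINCT user_id)      AS unique_users
FROM searches
GROUP BY EXTRACT(HOUR FROM search_ts)
ORDER BY hour_of_day;
```

With those two access patterns in mind, back to the stocks table, where the analytic query gets no help from the point-lookup index.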
As a result, the only way to find the rows that we actually need for this query is to scan the entire file. We're going to end up scanning a lot of data that we don't need, and that won't just be the rows that we don't need; there are many other columns in this table, with information about who made the transaction, and we'll also be scanning through those columns for every single row in this table. That could be a very serious problem once we consider the scale of this file. Stocks change a lot, we probably have thousands or millions or maybe even billions of rows that are going to be stored in this file and we're going to scan all of these extra columns for every single row. If we tried out our stocks use case behind the desk of a Fortune 500 company, then we're probably going to be pretty disappointed. Our queries will eventually finish, but it might take so long that we don't even care about the answer anymore by the time that they do. Our database is not built for the task we want to use it for. Around the same time, a team of researchers in the North East had become aware of this problem and they decided to dedicate their time and research to it. These researchers weren't just anybody. The fruits of their labor, which we now like to call the C-Store Paper, were published by eventual Turing Award winner, Mike Stonebraker, along with several other researchers from elite universities. This paper presents the design of a read-optimized relational DBMS that contrasts sharply with most current systems, which are write-optimized. That sounds exactly like what we want for our stocks use case. Reasoning about what makes our query executions so slow brought our researchers to the Memory Hierarchy, which essentially is a visualization of the relative speeds of different parts of a computer. At the top of the hierarchy, we have the fastest data units, which are, of course, also the most expensive to produce. As we move down the hierarchy, components get slower but also much cheaper, and thus you can have more of them. Our OLTP database's data is stored in a file on the hard disk. We scanned the entirety of this file, even though we didn't need most of the data, and now it turns out, that is just about the slowest thing that our query could possibly be doing, by over two orders of magnitude. It should be clear, based on that, that the best thing we can do to optimize our query's execution is to avoid reading unnecessary data from the disk, and that's what the C-Store researchers decided to look at. The key innovation of the C-Store paper does exactly that. Instead of storing data in row major order, in a large file on disk, they transposed the data and stored each column in its own file. Now, if we run the same select query, we read only the relevant columns. The unnamed columns don't factor into the table scan at all since we don't even open the files. Zooming out to an internet-scale data set, we can appreciate the savings here a lot more. But we still have to read a lot of data that we don't need to answer this particular query. Remember, we had two predicates, one on the symbol column and one on the timestamp column. Our query is only interested in AAPL stock, but we're still reading rows for all of the other stocks. So what can we do to optimize our disk read even more? Let's first partition our data set into different files based on the timestamp date. This means that we will keep separate files for each date.
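For reference, here is the running example query that the next several optimizations target, reconstructed with an assumed timestamp column name and invented dates:

```sql
-- Average AAPL price over one day: one predicate on the symbol,
-- one on the timestamp (the date range here is invented).
SELECT AVG(price)
FROM stocks
WHERE symbol = 'AAPL'
  AND trade_ts >= '2005-06-01 00:00:00'
  AND trade_ts <  '2005-06-02 00:00:00';
```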
When we query the stocks table, the database knows all of the files we have to open. If we have a simple predicate on the timestamp column, as our sample query does, then the database can use it to figure out which files we don't have to look at at all. So now all of the disk reads that we have to do to answer our query will produce rows that pass the timestamp predicate. This eliminates a lot of wasteful disk reads. But not all of them. We do have another predicate on the symbol column, where symbol equals AAPL. We'd like to avoid disk reads of rows that don't satisfy that predicate either. And we can avoid those disk reads by clustering all the rows that match the symbol predicate together. If all of the AAPL rows are adjacent, then as soon as we see something different, we can stop reading the file. We won't see any more rows that can pass the predicate. Then we can use the positions of the rows we did find to identify which pieces of the other columns we need to read. One technique that we can use to cluster the rows is sorting. So we'll use the symbol column as a sort key for all of the columns. And that way we can reconstruct a whole row by seeking to the same row position in each file. It turns out, having sorted all of the rows, we can do a bit more. We don't have any more wasted disk reads but we can still be more efficient with how we're using the disk. We've clustered all of the rows with the same symbol together so we don't really need to bother repeating the symbol so many times in the same file. Let's just write the value once and say how many rows we have. This run-length encoding technique can compress large numbers of rows into a small amount of space. In this example, we de-duplicate just a few rows, but you can imagine de-duplicating many thousands of rows instead. This encoding is great for reducing the amount of disk we need to read at query time, but it also has the additional benefit of reducing the total size of our stored data. Now our query requires substantially fewer disk reads than it did when we started. Let's recap what the C-Store paper did to achieve that. First, we transposed our data to store each column in its own file. Now, queries only have to read the columns used in the query. Second, we partitioned the data into multiple file sets so that all rows in a file have the same value for the partition column. Now, a predicate on the partition column can skip non-matching file sets entirely. Third, we selected a column of our data to use as a sort key. Now rows with the same value for that column are clustered together, which allows our query to stop reading data once it finds non-matching rows. Finally, sorting the data this way enables high compression ratios, using run-length encoding, which minimizes the size of the data stored on the disk. The C-Store system combined each of these innovative ideas to produce an academically significant result. And if you used it behind the desk of a Fortune 500 company in 2005, you probably would've been pretty pleased. But it's not 2005 anymore and the requirements of a modern database system are much stricter. So let's take a look at how C-Store fares in 2020. First of all, we have designed the storage layer of our database to optimize a single query in a single application. Our design optimizes the heck out of that query and probably some similar ones but if we want to do anything else with our data, we might be in a bit of trouble. What if we just decide we want to ask a different question?
For example, in our stock example, what if we want to plot all the trades made by a single user over a large window of time? How do our optimizations for the previous query measure up here? Well, our data's partitioned on the trade date; that could still be useful, depending on our new query. If we want to look at a trader's activity over a long period of time, we would have to open a lot of files. But if we're still interested in just a day's worth of data, then this optimization is still an optimization. Within each file, our data is ordered on the stock symbol. That's probably not too useful anymore; the rows for a single trader aren't going to be clustered together, so we will have to scan all of the rows in order to figure out which ones match. You could imagine a worse design, but as it becomes crucial to optimize this new type of query, we might have to go as far as reconfiguring the whole database. The next problem is one of scale. One server is probably not good enough to serve a database in 2020. C-Store, as described, runs on a single server and stores lots of files. What if the data overwhelms this small system? We could imagine exhausting the file system's inode limit with lots of small files due to our partitioning scheme. Or we could imagine something simpler, just filling up the disk with huge volumes of data. But there's an even simpler problem than that. What if something goes wrong and C-Store crashes? Then our data is no longer available to us until the single server is brought back up. A third concern, another one of scalability, is that one deployment does not really suit all the possible use cases we could imagine. We haven't really said anything about being flexible. A contemporary database system has to integrate with many other applications, which might themselves have pretty restricted deployment options. Or the demands imposed by our workloads have changed and the setup you had before doesn't suit what you need now. C-Store doesn't do anything to address these concerns. What the C-Store paper did do was lead very quickly to the founding of Vertica. Vertica's architecture and design are essentially all about bringing the C-Store designs into an enterprise software system. The C-Store paper was just an academic exercise so it didn't really need to address any of the hard problems that we just talked about. But Vertica, the first commercial database built upon the ideas of the C-Store paper, would definitely have to. This brings us back to the present to look at how an analytic query runs in 2020 on the Vertica Analytic Database. Vertica takes the key idea from the paper, that we can significantly improve query performance by changing the way our data is stored, and gives its users the tools to customize their storage layer in order to heavily optimize really important or commonly run queries. On top of that, Vertica is a distributed system which allows it to scale up to internet-sized data sets, as well as have better reliability and uptime. We'll now take a brief look at what Vertica does to address the three inadequacies of the C-Store system that we mentioned. To avoid locking into a single database design, Vertica provides tools for the database user to customize the way their data is stored. To address the shortcomings of a single node system, Vertica coordinates processing among multiple nodes.
To acknowledge the large variety of desirable deployments, Vertica does not require any specialized hardware and has many features which smoothly integrate it with a Cloud computing environment. First, we'll look at the database design problem. We're a SQL database, so our users are writing SQL and describing their data the SQL way, with the Create Table statement. Create Table is a logical description of what your data looks like, but it doesn't specify the way that it has to be stored. For a single Create Table, we could imagine a lot of different storage layouts. Vertica adds some extensions to SQL so that users can go even further than Create Table and describe the way that they want the data to be stored. Using terminology from the C-Store paper, we provide the Create Projection statement. Create Projection specifies how table data should be laid out, including column encoding and sort order. A table can have multiple projections, each of which could be ordered on different columns. When you query a table, Vertica will answer the query using the projection which it determines to be the best match. Referring back to our stock example, here's a sample Create Table and Create Projection statement. Let's focus on our heavily optimized example query, which had predicates on the stock symbol and date. We specify that the table data is to be partitioned by date. The Create Projection statement here is excellent for this query. We specify using the order by clause that the data should be ordered according to our predicates. We'll use the timestamp as a secondary sort key. Each projection stores a copy of the table data. If you don't expect to need a particular column in a projection, then you can leave it out. Our average price query didn't care about who did the trading, so maybe our projection design for this query can leave the trader column out entirely. If the question we want to ask ever does change, maybe we already have a suitable projection, but if we don't, then we can create another one. This example shows another projection which would be much better at identifying trends of traders, rather than identifying trends for a particular stock; a sketch of both projections follows below. Next, let's take a look at our second problem: how should you decide what design is best for your queries? Well, you could spend a lot of time figuring it out on your own, or you could use Vertica's Database Designer tool, which will help you by automatically analyzing your queries and spitting out a design which it thinks is going to work really well. If you want to learn more about the Database Designer tool, then you should attend the session Vertica Database Designer: Today and Tomorrow, which will tell you a lot about what the Database Designer does and some recent improvements that we have made. Okay, now we'll move to our next problem. (laughs) The challenge that one server does not fit all. In 2020, we have several orders of magnitude more data than we had in 2005. And you need a lot more hardware to crunch it. It's not tractable to keep multiple petabytes of data in a system with a single server. So Vertica doesn't try. Vertica is a distributed system, so we'll deploy multiple servers which work together to maintain such a high data volume. In a traditional Vertica deployment, each node keeps some of the data in its own locally-attached storage. Data is replicated so that there is a redundant copy somewhere else in the system. If any one node goes down, then the data that it served is still available on a different node.
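Since the slide DDL isn't reproduced in the transcript, here is a hedged sketch of what the statements Ryan describes might look like: the logical table partitioned by date, a projection ordered for the symbol-and-date query (with run-length encoding on the sort key), and a second projection better suited to trader-activity queries. The names, column list, and encodings are assumptions; the SEGMENTED BY and KSAFE clauses anticipate the segmentation discussion that follows.

```sql
-- Logical table; partitioning by date lets a predicate on the trade date
-- skip non-matching partitions entirely.
CREATE TABLE stocks (
    symbol   VARCHAR(10),
    price    NUMERIC(10, 2),
    trade_ts TIMESTAMP,
    trader   VARCHAR(40)
)
PARTITION BY trade_ts::DATE;

-- Optimized for: WHERE symbol = ? AND trade_ts in some range.
-- RLE on the sort key compresses long runs of repeated symbols;
-- the trader column is left out because this query never reads it.
CREATE PROJECTION stocks_by_symbol (
    symbol ENCODING RLE,
    price,
    trade_ts
)
AS SELECT symbol, price, trade_ts
FROM stocks
ORDER BY symbol, trade_ts
SEGMENTED BY HASH(symbol) ALL NODES KSAFE 1;

-- Better for plotting a single trader's activity over time.
CREATE PROJECTION stocks_by_trader (
    trader ENCODING RLE,
    trade_ts,
    symbol,
    price
)
AS SELECT trader, trade_ts, symbol, price
FROM stocks
ORDER BY trader, trade_ts
SEGMENTED BY HASH(trader) ALL NODES KSAFE 1;
```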
We'll also have it so that in the system, there's no special node with extra duties. All nodes are created equal. This ensures that there is no single point of failure. Rather than replicate all of your data, Vertica divvies it up amongst all of the nodes in your system. We call this segmentation. The way data is segmented is another parameter of storage customization and it can definitely have an impact upon query performance. A common way to segment data is by using a hash expression, which essentially randomizes the node that a row of data belongs to, but with a guarantee that the same data will always end up in the same place. Describing the way data is segmented is another part of the Create Projection statement, as seen in this example. Here we segment on the hash of the symbol column, so all rows with the same symbol will end up on the same node. For each row that we load into the system, we'll apply our segmentation expression. The result determines which segment the row belongs to and then we'll send the row to each node which holds a copy of that segment. In this example, our projection is marked KSAFE 1, so we will keep one redundant copy of each segment. When we load a row, we might find that its segment has copies on Node One and Node Three, so we'll send a copy of the row to each of those nodes. If Node One is temporarily disconnected from the network, then Node Three can serve the other copy of the segment so that the whole system remains available. The last challenge we brought up from the C-Store design was that one deployment does not fit all. Vertica's cluster design neatly addresses many of our concerns here. Our use of segmentation to distribute data means that a Vertica system can scale to any size of deployment. And since we lack any special hardware or nodes with special purposes, Vertica servers can run anywhere, on premise or in the Cloud. But let's suppose you need to scale out your cluster to rise to the demands of a higher workload. Suppose you want to add another node. This changes the division of the segmentation space. We'll have to re-segment every row in the database to find its new home and then we'll have to move around any data that belongs to a different segment. This is a very expensive operation, not something you want to be doing all that often. Traditional Vertica doesn't solve that problem especially well, but Vertica Eon Mode definitely does. Vertica's Eon Mode is a large set of features which are designed with a Cloud computing environment in mind. One feature of this design is elastic throughput scaling, which is the idea that you can smoothly change your cluster size without having to pay the expenses of shuffling your entire database. Vertica Eon Mode had an entire session dedicated to it this morning. I won't say any more about it here, but maybe you already attended that session or if you haven't, then I definitely encourage you to listen to the recording. If you'd like to learn more about the Vertica architecture, then you'll find on this slide links to several of the academic conference publications: these four papers here, as well as the Vertica Seven Years Later paper, which describes some of the Vertica designs seven years after the founding, and also a paper about the innovations of Eon Mode. And of course, the Vertica documentation is an excellent resource for learning more about what's going on in a Vertica system. I hope you enjoyed learning about the Vertica architecture. I would be very happy to take all of your questions now.
Thank you for attending this session.
Jeff Healey, Vertica at Micro Focus | CUBEConversations, March 2020
>> Narrator: From theCUBE studios in Palo Alto and Boston, connecting with top leaders all around the world, this is a CUBE Conversation. >> Hi everybody, I'm Dave Vellante, and welcome to the Vertica Big Data Conference virtual. This is our digital presentation, wall to wall coverage actually, of the Vertica Big Data Conference. And with me is Jeff Healey, who directs product marketing at Vertica. Jeff, good to see you. >> Good to see you, Dave. Thanks for the opportunity to chat. >> You're very welcome. Now I'm excited about the products that you guys announced and you're hardcore into product marketing, but we're going to talk about the Vertica Big Data Conference. It's been a while since you guys had this. Obviously, new owner, new company, some changes, but that new company, Micro Focus, has announced that it's investing, I think the number was $70 million, into two areas. One was security and the other, of course, was Vertica. So we're really excited to be back at the virtual Big Data Conference. And let's hear it from you, what are your thoughts? >> Yeah, Dave, thanks. And we love having theCUBE at all of these events. We're thrilled to have the next Vertica Big Data Conference. It was actually going to be a physical event; we're moving it online. We know it's going to be a big hit because we've been doing this for some time, particularly with two of the webcast series we have every month. One is the Under the Hood webcast series, which is led by our engineers, and the other is what we call the Data Disruptors webcast series, which is led by all customers. So we're really confident this is going to be a big hit; we've seen the registration spike. We just hit 1,000 and we're planning on having about 1,000 at the physical event. It's growing and growing. We're going to see those big numbers and it's not going to be a one-time thing. We're going to keep the conversation going, make sure there's plenty of best practices learning throughout the year. >> We've been at all the big BDCs and the first ones were really in the heart of the big data movement, a really exciting time, and the interesting thing about this event is it was always sort of customers talking to customers. There weren't a lot of commercials; it was an intimate event. Of course I loved it because it was in our hometown. But I think you're trying to carry that theme obviously into the digital sphere. Maybe you can talk about that a little bit. >> Yeah, Dave, absolutely right. Of course, nothing replaces face to face, but everything that you just mentioned makes it special about the Big Data Conference, and you know, you guys have been there throughout and shown great support in talking to so many customers and leaders and what have you. We're doing the same thing, all right. So we had about 40 plus sessions planned for the physical event. We're going to run half of those and we're not going to lose anything though, that's the key point. So what makes the Vertica Big Data Conference really special is that the only presenters that are allowed to present are either engineers, Vertica engineers, or best practices engineers, and then customers. Customers that actually use the product. There's no sales or marketing pitches or anything like that. And I'll tell you, as far as the customer lineup that we have, we've got five or six already lined up as part of those 20 sessions, customers like Uber, customers like the Trade Desk, customers like Philips talking about predictive maintenance, so the list goes on and on.
You won't want to miss it if you're on the fence or if you're trying to figure out if you want to register for this event. Best part about it, it's all free, and if you can't attend it live, there will be live Q&A chat on every single one of those sessions, and we promise we'll answer every question if we don't get it live, as we always do. They'll all be available on demand. So no reason not to register and attend or watch later. >> Thinking about the content over the years, in the early days of the Big Data Conference, of course Vertica started before the whole big data meme really took off and then as it took off, plugged right into it, but back then the discussion was a lot of what do I do with big data, Gartner's three Vs, how do I wrangle it all, what's the best approach, and this stuff, Hadoop, is really complicated. Of course Vertica was an alternative to RDBMSs that really couldn't scale or give that type of performance for analytics, so you had your foot in that door. But now the conversation is different; that's what's interesting about your theme, win big with data. Of course, the physical event was at the Encore, which is the new casino in Boston. But my point is, the conversation is no longer about how to wrangle all this data, you know, how to lower the cost of storing this data, how to make it go faster and actually make it work. It's really about how to turn data into insights, transform your organizations and, quote unquote, win with big data. >> That's right. Yeah, that's a great point, Dave. And that's why we chose the title, really, because it's about our customers and what they're able to do with our platform. And we know it's not just one platform; it's all of the ecosystem, all of our incredible partners. Yeah, it's funny, when I started with the organization about seven years ago, we were closing lots of deals, and I was following up on case studies and it was like, okay, why did you choose Vertica? Well, the queries went fast. Okay, so what does that mean for your business? We knew we were kind of in the early adopter stage. And we were disrupting the data warehouse market. Now we're talking to our customers whose volumes are growing, growing and growing. And they really have these analytical use cases and, again, can talk to the value the entire organization is gaining from it. Like that's the difference between now and a few years ago, just like you were saying, when Vertica disrupted the database market, but also the data warehouse market. You can speak to our customers and they can tell you exactly what's happening, how it's moving the needle or really advancing the entire organization, regardless of the analytical use case, whether it's an internet of things around predictive maintenance, or customer behavior analytics. They can speak confidently about it, more than just, hey, our queries went faster. >> You know, I've mentioned before the Micro Focus investment, I want to drill into that a bit because the Vertica brand stands alone. It's a Micro Focus company, but Vertica has its own sort of brand awareness. The reason I've mentioned that is because if you go back to the early days of MPP databases, there was a spate of companies, startups that formed. And many if not all of those got acquired, some lived on with the codebase going into the cloud, but generally speaking, many of those brands have gone away. Vertica stays.
And so my point is that we've seen Vertica have staying power throughout. I think it's a function of the architecture that Stonebraker originally envisioned; you guys were early, the market had a lot of good customer traction, and you've been very responsive to a lot of the trends. Colin Mahony will talk about how you adopted and really embraced cloud, for example, and different data formats. And so you've really been able to participate in a lot of the new emerging waves that have come out to the market. And I would imagine some of that's cultural. I wonder if you could just address that in the context of BDC. >> Oh, yeah, absolutely. You hit on all the key points here, Dave. So a lot of changes in the industry. We're in the hottest industry, the tech industry, right now. There's lots of competition. But one of the things we'll say in terms of, hey, who do you compete with? You compete with these players in the cloud, open source alternatives, traditional enterprise data warehouses. That's true, right. And one of the things we've stayed true to, and Colin has really kind of led the charge on for the organization, is that we know who we are, right. So we're an analytical database platform. And we're constantly just working on that one sole source code base, to make sure that we don't provide a bunch of different technologies and databases that you need to stitch together. This platform just has unbelievable universal capabilities, everything from running analytics at scale, to in-database machine learning with a different approach, to all the different types of deployment models that are supported, right. We don't go to companies and say, yeah, we take care of all your problems but you have to stitch together all these different types of technologies. It's all based on that core Vertica engine, and we've expanded it to meet all these market needs. So Colin knows what he believes, and what he tells the team we lead with is that one core platform that can address all these analytical initiatives. So we know who we are, we continue to improve on it, regardless of the pivots and the drastic measures that some of the other competitors have taken. >> You know, I've got to ask you, so we're in the middle of this global pandemic with Coronavirus and COVID-19, and things change daily, by the hour, sometimes by the minute. I mean, every day you get up to something new. So you see a lot of forecasts, you see a lot of probability models, best case, worst case, likely case, even though nobody really knows what that likely case looks like. So there's a lot of analytics going on and a lot of data that people are crunching, and new data sources come in every day. Are you guys participating directly in that, specifically your customers? Are they using your technology? You can't use a traditional data warehouse for this. It's just, you know, too slow, too asynchronous, the process is cumbersome. What are you seeing in the customer base as it relates to this crisis? >> Sure, well, I mean naturally, we have a lot of customers that are healthcare technology companies, companies like Cerner, companies like Philips, right, that are kind of leading the charge here. And of course, our whole motto has always been, don't throw away any of the data, there's value in that data, and you don't have to with Vertica, right. So you've got petabyte-scale types of analytics across many of our customers. Again, just a few years ago, we called those customers the petabyte club.
Now a majority of our large enterprise software companies are approaching those petabyte volumes. So it's important to be able to run those analytics at that scale and that volume. The other thing we've been seeing from some of our partners is really putting that analytics to use with visualizations. So one of the customers that's going to be presenting as part of the Vertica Big Data Conference is Domo. Domo has stood up a really nice demo around being able to track the Coronavirus outbreak, and how we're getting care, and things like that, in a visual manner, and you're seeing more of those. Well, Domo embeds Vertica, right. So that's another customer of ours. So think of Vertica as that embedded analytical engine to support those visualizations, so that just about anyone in the world can track this. And hopefully, as we see over time, cases go down and we overcome this. >> Talk a little bit more about that. Because again, the BDC has always been engineers presenting to audiences, and you guys have a lot of, you just mentioned the demo by Domo, you have a lot of brand names that we've interviewed on theCUBE before, but maybe you could talk a little bit more about some of the customers that are going to be speaking at the virtual event, and what people can expect. >> Sure, yeah, absolutely. So we've got Uber that's presenting. Just a quick fact around Uber: really, the analytical data warehouse is all Vertica, right, and it works very closely with open source, or what have you. Just a quick stat on Uber, 14 million rides per day. What Uber is able to do is connect the riders with the drivers so that they can determine the appropriate pricing. So Uber is going to be a great session; everyone will want to tune in on that. Others like The Trade Desk, right, a massive ad tech company, 10 billion ad auctions daily, it may even be per second or per minute; the amount of scale and analytical volume that they have, that they are running queries across, can really only be accomplished with a few platforms in the world, and that's Vertica. That's another hot one, the Trade Desk. Philips is going to be presenting IoT analytical workloads. We're seeing more and more of those, across not only telematics, which you would expect within automotive, but predictive maintenance that cuts across all the original manufacturers, and Philips has got a long history of being able to handle sensor data and apply it to those business cases where you can improve customer satisfaction and lower costs related to services. So around their MRI machines and predictive maintenance initiative, again, Vertica is kind of that heartbeat, that analytical platform that's driving those initiatives. So the list goes on and on. Again, the conversation is going to continue with the Data Disruptors and the Under the Hood webcast series. Any customers that weren't able to present, and we had a few that just weren't able to do it, they've already signed up for future months. So we're already booked out six months out; more and more customer stories you're going to hear from Vertica.com. >> Awesome, and we're going to be sharing some of those on theCUBE as well. The BDC, it's always been an intimate event, one of my favorites, a lot of substance, and I'm sure the online version, the virtual digital version, is going to be the same. Jeff Healey, thanks so much for coming on theCUBE and giving us a little preview of what we can expect at the Vertica BDC 2020. >> You bet. >> Thank you. >> Yeah, Dave, thanks to you and the whole CUBE team.
Appreciate it. >> Alright, and thank you for watching, everybody. Keep it right here for all the coverage of the virtual Big Data Conference 2020. You're watching theCUBE. I'm Dave Vellante, we'll see you soon.
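(A technical footnote on the in-database machine learning Jeff mentions above: in Vertica, models are trained and scored with SQL functions inside the database, so the data never has to be exported to a separate ML system. The sketch below is a minimal illustration of that pattern, assuming the open source vertica-python client and Vertica's LINEAR_REG / PREDICT_LINEAR_REG functions; the host, table, and column names are hypothetical, and the exact function signatures should be checked against your version's documentation.)

```python
# A minimal sketch of in-database ML in Vertica: train and score with SQL
# so the data never leaves the database. Assumes the open source
# vertica-python client; host, table, and column names are hypothetical.
import vertica_python

conn_info = {
    "host": "vertica.example.com",  # hypothetical cluster
    "port": 5433,
    "user": "dbadmin",
    "password": "...",
    "database": "analytics",
}

with vertica_python.connect(**conn_info) as conn:
    cur = conn.cursor()

    # Train a regression model inside Vertica (ML function; verify the
    # signature against your version's documentation).
    cur.execute(
        "SELECT LINEAR_REG('maint_model', 'sensor_history', "
        "'hours_to_failure', 'temperature, vibration')"
    )

    # Score new rows with the stored model, again without moving data.
    cur.execute(
        "SELECT device_id, "
        "PREDICT_LINEAR_REG(temperature, vibration "
        "USING PARAMETERS model_name='maint_model') AS predicted "
        "FROM sensor_current LIMIT 10"
    )
    for row in cur.fetchall():
        print(row)
```

The design point is the one made in the interview: the petabyte stays put, and only the model definition and the predictions move.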
Chris Lynch, AtScale | MIT CDOIQ 2019
>> From Cambridge, Massachusetts, it's theCUBE, covering MIT Chief Data Officer and Information Quality Symposium 2019. Brought to you by SiliconANGLE Media. >> Welcome back to Cambridge, Massachusetts, everybody. You're watching theCUBE, the leader in live tech coverage. I'm Dave Vellante with my co-host, Paul Gillan. Chris Lynch, good friend, is here, newly minted CEO of AtScale, and a legend. Good to see you. >> In my own mind. >> In mine too. >> It's great to be here. >> It's awesome, thank you for taking the time. I know how busy you are, you're running around like crazy with your next big thing. I was excited to hear that you got back into it. I predicted it a while ago; you were a very successful venture capitalist, but at heart you're a startup guy, aren't ya? >> Yeah, 100%, 100%. I couldn't be more thrilled, I feel invigorated. I think I've told you many times, when you've interviewed me and asked me about the transition from being an entrepreneur to being a VC, and since it's a PG show, I've got a different analog than the one I usually give you. I used to be a movie star, and then I was an executive producer of movies. Now I'm back to being a movie star, hopefully. >> Yeah well, so you told me when you first became a VC, you said, I look for startups that have a 10X impact, either 10X value or 10X cost reduction. What was it that attracted you to AtScale? What's the 10X? >> AtScale addresses a $150 billion market problem, which is basically bringing traditional BI to the cloud. >> That's the other thing you told me, big markets. >> Yeah, so that's the first thing, massive market opportunity. The second is the innovation component, and where the 10X comes in, we're uniquely qualified to virtualize data into the pipeline and out. So I like to say that we're the bridge between BI and AI and back. We make every BI user a citizen data scientist, and that's a game changer. And that's sort of the new, futuristic component of what we do. So one part is steeped in that $150 billion BI marketplace and traditional analytics platforms, and then the second piece is delivering the data into these BI, excuse me, these AI, machine learning platforms. >> Do you see that ultimately getting integrated into some kind of larger data pipeline framework? I mean, maybe it lives in the cloud or maybe on prem; how do you see that evolving over time? >> So I believe that, with AtScale as one single pane of glass, we basically are providing an API to the data and to the user, one single API. The reason that today we haven't seen the delivery of the promise of big data is because we don't have big data. Fortune 2000 companies don't have big data. They have lots of data, but to me big data means you can have one logical view of that data and get the best data pumped into these models and these tools, and today that's not the case. They're constricted by location, they're constricted by vendor, they're constricted by whether it's in the cloud or on prem. We eliminate those restrictions. >> The single API, I think, is important actually. Because when you look at some of these guys, what they're doing with their data pipeline, they might have 10 or 15 unique APIs that they're trying to manage. So there's a simplification aspect too, I suppose. >> One of the knocks on traditional BI has always been the need for extract databases and all the ETL that's involved in that. Do you guys avoid that stage? You go to the production data directly, or what's the-- >> It's a great question.
The way I put it is, we bring Moses to the mountain, the mountain being the data, Moses being the user. Traditionally, what people have been trying to do is bring the mountain to Moses; it doesn't scale. At AtScale, we provide an abstraction, a logical abstraction, between the data and the BI user. >> You don't touch, you don't move the data. >> We don't move the data, which is what's unique, and that's what's delivering, I think, way more than a 10X delivery in value. >> Because you leave the data in place, you bring that value to wherever the data is. Which is the original concept of Hadoop, by the way. That was what was profound about Hadoop; everybody craps on it now, but that was the game changer, and if you could take advantage of that, that's how you tap your 10X. >> The difference is, to your point, we're not moving the data. Hadoop, in my humble opinion, plateaued because to get the value, you had to ask the user to go put data in yet another platform. And the reason that we're not delivering on big data as an industry, I believe, is because we have too many data sources, too many platforms, too many consumers of data and too many producers. We've built all these islands of data with no connectivity. The idea was, we'll create this big data lake and we're going to physically put everything in there. Guess what? Someday turned out to be never, because people aren't going to deal with the business disruption. We move thousands of users from a platform like Teradata to a platform like Snowflake or Google BigQuery, we don't care. We're multi-cloud and we're hybrid cloud. But we do it without any disruption. You're using Excel, you just continue to use it. You just see the results come faster. You use Tableau, same difference. >> So we had all the Vertica rock stars in here. We had Colin in yesterday, we had Stonebraker around earlier. Andy Palmer just came on, and Chris here, the CEO who ultimately sold the company to HP, which really didn't do anything with it and then spun it off, and now it's back. Aaron had a spring in his step yesterday. So when you think about Vertica, the technology behind Vertica, go back 10 years to where we've come now; give us a little bit of your data journey. >> So I think it plays into the original assertion, which is that Vertica is a best-in-class platform for analytics, but it was yet another platform. The analog I give now is, now we have Snowflake, and six months, 12 months from now we're going to have another one. And that creates a set of problems if you have to live in the physical world, because you've got all these islands of data, and I believe it's about the data, not about the models; it's about the data. You can't get optimal results if you don't have optimal access to the pertinent data. I believe that having that universal API is going to make the next platform that much more valuable. You're not going to be making the trade-off of, okay, we have this platform that has some neat capability, but from an enterprise architecture perspective we're never going to be able to connect all this stuff. That's how all of these things proliferated. My view is, in a world where you have that single pane of glass, that abstraction layer between the user and the data, innovation can be spawned quicker, and you can use these tools effectively, 'cause you're not compromising being able to get a logical view of the data and get access to it as a user.
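(To make the single-pane-of-glass idea concrete, here is a toy sketch of what a universal semantic layer does conceptually: BI tools query one logical schema, and the layer routes and rewrites the query for whichever physical platform currently holds the data. This is an illustration of the idea only, not AtScale's implementation; the catalog contents and the naive string rewrite are invented for the example.)

```python
# Toy illustration of a universal semantic layer: one logical name,
# many physical homes. A sketch of the concept, not AtScale's engine;
# all table names and backends below are illustrative.

# Logical entity -> physical location per backend.
CATALOG = {
    "sales": {
        "teradata":  "prod_dw.sales_fact",
        "snowflake": "ANALYTICS.PUBLIC.SALES_FACT",
        "bigquery":  "acme-prod.analytics.sales_fact",
    },
}

def rewrite(logical_sql, backend):
    """Rewrite a query over logical names into backend-specific SQL.
    (Naive string substitution; a real layer parses the query.)"""
    physical_sql = logical_sql
    for logical, homes in CATALOG.items():
        physical_sql = physical_sql.replace(logical, homes[backend])
    return physical_sql

# The BI tool (Tableau, Excel, ...) always issues the same query.
query = "SELECT region, SUM(amount) FROM sales GROUP BY region"

# Re-platforming from Teradata to Snowflake changes one routing decision,
# not the thousands of reports that reference the logical schema.
print(rewrite(query, "teradata"))
print(rewrite(query, "snowflake"))
```

Under this model, moving thousands of Tableau or Excel users off Teradata means changing a routing entry rather than rewriting reports, which is the non-disruptive migration described above.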
>> What's your issue with Snowflake, you mentioned them, Muglia's company-- >> No issue, they're a great partner of ours. We eliminate the friction for the user going from an on-prem solution to the cloud. >> Slootman just took over there. So you know where that's going. >> Yep. (laughing) >> Frank's got the magic touch. Okay good, you say they're a partner of yours; how are you guys partnering? >> They refer us into customers where, if you want to buy Snowflake, now the next issue is, how do I migrate? You don't. You put our virtualization layer in, and then we allow you access to Snowflake in a non-disruptive way, versus having to move data into their system or into a particular cloud, which creates sales friction. >> Moving data is just, you want to avoid it at all costs. >> I do want to ask you, because I met with your predecessor, Dave Mariani, last year, and I know he was kind of a reluctant CEO; he didn't really want to be CEO but wanted to be CTO, which is what he is now. How did that come about, that they found you, that you connected with them and decided this was the right opportunity? >> That's a great question. I actually looked at the company at the seed stage when I was in venture, but I had this thing, as you know, that I wanted to move companies to Boston, and they're about my vintage age-wise, and he's married with four kids, so that wasn't in the cards. I said look, it doesn't make sense for me to seed this company 'cause I can't give you the time; you're out in California, and everything I'm instrumenting is around Boston. We parted friends. And I was skeptical whether he could build this, 'cause people have been talking about building a heterogeneous universal semantic layer for years, and it's never come to fruition. And then he read in Fortune or Forbes that I was leaving Accomplice and that I was looking for one more company to operate. He reached out and he told me what they were doing, that hey, we really built it, but we need help, and I don't want to run this, it's not right for the company and the opportunity. So I said, I'll come and I'll consult to you. I put together a plan, and I had my Vertica and DataRobot guys do the technical diligence to make sure that the architecture wasn't wedded to Hadoop, like all the other ones were, and when I saw it wasn't, then I knew the market opportunity was to take that rifle and point it at that legacy $150 billion BI market, not at the billion dollar market of Hadoop. And when we did that, we've been growing at 162% quarter-over-quarter. We've built development centers in Bulgaria. We've moved all non-technical operations to Boston, here down at South Station. We've been on fire, and we are the partner of choice of every cloud vendor, because we eliminate the sales friction for customers being able to take advantage of movement to the cloud, and through our intelligent pipeline capability we're able to significantly reduce the cost of queries, because we understand them and we're able to intelligently cache those queries. >> Sales ops is here, all-- >> Sales, marketing, customer support, customer success, and we're building a machine learning team and a dev team here. >> Where are you in that sort of Boston build-out? >> We have an office at 711 Atlantic that we opened in the fall. We're actually moving from 4,000 square feet to 10,000 this month, in less than six months, and we'll house, by the first year, 100 employees in Boston, 100 in Bulgaria, and about that same hundred in San Mateo.
>> Are you going after net new business mainly? Or, there's a lot of legacy BI out there, are you more displacing those products? >> A couple of things. What we find is that customers want to evolve into the cloud; they don't want a revolution, they want an evolution. So we allow them, because we support hybrid cloud, to keep some data behind the firewall and then experiment with moving other data to the cloud platform of choice, but we're still providing that one logical view. I would say most of our customers are looking to re-platform off of Teradata or something, onto another platform like Snowflake or Google BigQuery, we don't care. And then we have a set of customers that see that as part of the solution but not the whole solution. They're more true hybrids, but I would say that 80% of our customers are traditional BI customers that are trying to contemporize their environments and be able to take advantage of tabular support and multidimensional, the things that we do in addition to the cube world. >> They can keep whatever they're using. >> Correct, that's the key. >> Did you do the series D, you did, right? >> Yes, Morgan Stanley led. >> So you're not actively raising, but you're good for now. It was like $50 million? >> Yeah, we raised $50 million. >> You're good for a bit. Who's in the Chris Lynch target? (laughs) Who's the enemy? At Vertica, I could say it was the traditional database guys. Who's the? >> We're in a unique position, we're almost Switzerland, so we could be friend or foe of anybody in that ecosystem, because we can non-disruptively re-platform customers between legacy platforms, or from legacy platforms to the cloud. We're in an interesting position. >> So similar to the file sharing, file virtualization company. >> Acopia. >> Acopia, yeah. >> It puts us in an interesting position. They need to be friends with us, and at the same time I'm sure that they're concerned about the capabilities we have, but we have a number of retail customers, for instance, that have asked us to move them from Amazon to Google BigQuery, which we accommodate, and because we can do that non-disruptively, the cost and the difficulty of moving are eliminated. It gives customers true freedom of choice. >> How worried are you that AWS tries to replicate what you guys do? You're in their sights. >> I think there are technical, legal and structural barriers to them doing that. The technical is, this team has been at it for six and a half years. So to do what we do, they'll have to do what we've done. Structurally, from a business perspective, even if they could, I'm not sure they'd want to. The way to think about Amazon is, they're no different than Teradata: they want the same vendor lock-in, except they want it to be the Amazon cloud where Teradata wanted it to be their data warehouse. >> They don't promote multi-cloud versus-- >> Yeah, they don't want multi-cloud, they don't want >> On prem >> customers to have a freedom of choice. Would they really enable a heterogeneous abstraction layer? I don't think they would, nor do I think any of the big guys would. They all claim to have this capability for their own system. It's like the old IBM adage: I'm in prison, but the food's good, I get three squares a day, I get cable TV, but I'm in prison. (laughing) >> Awesome, all right, parting thoughts. >> Parting thoughts, oh geez, you got to give me a question, I'm not that creative.
>> I think you're going to see some significant announcements in September regarding the company and relationships that I think will validate the impact we're having in the market. >> Give you some leverage. >> Yeah, it will give us better channel leverage. We have a major technical announcement that I think will be significant to the marketplace and will be highly disruptive to some of the people you just mentioned, in terms of really raising the bar for customers to be able to have freedom of choice without any sort of vendor lock-in. And I think that will create some counterstrike, which we'll be ready for. (laughing) >> If you've never heard of AtScale before, trust me, you're going to in the next 18 months. Chris Lynch, thanks so much for coming on theCUBE. >> It's my pleasure. >> Great to see you. All right, keep it right there, everybody; we're back with our next guest right after this short break. You're watching theCUBE from MIT, right back. (upbeat music)
Mark Ramsey, Ramsey International LLC | MIT CDOIQ 2019
>> From Cambridge, Massachusetts. It's theCUBE, covering MIT Chief Data Officer and Information Quality Symposium 2019. Brought to you by SiliconANGLE Media. >> Welcome back to Cambridge, Massachusetts, everybody. We're here at MIT, in sweltering Cambridge, Massachusetts. You're watching theCUBE, the leader in live tech coverage; my name is Dave Vellante. I'm here with my co-host, Paul Gillin. Special coverage of the MIT CDOIQ, the Chief Data Officer event. This is the 13th year of the event, and we started covering it seven years ago. Mark Ramsey is here. He's the Chief Data and Analytics Officer Advisor at Ramsey International, LLC and former Chief Data Officer of GlaxoSmithKline. Big pharma, Mark, thanks for coming onto theCUBE. >> Thanks for having me. >> You're very welcome, fresh off the keynote. Fascinating keynote this morning. Lot of interest here, tons of questions. And we have some as well, but let's start with your history in data. I sat down after 10 years, but I could have stretched it to 20; I'll sit down with the young guns. But there were some folks in there with 30-plus-year careers. How about you, what does your data journey look like? >> Well, my data journey, of course I was able to stand up for the whole time because I was in the front, but I actually started a little over 32 years ago. What I always tell folks is that data and analytics has been a long journey, and the name has changed over the years, but we've really been trying to tackle the same problems of using data as a strategic asset. So when I started, I was with an insurance and financial services company, building one of the first data warehouse environments in the insurance industry, and that was in the '87, '88 range. Then once I was able to deliver that, I ended up transitioning into consulting for IBM, and basically spent 18 years with IBM in consulting and services. When I joined, the name had evolved from Data Warehousing to Business Intelligence, and then over the years it was Master Data Management, Customer 360, Analytics and Optimization, Big Data. And then in 2013, I joined Samsung Mobile as their first Chief Data Officer. So, moving out of consulting, I really wanted to own the end-to-end delivery of advanced solutions in the data analytics space, and that made the transition to Samsung quite interesting, very much into consumer electronics, mobile phones, tablets and things of that nature. And then in 2015 I joined GSK as their first Chief Data Officer to deliver a data analytics solution. >> So you have a long data history, and Paul, Mark took us through it. And you're right, Mark-o, it's a lot of the same narrative, same wine, new bottle, but the technology's obviously changed. The opportunities are greater today. But you took us through the Enterprise Data Warehouse, which was ETL and map-and-move, and then Master Data Management, which is kind of this mapping and abstraction layer, then an Enterprise Data Model, top-down. And then that all failed, so we turned to Governance, which has been very, very difficult. And then you came up with another solution that we're going to dig into, but is it the same wine, new bottle from the industry? >> I think it has been over the last 20, 30 years, which is why I kind of did the experiment at the beginning of how long folks have been in the industry.
I think that, certainly, the technology has advanced, moving to a reduction in the amount of schema that's required to move data, so you can kind of move away from the map-and-move type of approach of a data warehouse. But it is tackling the same type of problems, and like I said in the session, it's a little bit like Einstein's phrase: doing the same thing over and over again and expecting a different answer is certainly the definition of insanity. And what I really proposed at the session was, let's come at this from a very different perspective. Let's actually use data analytics on the data to make it available for these purposes, and I do think it's a different wine now, so I think it's just a matter of whether folks can really take off and head in that direction. >> What struck me about, you were ticking off some of the issues that have failed, like data warehouses; I was surprised to hear you say data governance really hasn't worked, because there's a lot of talk around that right now. But all of those are top-down initiatives, and what you did at GSK was really invert that model and go from the bottom up. What were some of the barriers that you had to face organizationally to get the cooperation of all these people in this different approach? >> Yeah, and I think that's still key. It's not a complete bottom-up, because then you do end up really just doing data for the sake of data, which is also something that's been tried and does not work. I think it has to be a balance, and it's really striking that right balance of tackling the data at full breadth, but also making sure that you have very definitive use cases to deliver value for the organization, and then striking the balance of how you do that. And I think one of the things that becomes a struggle is, you're talking about very large breadth, and any time you're covering multiple functions within a business, it's getting the support of those different business functions. I think part of that is really around executive support and what that means. I did mention it in the session: executive support to me is really stepping up and saying that the data across the organization is the organization's data. It isn't owned by a particular person or a particular scientist, and I think in a lot of organizations, that gatekeeper mentality really does put barriers up to tackling the full breadth of the data. >> So I had a question around digital initiatives. Everywhere you go, every C-level executive is trying to get digital right, and a lot of this is top-down, a lot of it is big ideas, and it's kind of the North Star. Do you think that that's the wrong approach? That maybe there should be a more tactical, line-of-business alignment with that threaded leader, as opposed to this big picture, we're going to change and transform our company; what are your thoughts? >> I think one of the struggles is, I'm just not sure that organizations really have a good appreciation of what they mean when they talk about digital transformation.
I think in most of the industries it's an initiative that's getting a lot of press within the organizations, and folks want to go through digital transformation, but in some cases that means having a more interactive experience with consumers, maybe through sensors or different ways to capture data. But if they haven't solved the data problem, it just becomes another source of data that we're going to mismanage. So I do think there's a risk that we're going to see the same outcome from digital that we have when folks have tried other approaches to integrate information. If you don't solve the basic blocking and tackling, having data that has higher velocity and more granularity, if you're not able to solve that because you haven't tackled the bigger problem, I'm not sure it's going to have the impact that folks really expect. >> You mentioned that at GSK you collected 15 petabytes of data, of which only one petabyte was structured. So you had to make sense of all that unstructured data. What did you learn about that process? About how to unlock value from unstructured data as a result of that? >> Yeah, and I think this is something that's extremely important: with unstructured data, you apply advanced analytics against the data to go through a process of making sense of that information. A lot of folks have talked historically about text mining, about trying to extract an entity out of unstructured data and using that for the value. There are a few steps before you even get to that point; first of all, it's classifying the information, to understand which documents you care about and which documents you don't care about. I always use the story that in this vast amount of documents, somebody has probably uploaded the cafeteria menu from 10 years ago. That has no scientific value, whereas a protocol document for a clinical trial has significant value, and you don't want to manually look through a billion documents to separate those, so you have to apply the technology even in that first step of classification. And then there's a number of steps that ultimately lead you to understanding the relationships in the knowledge that's in the documents. >> Side question on that, so you had discussed, okay, if it's a menu, get rid of it, but there are certain restrictions where you've got to keep data for decades. It struck me, what about work in process? Especially in the pharmaceutical industry. I mean, post Federal Rules of Civil Procedure, everybody was looking for a smoking gun. So, how are organizations dealing with what to keep and what to get rid of? >> Yeah, and certainly the thinking has been to remove the excess, and it's to your point, how do you draw the line as to what is excess, right? You don't want to just keep every document, because if an organization is involved in any type of litigation and there's disclosure requirements, you don't want to have thousands of superfluous documents. At the same time, there are retention requirements, and so it's like a lot of things: it's figuring out how you abide by the requirements, but that is not an easy thing to do. And it really is another driver; certainly document retention has been a big thing over a number of years, but I think people have not applied advanced analytics to the level that they can to really help support that.
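(For readers who want a concrete picture of that first classification step, here is a minimal sketch using scikit-learn as one plausible toolkit: a TF-IDF text model that learns to separate scientific documents from cafeteria-menu noise before any entity extraction is attempted. The documents and labels below are invented stand-ins; a real effort at this scale would involve far more training data and iteration.)

```python
# A sketch of the triage step described above: classify documents so
# cafeteria menus are separated from clinical protocols before any
# entity extraction happens. scikit-learn is one plausible toolkit;
# the training examples and labels are hypothetical stand-ins.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# A handful of labeled examples; a real effort would need thousands.
docs = [
    "Protocol CT-1204: a randomized double-blind study of compound X",
    "Statistical analysis plan for the phase II oncology trial",
    "Cafeteria menu: Monday - lentil soup, grilled chicken, salad bar",
    "Holiday parking arrangements for the research campus",
]
labels = ["scientific", "scientific", "irrelevant", "irrelevant"]

classifier = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                           LogisticRegression(max_iter=1000))
classifier.fit(docs, labels)

# Triage new documents at scale instead of reading a billion by hand.
new_docs = ["Amendment 3 to protocol CT-1204 inclusion criteria",
            "Menu for the week of June 4th"]
print(classifier.predict(new_docs))  # e.g. ['scientific', 'irrelevant']
```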
So, you put forth a proposal where you basically had this sort of three approaches, well, combined three approaches. The crawlers to go, the spiders to go out and do the discovery and I presume that's where the classification is done? >> That's really the identification of all of the source information >> Okay, so find out what you got, okay. >> so that's kind of the start. Find out what you have. >> Step two is the data repository. Putting that in, I thought it was when I heard you I said okay it must be a logical data repository, but you said you basically told the CIO we're copying all the data and putting it into essentially one place. >> A physical location, yes. >> Okay, and then so I got another question about that and then use bots in the pipeline to move the data and then you sort of drew the diagram of the back end to all the databases. Unstructured, structured, and then all the fun stuff up front, visualization. >> Which people love to focus on the fun stuff, right? Especially, you can't tell how many articles are on you got to apply deep learning and machine learning and that's where the answers are, we have to have the data and that's the piece that people are missing. >> So, my question there is you had this tactical mindset, it seems like you picked a good workload, the clinical trials and you had at least conceptually a good chance of success. Is that a fair statement? >> Well, the clinical trials was one aspect. Again, we tackled the entire data landscape. So it was all of the data across all of R&D. It wasn't limited to just, that's that top down and bottom up, so the bottom up is tackle everything in the landscape. The top down is what's important to the organization for decision making. >> So, that's actually the entire R&D application portfolio. >> Both internal and external. >> So my follow up question there is so that largely was kind of an inside the four walls of GSK, workload or not necessarily. My question was what about, you hear about these emerging Edge applications, and that's got to be a nightmare for what you described. In other words, putting all the data into one physical place, so it must be like a snake swallowing a basketball. Thoughts on that? >> I think some of it really does depend on you're always going to have these, IOT is another example where it's a large amount of streaming information, and so I'm not proposing that all data in every format in every location needs to be centralized and homogenized, I think you have to add some intelligence on top of that but certainly from an edge perspective or an IOT perspective or sensors. The data that you want to then make decisions around, so you're probably going to have a filter level that will impact those things coming in, then you filter it down to where you're going to really want to make decisions on that and then that comes together with the other-- >> So it's a prioritization exercise, and that presumably can be automated. >> Right, but I think we always have these cases where we can say well what about this case, and you know I guess what I'm saying is I've not seen organizations tackle their own data landscape challenges and really do it in an aggressive way to get value out of the data that's within their four walls. It's always like I mentioned in the keynote. It's always let's do a very small proof of concept, let's take a very narrow chunk. 
And what ultimately ends up happening is, that becomes the only solution they build, and then they go to another area and they build another solution, and that's why we end up with 15 or 25-- (all talk over each other) >> The conventional wisdom is you start small. >> And fail. >> And you go on from there; you fail, and that's not how you get big things done. >> Well, that's not how you support analytic algorithms like machine learning and deep learning. You can't feed those just fragmented data from one aspect of your business and expect them to learn intelligent things to then make recommendations; you've got to have a much broader perspective. >> I want to ask you about one statistic you shared. You found 26 thousand relational database schemas for capturing experimental data, and you standardized those into one. How? >> Yeah, I mean, we took advantage of the Tamr technology that Michael Stonebraker created here at MIT a number of years ago, which is really, again, applying advanced analytics to the data, using the content of the data and the characteristics of the data to go from dispersed schemas into a unified schema. So if you look across 26 thousand schemas using machine learning, you can then understand the consolidated view that gives you one perspective across all of those different schemas, 'cause ultimately, when you give people flexibility, they love to take advantage of it, but it doesn't mean that they're actually doing things in an extremely different way; ultimately they're capturing the same kind of data. They're just calling things different names, and they might be using different formats, but in that particular case we used Tamr very heavily, and that again is back to my example of using advanced analytics on the data to make it available for the fun stuff, the visualization and the advanced analytics. >> So Mark, the last question is, you well know that the CDO role emerged in these highly regulated industries, and I guess in the case of pharma, quasi-regulated industries, but now it seems to be permeating all industries. We have Goka-lan from McDonald's, and virtually every industry is at least thinking about this role or has some kind of de facto CDO. So if you were slotted into a CDO role, let's make it generic, I know it depends on the industry, but where do you start as a CDO for a large company that doesn't have a CDO? Even a mid-sized organization, where do you start? >> Yeah, I mean, my approach is that a true CDO is maximizing the strategic value of data within the organization. It isn't a regulatory requirement. I know a lot of the banks started there, 'cause they needed someone to be responsible for data quality and data privacy, but for me the most critical thing is understanding the strategic objectives of the organization, and how data will be used differently in the future to drive decisions and actions and the effectiveness of the business. In some cases, there was a lot of discussion around monetizing the value of data. People immediately took that to, can we sell our data and make money as a different revenue stream? I'm not a proponent of that. It's internally monetizing your data.
How do you triple the size of the business by using data as a strategic advantage, and how do you change the executives, so that what is good enough today is not good enough tomorrow, because they are really focused on using data as their decision making tool? That, to me, is the difference that a CDO needs to make: really using data to drive those strategic decision points. >> And that nuance you mentioned I think is really important. Inderpal Bhandari, who is the Chief Data Officer of IBM, often says, how can you monetize the data? And you're right, I don't think he means selling data; it's how does data contribute, if I could rephrase what you said, to the value of the organization. That can be cutting costs, that can be driving new revenue streams, that could be saving lives if you're a hospital, improving productivity. >> Yeah, and what I've typically shared with executives when I've been in the CDO role is that they need to change their behavior, right? If a CDO comes into an organization and a year later the executives are still making decisions on the same data PowerPoints with spinning logos, and they said, ooh, we've got to have 'em, if they're still making decisions that way, then the CDO has not been successful. The executives have to change what their level of expectation is in order to make a decision. >> Change agents, top down, bottom up; last question. >> Going back to GSK, now that they've completed this massive data consolidation project, how are things different for that business? >> Yeah, you look at how Barron joined as the President of R&D about a year and a half ago, and his primary focus is using data and analytics and machine learning to drive the decision making in the discovery of new medicines, and the environment that has been created is a key component of that strategic initiative. So they are actually completely changing the way they're selecting new targets for new medicines, based on data and analytics. >> Mark, thanks so much for coming on theCUBE. >> Thanks for having me. >> Great keynote this morning; you're welcome. All right, keep it right there, everybody. We'll be back with our next guest. This is theCUBE, Dave Vellante with Paul Gillin. Be right back from MIT. (upbeat music)
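(As a coda to the 26,000-schema discussion above, here is a toy sketch of the kind of schema matching that tooling like Tamr automates: score each source column against a unified schema and route low-confidence matches to a human curator. Real systems also learn from the column contents and from curator feedback; the schemas and the 0.5 threshold below are invented for illustration.)

```python
# A toy sketch of ML-assisted schema unification: map columns from many
# ad hoc experiment schemas onto one unified schema by name similarity,
# and flag weak matches for human review. Tamr's real models also learn
# from column contents and curator feedback; everything below is invented.
from difflib import SequenceMatcher

UNIFIED = ["compound_id", "assay_date", "result_value"]

# Three of 26,000 dispersed schemas, each naming the same things differently.
SOURCES = {
    "lab_a": ["cmpd", "date_of_assay", "value"],
    "lab_b": ["compound", "assaydate", "result"],
    "lab_c": ["molecule_id", "run_date", "reading"],
}

def best_match(column, targets):
    """Return (unified_column, similarity) for the closest match."""
    scored = [(t, SequenceMatcher(None, column, t).ratio()) for t in targets]
    return max(scored, key=lambda pair: pair[1])

for source, columns in SOURCES.items():
    for col in columns:
        target, score = best_match(col, UNIFIED)
        flag = "" if score >= 0.5 else "  <- send to a human curator"
        print(f"{source}.{col:14s} -> {target:13s} ({score:.2f}){flag}")
```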
Lenley Hensarling & Marc Linster, EnterpriseDB - #IBMEdge
>> Announcer: Live from Las Vegas! It's theCUBE. Covering Edge 2016. Brought to you by IBM. Here's your host, Dave Vellante. >> Welcome back to IBM Edge, everybody. This is theCUBE's fifth year covering IBM Edge. We were at the inaugural Edge five years ago in Orlando. Marc Linster is here, and he's joined by Lenley Hensarling. Marc is the Senior Vice President of Product Development, and Lenley is the Senior Vice President of Product Management and Strategy at EDB, EnterpriseDB. Gentlemen, welcome to theCUBE. Thanks for coming on. >> Male Voice: Thank you. >> Okay, who wants to start? EnterpriseDB, tell us about the company and what you guys are all about. >> Well, the company has been around for a little over 10 years now. And our job is really to give companies the ability to use Postgres as the platform for their digital business. So think about this: Postgres is a great open source database, with great capabilities for transactional management of data, but also multi-model data management. So think about standard SQL data, but think also about document oriented, think about key-value pair, think about GIS. So a great capability that is very, very robust, has been around for quite a few years, and is really ready to allow companies to build on it for the new digital business, but also to migrate off their existing commercial databases that are too expensive. >> What's the history of Postgres? Can you sort of educate me on that? >> Sort of the same roots back with System R, where DB2 came from, Oracle came from. So Berkeley, that's where the whole thing started out. Postgres is really the successor to Ingres. >> Dave: Umhmm. >> And then it turned into PostgreSQL. And it has been licensed under an open source license, the Postgres license, since 1996. And it's a very, very vibrant open source community that has been driving it forward for many years now. And our view is, it's the best available relational and multi-model database today. >> It's the mainspring of relational database management systems, essentially >> Marc: Yeah. >> is what you're saying. And Lenley, from a product standpoint, how do you productize that open source? >> Open source, really, companies that have a distribution of open source for a database or an operating system, whatever; the open source company most people are acquainted with is Red Hat, with Linux, right. And so, we do the same thing that they do, but for the Postgres database. We take the distribution, we add testing, we add some other functionality around it, so you can run Postgres responsibly, as Marc likes to say. So high availability capability, fail-over management, replication, a backup solution. And instead of leaving it as an exercise for a customer who wants to use open source, we test all this together. And then we validate it, and we give them a complete package with documentation and services that they can access to help them be successful with it. >> So if Michael Stonebraker were sitting right here, I'd say, Michael, what do you think about Postgres? He'd say, I had to start Vertica because we needed a new way. Yet PostgreSQL sort of remains the killer platform in the industry, doesn't it? >> Male Voice: Umhmm. >> Why is that? It's interesting; when you talk to guys like Stonebraker, it's sort of dogma almost. But yet, customers talk with their wallets. >> And it is, >> He did a very, very nice job of architecting it.
The reason we added the first JSONB, or document oriented, implementation in the relational database space is because it was designed to make it easy to add new capabilities, new datatypes, new indexes, et cetera, into the same transactional model. That's why we have JSONB. That's why we have PostGIS. That's why we have key-value pair. So it was really well architected. And when you think about who else, not just Vertica, has taken this engine >> Dave: Yeah. >> It is in Netezza, it is in a bunch of others. >> Dave: Master Data. >> Lenley: Greenplum. >> Greenplum, yes. So it's a really robust architecture. Very, very nicely designed. It just does the job, and it does it really well. Which is what you want a database to do, right. It's not that exciting, but it's really stable. It really works. The data is still there tomorrow. That's what the requirements really are. >> And to translate a little bit, Marc mentioned PostGIS, which is geospatial capability for the Postgres database. And so we distribute that along with Postgres, and test it, so that you know it works. And he mentioned hstore, so that's how you can actually store internet of things data really well in Postgres. And we talk about SQL and noSQL databases, so, document databases. The ability to have personalization at the same level you can in a document oriented database, but in a structured SQL database, is the kind of thing that has been added to Postgres over the years. Again, it's because of the basic architecture that Stonebraker put in place as an object relational database. >> It's so interesting to look at the history of database. Talk about Stonebraker, he's been on a number of times. It's just fascinating to listen to one of the fathers of this industry. But 10 years ago, database was like such a boring topic. And now it's exploded. Now you've got Amazon going after Oracle, Oracle fighting the good fight, so many noSQL databases coming in, SQL becoming the killer big data app, if you will. >> Male Voice: Umhmm. >> Why all of a sudden did database get so interesting? >> What happened was, application models changed, led by Facebook, led by Amazon and Google. They said, let's refactor the applications and let's refactor the way we handle storage. >> Dave: Umhmm. >> And that led to the rise of the polyglot of databases, as a lot of people are saying. You have fit-for-purpose solutions, and you may have three or four or five of them in your overall architecture. One thing about Postgres is, we're able to fit into that well, because of the datatype support that Marc mentioned. We don't try and do everything, so if somebody says, I'm going to use Mongo for data capture, or I'm going to use Cassandra for capturing my internet of things data, we have what we call foreign data wrappers in the Postgres world. We call them just EnterpriseDB adapters, to Mongo, to Cassandra, to Hadoop, and we can do bidirectional data there and just keep that data at rest over there in the other world, but be able to project a relational schema onto it. We can push our data into those. We've got a great use case we've been talking about, with a customer who had over a petabyte of data. And in the past what you'd do is, you'd go buy an expensive archiving solution and add that to it. Now, you just use the Hadoop distributed file system, push the data off there as it ages, and have a foreign data wrapper that allows you to still query that data when it's out of your basic operational dataset, and move forward. >> Can I call that a connector, or?
>> Lenley: Yeah, a connector, that's not a bad idea. >> And it's interesting, because if you guys remember Hadapt, probably. [Male Voices] Yeah. Yes. >> They came out, they were the connector killer. >> Male Voice: Umhmm. >> And it failed. >> Male Voice: Yeah. >> Seems like connectors are just fine. >> Male Voice: Yeah. >> And one of the really interesting things is, we call it data federation, right. The philosophy here is, leave the data where it is. There is some data that should live in Hadoop or Cassandra. If I'm doing an e-commerce site with transactions and click streams, well, the click streams really should live in Hadoop. That's the natural place for them. The transactions should be in a transactional database. With the foreign data wrapper, I can run queries, without moving the data, that allow me to say, well, before you bought the brown teddy bear, which pages did you look at? >> Dave: Yeah. >> And I can do that integrated system and I can do a fit-for-purpose architecture. And that's what we think is really exciting.
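(A concrete sketch of the federation pattern just described, using postgres_fdw, the foreign data wrapper for other Postgres servers, since its DDL is the textbook example; EDB's wrappers for Hadoop, MongoDB and Cassandra follow the same create-server, create-foreign-table pattern. Hosts, credentials, and table names below are hypothetical.)

```python
# A sketch of Postgres data federation with a foreign data wrapper:
# the click stream stays on its own server, and a schema is projected
# onto it so it can be joined with local transactions. Uses postgres_fdw
# as the textbook wrapper; connection details and tables are invented.
import psycopg2

ddl = """
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

CREATE SERVER clickstream_srv
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'warehouse.example.com', dbname 'clicks');

CREATE USER MAPPING FOR CURRENT_USER SERVER clickstream_srv
    OPTIONS (user 'report', password '...');

-- The click stream is not copied; we only project a schema onto it.
CREATE FOREIGN TABLE page_views (
    session_id text,
    page       text,
    viewed_at  timestamptz
) SERVER clickstream_srv
  OPTIONS (schema_name 'public', table_name 'page_views');
"""

# The "brown teddy bear" question: join local transactions to remote
# clicks without moving either dataset.
query = """
SELECT pv.page, count(*) AS views_before_purchase
FROM orders o
JOIN page_views pv
  ON pv.session_id = o.session_id
 AND pv.viewed_at < o.ordered_at
WHERE o.product = 'brown teddy bear'
GROUP BY pv.page
ORDER BY views_before_purchase DESC;
"""

with psycopg2.connect("dbname=shop") as conn:
    with conn.cursor() as cur:
        cur.execute(ddl)
        cur.execute(query)
        for page, views in cur.fetchall():
            print(page, views)
```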
>> Thank you for that little brief education. Appreciate it. So let's get into your business now, your relationship with IBM, what customers are doing. You mentioned IoT data, so talk more about your business and your relationship with IBM and what you guys are doing for customers. >> There are a couple of things. We mentioned Oracle. And there are all the new databases. And then there's your, dare we say, legacy, proprietary databases as well. And people are looking to become more efficient in how they spend. We've done another thing with Postgres. We've added Oracle compatibility in terms of datatypes. So we support all the datatypes that Oracle does. And we support PL/SQL, their sort of variant of stored procedure language. And implemented a lot of the packages that they have as well. So we can migrate workloads from Oracle over into an open source based solution. And give a lot of cost-effective options to customers. >> Dave: Steal. This is a way that I can sort of have Oracle database license and maintenance avoidance. >> Lenley: Yes. Yeah. >> Where possible, right. >> Where it makes sense. Where it makes sense. >> I keep coming back to it, but let's face it, the number one cost component of a TCO analysis for an Oracle customer is the database license and maintenance cost. >> Male Voice: That's right. >> It's not the people. One of the few examples I can think of where that's the case. There's always the people cost. [Male Voice] That's right, that's right. IT is very labor intensive. But for an Oracle customer, it's the database license. 'Cause they license by core. >> Male Voice: Yup. Cores are going through the roof. >> Male Voice: That's right. >> It's been great for Oracle's business. Although, wouldn't you agree, Oracle sees the writing on the wall, that SaaS is really sort of the new control point for the industry. You see the acquisition of NetSuite and competition with Workday >> Male Voice: Yup. >> and the like. >> But the database remains the heart of the business. >> And really it's movement to the cloud, both private cloud and public cloud. And so we've been doing work there. We've had a public cloud database-as-a-service solution on Amazon for, what, [Marc] Four years. >> Four years, Marc. And have gained a lot of experience with that. And we're running that sort of like retail: you can license the database and we'll provision it there. And so what we've done recently is change our perspective and said, let's put this into the hands of customers. And let them stand up their own database as a service. But also do it in a way that they can choose what workload should go to Amazon and what workload might go to their private cloud, built on OpenStack. And be able to arbitrage that if you will. Because they now have a way to provision the databases and make a choice about where to put them. >> So that's a bring-your-own-license model that you just talked about? >> Bring your own license model or >> Are you in the Marketplace and, >> We're in the Marketplace in Amazon, where we can supply it that way. But customers have shown a preference for bring your own license. They want to make the best enterprise deal they can with a vendor like us or whomever else. And then have control over it. >> Amazon obviously wants you to be in the Marketplace. I won't even mention who, but I've talked to some CEOs of database companies and they say, you know, we're in the Marketplace, but we get in the Marketplace and next thing you know, Amazon is pushing customers towards DynamoDB or you know. >> Male Voice: That's right, that's right. >> Now Amazon's come out with Aurora and Oracle migration, and you know the intent to go after that business. Amazon's moving up the stack and you've got to be careful. >> They are. But the thing about Amazon is that they're a pure-play cloud company. >> Dave: Yup. >> And all of the data shows that it's going to be a mix, it's going to be a hybrid cloud. Half the companies in this world [Dave] Not Andy Jassy's data >> Eighty percent of the people in the cloud are going to be on-prem, still continuing their journey through virtualization. >> Dave: Yeah, that's right. >> Let alone going to the cloud. But we want to be something that lets them put what they want in the public cloud and lets them manage the private cloud in the same manner. So they can provision databases with a few clicks. Just like they do on Amazon. But do it in their data center. >> You doing that with SoftLayer as well or not yet? >> Lenley: Not yet. >> Marc: Not yet. >> We've built this provisioning capability ourselves. And it came out of the work we did putting up databases on Amazon. >> So what are you guys doing here at Edge. Edge is kind of an infrastructure show. Database is infrastructure. >> We're talking about our work with Power. >> Power is a big partner for us. Power is, I think, very, very interesting for our database customers. Because of the much higher clock speeds and the capabilities that the Power processor has. When I'm looking at Power, I get more oomph out of a single core, which for a database customer is very, very interesting. Because all databases are licensed by core. >> Dave: Right. >> So it's a much better deal for the customer. And specifically for Postgres, Postgres scales very well with higher clock speeds. So by growing performance, not by adding more cores but by making the individual cores faster, that plays very, very well to the Postgres capabilities. >> Okay, so you are a Power partner, part of that ecosystem that IBM is appealing to to grow the OpenPOWER base.
And what kind of workloads are you seeing your customers demand and where are you having success? >> Across the board. Database is mostly an infrastructure capability, so there's a lot of interest that we're seeing for all kinds of applications really. >> What's the typical Power customer look like these days? You've got some Oracle, you've got some DB2, you guys are running on there, what's the mix? Paint the picture for us. >> I think the typical Power customer is the typical enterprise company. And, [Dave] Little bit of everything. >> It's a little bit of everything. But one of the key things is that people are also looking at what they've got and the skills they have in place. You were talking about people cost, right. [Dave] Yeah. >> And their understanding of management. Their understanding of how to manage the relationship with the vendor even. And then saying, look, how can I move into the new world of digital transformation and start my own private cloud options and things like that in an efficient way. That makes efficient use of the hardware I have in place and has a growth curve with the new hardware that's coming out that fits my workloads. >> Dave: Umhmm. >> And the profiles that Marc was talking about. >> And also the resources. Which is very interesting when we look at these new digital applications with Postgres. Because you can do so much in Postgres, from geographic information systems to document-oriented to key-value. But you can do that with your existing developers and your existing DBAs. They don't need to go to school to learn a new database. And that's also a very, very interesting capability. So you can use your existing team to do new stuff. [Male Voice] Yup. >> What's happening in IoT, what problems are you solving there and where's the limit? >> Sensor data collection. >> Lenley: Yeah. >> Real interesting, because sensor data tends to come in all different forms. We have a customer who collects temperature sensor data. But the sensors are all sending different data packets. So because we can do document-oriented or key-value, we can easily accommodate that. In the old days with the relational model, I had to do all kinds of tricks to sort of stuff all that into a relational table. My table would be almost empty at the end because I'd have to add columns for every vendor, et cetera. Here, now I can put all that into the same format and provide it for analysis. So that's a real interesting capability.
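A minimal sketch of that sensor pattern in standard Postgres, assuming differently shaped vendor packets land in one JSONB column instead of a sparse table with per-vendor columns. The table, payload shapes, and vendor names are hypothetical.

```sql
-- One table absorbs every vendor's packet shape.
CREATE TABLE sensor_readings (
  id          bigserial PRIMARY KEY,
  received_at timestamptz NOT NULL DEFAULT now(),
  payload     jsonb NOT NULL
);

-- Two vendors, two shapes, no schema change required.
INSERT INTO sensor_readings (payload) VALUES
  ('{"vendor": "acme",   "temp_c": 21.4, "building": "HQ-2"}'),
  ('{"vendor": "globex", "temperature": {"value": 70.5, "scale": "F"}, "floor": 3}');

-- A GIN index accelerates containment queries over arbitrary keys.
CREATE INDEX ON sensor_readings USING gin (payload);

-- Pull one vendor's readings back out relationally.
SELECT received_at, (payload->>'temp_c')::numeric AS temp_c
FROM   sensor_readings
WHERE  payload @> '{"vendor": "acme"}';
```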
>> And it's interesting too, because we've got really strong geospatial data support. And the intersection of that with IoT is a big deal. They track your iPhone, they know where we are. They know what's going on. That's sensor data. They know which lights in which building, which, you know, louvers that are controlling HVAC are malfunctioning or not. They want to know specifically where it is, not just what the sensor is. And some of that stuff moves around. And it gets replaced in a new place in the building and such. So we're well set up to handle those types of workloads. >> What's interesting, when IBM bought the Weather Company, [Lenley] Yeah. >> And they thought okay great, they're getting all these data scientists and weather data, that's cool. They can monetize that, but it's an IoT play, isn't it? [Male Voice] Right. Right. >> Talk about sensors. >> It's reference data. It's reference data for other companies' specific IoT plays. To have a broader set of sensors out there in their region and understand what's happening with weather and things. And then play that against what their experience is, managing new buildings or manufacturing processes, everything. >> So what's the engagement model. I'm a customer, I want to do business with you. How do I do it, how do I engage? >> Well, a lot of our business is direct with us. Others through partners. And then a lot of customers come to us because they want to get off legacy systems. But really, what they do is, once they understand the database and the capabilities, they say, okay yeah, you can do the Oracle stuff. But what I'm really going to do with you is my new things. Because that's really exciting and it helps me kind of put a lid on the commercial license growth. So maybe I'm not going to get off it, but I will stop growing it. So I will start doing my new stuff on Postgres. Whenever I modernize something, Postgres is going to be my database of choice. If I'm already opening up an application and its whole stack, this is one of the changes I'm going to make. And then the database as a service is very, very interesting. So there are these four entry vectors, and what happens is, quite a few customers, after a short time when they've started with a project or application, end up making Postgres one of their database standards. Not the only one. But they make it one of the database standards, so it gets into the catalog and every new project then has to consider Postgres. >> It's interesting, there's a space created as Microsoft sort of put all their wood behind the arrow of becoming a competitor to high-end Oracle. And with this last release, they arguably are there. But they've also raised their prices too. And they've made the solution more complex. So there's this space that was vacated for like a ton of workloads, and Postgres fits in there just about perfectly. We see enterprise after enterprise come to us with a sheet that says, now we're going to get some of this noSQL stuff. We're going to keep Oracle or DB2 over here for these really high-end things. Run my financials, run my sales order processing, my manufacturing. And then we've got this space in here. We've got a slot for a relational database and we want to go open source. Because of the cost savings. Because of other factors. Its ability to grow and not be bound to, hey, what if the vendor decides they're going to go for a new, cooler thing and make me upgrade. >> Dave: Right. >> And I want to stay there and know that there's still an investment being made. And so there's a vibrant community around it. And it just fits that slot perfectly. >> You've got to pay for that digital transformation and all these IoT initiatives [Male Voice] Somehow. >> You can't just keep pouring it into database licenses. [Male Voice] That's right. >> Gentlemen, we have to leave it there. >> Thanks very much >> Male Voice: Alright. >> for coming to theCUBE. >> Thanks so much. >> We appreciate the time. >> You're welcome. [Male Voice] Enjoy it. >> Keep it right there, buddy. We'll be right back with our next guest. This is theCUBE. We're live from IBM Edge 2016, be right back. (upbeat music)