Ben White, Domo | Virtual Vertica BDC 2020
>> Announcer: It's theCUBE, covering the Virtual Vertica Big Data Conference 2020, brought to you by Vertica.
>> Hi, everybody. Welcome to this digital coverage of the Vertica Big Data Conference. You're watching theCUBE, and my name is Dave Vellante. It's my pleasure to invite in Ben White, who's the Senior Database Engineer at Domo. Ben, great to see you, man. Thanks for coming on.
>> Great to be here.
>> You know, as I said earlier when we were off-camera, I was really hoping I could meet you face-to-face in Boston this year, but hey, I'll take it, and our community really wants to hear from experts like yourself. But let's start with Domo as a company. Share with us what Domo does and what your role is there.
>> Well, if I can go straight to the official line: what Domo does is provide BI leverage at cloud scale, in record time. And what that means is that we are a business operating system, where we provide a number of analytical abilities to companies of all sizes. But we do that at cloud scale, and I think that differentiates us quite a bit.
>> So a lot of your work, if I understand it, and just in terms of understanding what Domo does: there's a lot of pressure in terms of being real-time, and you sometimes don't know what's coming at you, so it's ad hoc. I wonder if you could talk about that, confirm it, and maybe add a little color to it.
>> Yeah, absolutely. That's probably the biggest challenge there is to operating Domo: it is an ad hoc environment. And what that means is that you've got analysts and executives who are able to submit their own queries with very few limitations. So from an engineering standpoint, the challenge in that, of course, is that you don't have a predictable dashboard to plan for when it comes to performance planning. So it definitely presents some challenges, and we've done some pretty unique things, I think, to address those.
>> So it sounds like your background fits well with that. I understand your people have called you a database whisperer and an envelope pusher. What does that mean to a DBA in this day and age?
>> The whisperer part is probably a lost art, in the sense that it's not really sustainable. Whatever it is I'm able to do with the database, it has to be repeatable. And so that's really where analytics comes in; that's where pushing the envelope comes in; and in a lot of ways, that's where Vertica comes in, with this open architecture. As a person who has a reputation for saying, "I understand what our limitations should be, but I think we can do more," having a platform like Vertica, with such an open architecture, kind of lets you push those limits quite a bit.
>> I've always felt like Vertica, when I first saw the Stonebraker architecture and talked to some of the early founders, was the Ferrari of databases, certainly at the time. And it sounds like you guys use it in that regard. But talk a little bit more about how you use Vertica, and why. Why MPP? Why Vertica? Why can't you do this with a traditional RDBMS? Educate us a little bit on the basics.
>> For us, it was part of what I mentioned when we started, when we talked about the very nature of the Domo platform, where there's an incredible amount of resiliency required.
And so Vertica, the MPP platform, allows us to build individual database clusters that can perform best for the workloads that might be assigned to them. The openness, the expandability, the ability to grow Vertica as your base grows: those are all important factors when you're choosing early on, without a real idea of what growth will look like. If you're kind of throwing something up into the dark, you can look at the Vertica platform and see that, as I grow, I can build with this. I can do some unique things with the platform, in terms of this open architecture, that allow me to not have to make all my decisions today.
>> So you're using Vertica, and I know, at least in part, you're working with AWS as well. Can you describe your environment? Do you have anything on-prem, or is everything in the cloud? What does your setup look like?
>> Sure. We have a hybrid cloud environment, with a significant presence in the public cloud and in our own private cloud. Having said that, we certainly have an extensive presence, I would say, in AWS. They're definitely the partner of ours when it comes to providing the databases and the server power that we need to operate.
>> From the standpoint of engineering and architecting a database, what were some of the challenges you faced when you had to create that hybrid architecture? What did you face, and how did you overcome it?
>> Well, one thing that made it easy is that Vertica and AWS play well together, we'll say that. Vertica was designed to work on AWS, so that part took care of itself. Now, connecting our own private cloud to our public cloud has been a part of our own engineering abilities, and I don't want to make light of it, but it's certainly not impossible. The challenges that pertain to the database really were in the early days, before Vertica's most recent Eon mode, and I'm sure we'll get to that. When I think of early challenges, some of them were the architecture of Enterprise mode. This idea that we can have unique databases, or database clusters of different sizes, or this elasticity: that's not really what the Enterprise architecture gives you. So we had to do some unique things, I think, to get around the rigidness of Enterprise mode early on.
>> Yeah, I hear you. Enterprise is complex, and you like when things are hardened and fossilized, but in your ad hoc environment, that's not what you needed. So talk more about Eon mode. What is Eon mode for you, and how do you apply it? What are some of the challenges and opportunities there that you've found?
>> The opportunities were certainly in the elastic architecture and the ability to separate compute and storage. It immediately meant that, for some of the unique data paths we wanted to take, we could move fairly quickly. Certainly we could expand databases quickly, and more importantly, now you can reduce. Because previously, in the Enterprise architecture, growing a database has its own pain, as far as the time it takes to (mumbles) the data.
Then think about taking that database back down. (telephone interference) All of a sudden, with Eon, we had this elasticity, where you could start to think about auto-scaling: you can go up and down, and maybe you could save some money, or improve performance, or meet demand at the time customers need it most, in a real way. So it's definitely a game changer in that regard.
>> I always love to talk to the customers, because I hear from the vendor what they say, and then I like to validate it. So, Vertica talks a lot about separating compute and storage, and they're not the only one, from an architectural standpoint, who does that. But Vertica stresses that they're the only one that does it with a hybrid architecture: they can do it on-prem and they can do it in the cloud. From your experience, first of all, is that true? You may or may not know, but is that advantageous to you, and if so, why?
>> Well, first of all, it's certainly true. I was able to participate in some of the original beta testing for the on-prem Eon mode, so it's certainly a reality. It's actually supported on Pure Storage with FlashBlade, and it's quite impressive. Who will that be for? Tough one; that's probably a question Vertica is still answering. But obviously, I think, some enterprise users that have a hybrid cloud: they have some architecture, they have some hardware that they themselves want to make use of. We would certainly fit into one of the market segments where they'd say we might be the ones to look at on-prem Eon mode. Again, the beauty of it is the elasticity, the idea that you could have this... Actually, I want to go back real quick to separating compute.
>> Sure. Great.
>> We start by separating it, but I like to think of it maybe more as unlinking, because in a true sense it's not necessarily separated; ultimately, you're bringing the compute and the storage back together. But to be able to decouple it quickly, replace nodes, bring in nodes: that certainly fits what we were trying to do in building this kind of ecosystem, one that could respond to the unknown of a customer query or a customer demand.
>> I see. Thank you for that clarification, because you're right: it's really not separating, it's decoupling. And that's important, because you can scale them independently, but you still need compute and you still need storage to run your workload. From a cost standpoint, though, you don't have to buy it in chunks; you can buy it in granular segments for whatever your workload requires. Is that the correct understanding?
>> Yeah, and add to that the ability to reuse compute. So in the scenario of AWS, or even in the scenario of your on-prem solution, you've got this data that's safe and secure in (mumbles) storage, but the compute that you have, you can reuse. You could have a scenario where some query needs more analytical firepower: more memory, more what have you. And so you can move between, and that's important, right? That's maybe more important than "can I grow them separately." Can I borrow that compute you're using for my (cuts out) and give it back? And you can do that when you're so easily able to decouple the compute and put it where you want. And likewise, if you have a down period where customers aren't using it, you'd like to be able to release that compute when you no longer require it. It opened the door to a lot of those things that allowed performance and cost to meet up.
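To make that borrow-and-return pattern concrete, here is a minimal sketch of what such elasticity automation could look like against an Eon-mode cluster. Everything specific in it (the hostnames, the backlog threshold, the subcluster name, and the admintools flags) is an assumption for illustration, not Domo's actual tooling, and system-table names vary by Vertica version.

```python
# Minimal sketch of Eon-mode elasticity: watch query backlog and grow or
# shrink a secondary subcluster. Hostnames, thresholds, the subcluster
# name, and the admintools invocations are illustrative assumptions.
import subprocess
import vertica_python

conn_info = {"host": "vertica.example.com", "port": 5433,
             "user": "dbadmin", "password": "...", "database": "analytics"}

def queued_requests(cur):
    # Count requests waiting in resource pools; system-table and column
    # names differ across Vertica versions, so treat this as a placeholder.
    cur.execute("SELECT COUNT(*) FROM v_monitor.resource_queues;")
    return cur.fetchone()[0]

def scale_subcluster(action, hosts):
    # In Eon mode, compute is added or removed without touching communal
    # storage; admintools is one way to do it, management APIs another.
    tool = "db_add_subcluster" if action == "grow" else "db_remove_subcluster"
    subprocess.run(["admintools", "-t", tool, "-d", "analytics",
                    "-c", "adhoc_secondary", "-s", hosts], check=True)

with vertica_python.connect(**conn_info) as conn:
    cur = conn.cursor()
    backlog = queued_requests(cur)
    if backlog > 50:            # demand spike: borrow compute
        scale_subcluster("grow", "10.0.0.21,10.0.0.22")
    elif backlog == 0:          # quiet period: give the compute back
        scale_subcluster("shrink", "10.0.0.21,10.0.0.22")
```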
>> I wonder if I can ask you a question. You mentioned Pure a couple of times: are you using Pure FlashBlade on-prem, is that correct?
>> That is the solution that is supported by Vertica for on-prem Eon mode. (cuts out) At this point, we have been discussing some of our own POCs for that. Again, we're back to the idea of how we see ourselves using it. So we've certainly discussed the feasibility of bringing it in, but that's not something we're focused on heavily right now.
>> And what is Domo for Domo? Tell us about that.
>> Well, it really started as this idea, even in the company, where we said we should be using Domo in our everyday business, from the sales folks to the marketing folks. Everybody is going to use Domo; it's a business platform. For us on the engineering team, it was kind of like, well, if we use Domo, say, for instance, to be better database engineers, now we've pointed Domo at itself, right? Vertica is running Domo in the background to some degree, and then we turn around and say, "Hey, Domo, how can we be better at running you?" So it became this kind of cool thing we'd play with. We're now able to put some methods together where we can actually do that: we can monitor using our own platform, which is really good at processing large amounts of data and spitting out useful analytics, and we take those analytics and make recommended changes at the database level. So now you've got Domo for Domo happening, and it allows us to sit at home and work; now, when we have to, and even before we had to.
>> Well, you know, look at us here. We couldn't meet in Boston physically, so we're meeting remotely. You're on a hotspot because you've got some weather affecting your satellite internet in Atlanta, and we're having a great conversation. So we're here with Ben White, who's a Senior Database Engineer at Domo. I want to ask you about some of the envelope pushing that you've done around autonomous. You hear that word thrown around a lot, and it means a lot of things to a lot of different people. How do you look at autonomous? And how does it fit with Eon and some of the other things you're doing?
>> You know, autonomous, and the idea of autonomy, is something I don't know that I'm ready to define, and so even in my discussions I often mention it as a road to it. Exactly where it is is hard to pin down, because there's always this question of how much trust you give to the system. How much is truly autonomous, and how much is really being intervened in by us, the engineers? So I do hedge on using that word. But on this road toward autonomy, look at how we're using Domo, and even what that really means for Vertica. In a lot of my examples, a lot of the things we've engineered at Domo were designed to overcome something that I thought was a limitation at the time, and so many times, as we've done that, Vertica has kind of met us: right after we've engineered something on our side that we thought could help, Vertica has a release that kind of addresses it. So the autonomy idea, the idea that we could analyze metadata, make recommendations, and then execute those recommendations without intervention, is that road to autonomy. Once the database is properly able to do that, you can see, in our ad hoc environment, how that would be pretty useful. With literally millions of queries every hour, trying to figure out the best profile, it could probably do a better job of that than we could.
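A toy version of that metadata-to-recommendation loop might look like the sketch below: query Vertica's own monitoring tables, apply hand-written rules, and surface advice for an engineer to approve. The rules, thresholds, and advice strings are invented for the example; they are not Domo's production system, and the system-table columns should be checked against your Vertica version.

```python
# A toy version of the rules-based loop Ben describes: read query
# metadata from Vertica's system tables, test it against simple rules,
# and surface recommendations for a human to approve. The rules and
# thresholds here are assumptions for illustration only.
import vertica_python

RULES = [
    ("long_running",
     "SELECT COUNT(*) FROM v_monitor.query_requests "
     "WHERE request_duration_ms > 60000 "
     "AND start_timestamp > NOW() - INTERVAL '1 hour';",
     "Review the query plan; consider a new projection or pool limits."),
    ("memory_heavy",
     "SELECT COUNT(*) FROM v_monitor.query_requests "
     "WHERE memory_acquired_mb > 10000 "
     "AND start_timestamp > NOW() - INTERVAL '1 hour';",
     "Consider raising the resource pool's memory budget."),
]

def recommend(conn_info):
    recs = []
    with vertica_python.connect(**conn_info) as conn:
        cur = conn.cursor()
        for name, sql, advice in RULES:
            cur.execute(sql)
            if cur.fetchone()[0] > 0:   # rule fired
                recs.append("[%s] %s" % (name, advice))
    return recs

# Executing the recommendations without intervention, the last step on
# the road to autonomy, is deliberately left out: that's the trust part.
```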
>> For years, I felt like IT folks really did not want that automation; they wanted the knobs to turn. But I wonder if you can comment. I feel as though the level of complexity now, with cloud, with on-prem, with hybrid, multicloud, the scale, the speed, the real time, is just too much; the pace is just too much for humans. And so it's almost like the industry is going to have to capitulate to the machine, and then really trust the machine. But I'm still sensing, from you, a little bit of hesitation there, but light at the end of the tunnel. I wonder if you can comment?
>> Sure. I think the light at the end of the tunnel is that in recent months we've really begun to incorporate more machine learning and artificial intelligence into the model. And back to what we were saying: I do feel that we're getting closer to finding conditions that we don't know about. Because right now our system is a rules-based system, where we've said, "These are the things we should be looking for; these are the things that we think are a problem." To mature to the point where the database is recognizing anomalies and picking up on patterns: those are problems you didn't know were happening. And that's the next step, right? Identifying the things you didn't know. That's the path we're on now, and it's probably even more exciting than nailing down all the things you think you know. We figure out what we don't know yet.
>> So I want to close with this. I know you're a prominent, respected member of the Vertica Customer Advisory Board, and, without divulging anything confidential, what are the kinds of things that you want Vertica to do going forward?
>> Oh, I think, some of the in-database autonomy: the ability to take some of the recommendations that we know can be derived from the metadata that already exists in the platform, and start to execute them. And another thing we've talked about, and I've been pretty open about talking about it, is a new version of the Database Designer, which I'm sure they're working on: something lightweight that can give us that database design without the overhead. Those are two things, and I think as they nail those, the in-database execution and, basically, the Database Designer, they'll really have all the components in play to do in-database autonomy. And I think that's, to some degree, where they're heading.
>> Nice. Well, Ben, listen, I really appreciate you coming on. You're a thought leader, you're very open, open-minded, and Vertica is, you know, a really open community. I mean, they've always been quite transparent in terms of where they're going. It's just awesome to have guys like you on theCUBE to share with our community.
>> So thank you so much, and hopefully we can meet face-to-face shortly.
>> Absolutely. Well, you stay safe in Boston, one of my favorite towns, and so, no doubt, when the doors get back open, I'll be coming down. Or coming up, as it were.
>> Take care. All right, and thank you for watching, everybody. Dave Vellante with theCUBE; we're here covering the Virtual Vertica Big Data Conference. (electronic music)
Joy King, Vertica | Virtual Vertica BDC 2020
>> Announcer: It's theCUBE, covering the Virtual Vertica Big Data Conference 2020, brought to you by Vertica.
>> Welcome back, everybody. My name is Dave Vellante, and you're watching theCUBE's coverage of the Virtual Vertica Big Data Conference. theCUBE has been at every BDC, and it's our pleasure in these difficult times to be covering the BDC as a virtual event. For this digital program, I'm really excited to have Joy King joining us. Joy is the Vice President of Product and Go-to-Market Strategy at Vertica, and if that weren't enough, she also runs marketing and education for Vertica. So, Joy, you're a multi-tool player. You've got the technical side and the marketing gene. So welcome to theCUBE. You're always a great guest. Love to have you on.
>> Thank you so much, Dave. The pleasure, it really is.
>> So I want to get in; you know, we'll have some time, and we've been talking about the conference and the virtual event, but I really want to dig in to the product stuff. It's a big day for you guys: you announced 10.0. But before we get into the announcements, step back a little bit. You guys are riding the waves. I've said to a number of our guests that Vertica has always been good at riding the waves: not only the initial MPP wave, but you embraced HDFS, you embraced data science and analytics, and the cloud. So what are the trends you see, the big waves that you're riding?
>> Well, you're absolutely right, Dave. What I think is most interesting and important is that Vertica is, at its core, a true engineering culture, founded by, well, a pretty famous guy, Dr. Stonebraker, who embedded that very technical Vertica engineering culture. It means that we don't pretend to know everything that's coming, but we are committed to embracing the technology trends, the innovations, things like that. We don't pretend to know it all. We just do it all. So right now, I think I see three big imminent trends that we are addressing, and as a matter of fact have been for a while, but that are particularly relevant right now. The first is a combination of, I guess, a disappointment in what Hadoop was able to deliver. I always feel a little guilty, because she's a very reasonably capable elephant. She was designed to be HDFS, a highly distributed file store, but she can't be an entire zoo. So there's a lot of disappointment in the market, but a lot of data, in HDFS. You combine that with the explosion of cloud object storage, and you're talking about even more data, but even more data silos. So data growth and data silos is trend one. Then what I would say trend two is the cloud reality. The cloud brings so many benefits; there are so many opportunities that public cloud computing delivers. But I think we've learned enough now to know that there's also some reality. The cloud providers themselves, Dave, don't talk about it well. Is it more agile? Can you do things without having to manage your own data center? Of course you can. But the reality is, it's a little more pricey than we expected, there are some security and privacy concerns, and there are some workloads that can't go to the cloud. So hybrid, and also multi-cloud, deployments are the next trend, and they are mandatory. And then maybe the one that is the most exciting, in terms of changing the world (and we could use a little change right now), is operationalizing machine learning.
There's so much potential in the technology, but somehow it has been stuck, for the most part, in science projects and data science labs, and the time is now to operationalize it. Those are the three big trends that Vertica is focusing on right now.
>> That's great. I wonder if I could ask you a couple of questions about that. I mean, I, like you, have a soft spot in my heart for the elephant, and the thing about Hadoop that was, I think, profound was that it got people thinking about bringing compute to the data and leaving data in place, and it really got people thinking about data-driven cultures. It didn't solve all the problems, but it collected a lot of data that we can now take your third trend and apply machine intelligence on top of. And then the cloud is really the ability to scale, and it gives you that agility; and it's not just the cloud itself, it's bringing the cloud experience to wherever the data lives. And I think that's what I'm hearing from you. Those are the three big superpowers of innovation today.
>> That's exactly right. So, you know, I have to say, I think we all know that data analytics and machine learning deliver no real value unless the volume of data is there to be able to truly predict and influence the future. So the last seven to ten years have correctly been about collecting the data and getting the data into a common location, and HDFS was well designed for that. But we live in a capitalist world, and some companies stepped in and tried to make HDFS and the broader Hadoop ecosystem be the single solution to big data. It's not true. So now the key is, how do we take advantage of all of that data? And that's exactly what Vertica is focusing on. As you know, we began our journey with Vertica back in the day, in 2007, with our first release, and we saw the growth of Hadoop. So we announced, many years ago, Vertica SQL on Hadoop: the idea of being able to deploy Vertica on Hadoop nodes and query the data in Hadoop. We wanted to help. Now, with Vertica 10, we are also introducing Vertica in Eon mode, and we can talk more about that, but Vertica in Eon Mode for HDFS is a way to apply an ANSI SQL database management platform to HDFS infrastructure and data in HDFS file storage. And that is a great way to leverage the investment that so many companies have made in HDFS. And I think it's fair to the elephant to treat her well.
>> Okay, let's get into the hard news on 10.0. You've got a mature stack, but what are some of the highlights of 10.0? And then we can drill into some of the technologies.
>> Absolutely. So, well, in 2018, Vertica announced Vertica in Eon mode, which is the separation of compute from storage. Now, this is a great example of Vertica embracing innovation. Vertica was designed for on-premises data centers and bare-metal servers with tightly coupled storage: DL380s from Hewlett Packard Enterprise, Dell, etcetera. But we saw that cloud computing was fundamentally changing data center architectures, and it made sense to separate compute from storage: you add compute when you need compute, you add storage when you need storage. That's exactly what the cloud introduced, but it was only available in the cloud. So the first thing we did was architect Vertica in Eon mode, which is not a new product, and this is really important: it's a deployment option.
And in 2018, our customers had the opportunity to deploy their Vertica licenses in Eon mode on AWS. In September of 2019, we then broke an important record: we brought cloud architecture down to earth, and we announced Vertica in Eon mode with communal, or shared, storage on premises, leveraging Pure Storage FlashBlade. That gave us all the advantages of separating compute from storage, the workload isolation, the scale up and scale down, the ability to manage clusters, and we did that with the on-premise data center. And now, with Vertica 10, we are announcing Vertica in Eon mode on HDFS and Vertica in Eon mode on Google Cloud. So what we've got here, in summary, is Vertica in Eon mode across multiple clouds and multiple on-premise data center storage options, and that gives us the opportunity to help our customers both with the hybrid and multi-cloud strategies they have and with unifying their data silos. But Vertica 10 goes farther.
>> Well, let me stop you there, because I just want to mention: we talked to Joe Gonzalez at MassMutual, who essentially was brought in, and one of his tasks was to lead the move into Eon mode. Why? Because they had three separate data silos and they wanted to bring those together. They're investing heavily in technology, Joe is an expert, they really put data at their core, and Eon mode was a key part of that, because they're using S3. So that was a very important step for those guys. Carry on; what else do we need to know?
>> So one of the reasons, for example, that MassMutual is so excited about Eon mode is because of the operational advantages. Think about exactly what Joe told you: multiple clusters serving multiple use cases, and maybe multiple divisions. And look, let's be clear: marketing doesn't always get along with finance, finance doesn't necessarily get along with ops, and IT is often caught in the middle. Vertica in Eon mode allows workload isolation, meaning allocating the compute resources that different use cases need without allowing them to interfere with other use cases, while allowing everybody to access the data. So it's a great way to bring the corporate world together but still protect them from each other. And that's one of the things that MassMutual is going to benefit from, as well as so many of our other customers.
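As a rough illustration of that isolation from an application's point of view, the sketch below routes two departments at two different Eon-mode subclusters that share the same communal storage, so neither can starve the other's compute. The hostnames, credentials, and subcluster layout are hypothetical; in practice, routing can also be handled by Vertica's connection load-balancing policies.

```python
# Sketch: marketing and finance hit separate Eon-mode subclusters that
# read the same communal storage, so neither steals the other's compute.
# Hostnames, credentials, and table names are invented for the example.
import vertica_python

BASE = {"port": 5433, "user": "dbadmin", "password": "...",
        "database": "analytics"}

SUBCLUSTER_HOSTS = {
    # one entry point per subcluster; each subcluster has its own nodes
    "marketing": "mktg-sc-node01.example.com",
    "finance":   "fin-sc-node01.example.com",
}

def run(workload, sql):
    # In Eon mode, a query executes on the subcluster of the node you
    # connect to, which is what confines each team to its own compute.
    conn_info = dict(BASE, host=SUBCLUSTER_HOSTS[workload])
    with vertica_python.connect(**conn_info) as conn:
        cur = conn.cursor()
        cur.execute(sql)
        return cur.fetchall()

# Both subclusters see the same tables, because the data itself lives
# in communal storage shared by the whole database.
campaigns = run("marketing",
                "SELECT campaign_id, COUNT(*) FROM clicks GROUP BY 1;")
closings = run("finance", "SELECT SUM(amount) FROM bookings;")
```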
>> I also want to mention: when I saw you last year at the Pure Storage Accelerate conference, you said that today you are the only company that separates compute from storage and runs both on-prem and in the cloud. And I had to think about it; I've researched, and I still can't find anybody else who does. I want to mention that you actually beat a number of the cloud players to that capability. So good job, and I think it's a differentiator, assuming that you're giving me that cloud experience and the licensing and pricing capability. So I want to talk about that a little bit.
>> Well, you're absolutely right. So let's be clear: there is no question that the public clouds introduced the separation of compute and storage, and these are advantages that they do not have the ability, or the interest, to replicate on premises. For Vertica, we were born to be software only. We make no money on underlying infrastructure; we don't charge as a package for the hardware underneath. So we are totally motivated to be independent of that, and also to continuously optimize the software to be as efficient as possible. And we do the exact same thing with your question about licensing. Cloud providers charge for node instances; that's how they charge for their underlying infrastructure. Well, in some cases, if you're talking about a use case where you have a whole lot of data but not a lot of compute for that workload, it may make sense to pay per node and have unlimited data. But what if you have a huge compute need on a relatively small data set? That's not so good. Vertica offers both per-node and per-terabyte licenses for our customers, depending on their use case. We also offer perpetual licenses for customers who want CapEx, but we also offer subscription for companies that say, "Nope, I have to have OpEx." And while this can certainly cause some complexity for our field organization, we know that it's all about choice; everybody in today's world wants it personalized, just for me, and that's exactly what we're doing with our pricing and licensing.
>> So just to clarify, you're saying I can pay by the drink if I want to; you're not going to force me into a term, or I can choose to have more predictable pricing. Is that correct?
>> Well, it's partially correct. First, Vertica subscription licensing is a fixed amount for the period of the subscription. We do that because so many of our customers, and I'm one of them, by the way, cannot tell finance after the fact what the quarter's spend was; you have to say what it's going to be before. So our subscription pricing is a fixed amount for a period of time. However, we do respect the fact that some companies want usage-based pricing. So on AWS, you can use Vertica by the hour, and you pay by the hour, and we are about to launch the very same thing on Google Cloud. So for us, it's about what you need, and we make it happen natively, directly with us or through AWS and Google Cloud.
>> So the fixed is sort of a floor, and then if you want to surge above that, you can allow usage pricing if you're on the cloud, correct?
>> Well, you actually can license your Vertica cluster by the hour on AWS and run your cluster there. Or you can buy a license from Vertica, for a fixed capacity or a fixed number of nodes, and deploy it on the cloud. And then, if you want to add more nodes or more capacity, you can. It's not usage-based for the license that you bring to the cloud, but if you purchase through the cloud provider, it is usage-based.
>> Yeah, okay. And you guys are in the marketplace, is that right? So, again, if I want OpEx, I can do that; I can choose to do that.
>> That's exactly right. It's usage-based through the AWS Marketplace, or directly from Vertica.
>> Because every small business that's gone to a salesforce management system knows this: "Okay, great, I can pay by the month." Well, yeah, well, not really; here's our three-year term on it, right? And it's very frustrating.
>> Well, and even in the public cloud you can pay by the hour, by the minute, or whatever, but it becomes pretty obvious that you're better off with reserved instance types or committed amounts, and for that, Vertica offers a subscription that says, "Hey, you want 100 terabytes for the next year? Here's what it will cost you." We do interval billing: if you want to do monthly, quarterly, or bi-annual, we'll do that. But we won't charge you for usage that you didn't even know you were using until after you get the bill. And frankly, that's something my finance team does not like.
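A trivial back-of-the-envelope comparison shows why the choice between the two models matters. Every number below is made up for illustration; actual Vertica and cloud-marketplace pricing depends on the deal.

```python
# Hypothetical numbers only: illustrating the hourly-vs-subscription
# trade-off Joy describes, not actual Vertica pricing.
HOURLY_RATE = 4.00          # $/node-hour, made-up marketplace rate
SUBSCRIPTION = 100_000.00   # $/year for the same cluster, made-up

def yearly_hourly_cost(nodes, hours_per_day):
    return HOURLY_RATE * nodes * hours_per_day * 365

# A bursty cluster that runs 6 hours a day favors usage pricing...
print(yearly_hourly_cost(nodes=8, hours_per_day=6))    # 70,080 < 100,000

# ...while an always-on cluster favors the fixed subscription.
print(yearly_hourly_cost(nodes=8, hours_per_day=24))   # 280,320 > 100,000
```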
>> Yeah, I know this is kind of a wonky discussion, but so many people gloss over the licensing and the pricing, and I think my takeaway here is optionality: pricing your way. That's great; thank you for that clarification. Okay, so you've got Google Cloud, and I want to talk about storage optionality. If I add them up, I've got HDFS; I've got, I'm presuming, Google now; AWS; and you have Pure, which is an S3-compatible object store, yes? So what's the story?
>> Right: the Google object store, the Amazon S3 object store, HDFS, and Pure Storage FlashBlade, which is an object store on-prem. And we are continuing on this path, because ultimately we know that our customers need the option of a next-generation data center architecture, which is this sort of shared, or communal, storage, so all the data is in one place and workloads can be managed independently on that data. That's exactly what we're doing, and we already have two public clouds and two on-premise deployment options today. And as you said, and I did challenge you back when we saw each other at the conference: today, Vertica is the only analytic data warehouse platform that offers that option on premises and in multiple public clouds.
>> Okay, let's talk about, I'll go back through the innovation cocktail, I'll call it: it's the data, applying machine intelligence to that data, and we've talked about scaling in the cloud and some of the other advantages. Let's talk about the machine intelligence, the machine learning piece of it. What's your story there? Give us any updates on your embracing of tooling and the like.
>> Well, quite a few years ago, we began building native in-database machine learning algorithms into Vertica, and the reason we did that was that we knew the architecture of MPP columnar execution would dramatically improve performance. We also knew that a lot of people speak SQL, but, at the time, not so many people spoke R or even Python. So what if we could give access to machine learning in the database via SQL and deliver that kind of performance? That's the journey we started out on. And then we realized that machine learning is actually a lot more, as everybody knows, than just algorithms. So we then built in the full end-to-end machine learning functions, from data preparation to model training, model scoring, and evaluation, all the way through to deployment, and all of this, again, SQL accessible. You speak SQL, you speak to the data. The other advantage of this approach was that we realized accuracy was compromised if you down-sample. If you moved a portion of the data from the database to a specialty machine learning platform, you were challenged by accuracy, and also by what the industry is calling replicability. That means, if a model makes a decision, like, let's say, credit scoring, and that decision is in any way challenged, well, you have to be able to replicate it, to prove that you made the decision correctly. There was a bit of, ah, you know, a blow-up in the media not too long ago about a credit scoring decision that appeared to be gender biased, but unfortunately, because the model could not be replicated, there was no way to disprove that, and that was not a good thing. So all of this is built into Vertica, and with Vertica 10 we've taken the next step. Just like with Hadoop, we know that innovation happens within Vertica, but also outside of Vertica. We saw that data scientists really love their preferred languages.
Like Python: they love their tools and platforms, like TensorFlow. So with Vertica 10, we now integrate even more with Python, which we have for a while, but we also add TensorFlow integration and PMML. What does that mean? It means that if you build and train a model external to Vertica, using the machine learning platform that you like, you can import that model into Vertica and run it through the full end-to-end process, but run it on all the data. No more accuracy challenges, and MPP columnar execution, so it's blazing fast. And if somebody wants to know why a model made a decision, you can replicate that model and you can explain why. Those are very powerful. And it's also another cultural unification, Dave: it unifies the business analyst community, who speak SQL, with the data scientist community, who love their tools like TensorFlow and Python.
>> Well, I think, Joy, that's important, because so much of machine intelligence and AI has a black box problem. If you can't replicate the model, then you run into potential gender bias, as in the example you're talking about, and there are many of those. You know, let's say an individual is very wealthy: he goes for a mortgage, and his wife goes for some credit; she gets rejected, he gets accepted. It's the same household, but the bias in the model may be gender bias; it could be race bias. And so being able to replicate that, and open it up, and make the machine intelligence transparent, is very, very important.
>> It really is, and that replicability, as well as the accuracy. It's critical, because if you're down-sampling and you're running models on different sets of data, things can get confusing. And yet you don't really have a choice, because if you're talking about petabytes of data, and you need to export that data to a machine learning platform and then try to put it back and get the next result the next day, you're looking at way too much time, compared with doing it in the database, or training the model and then importing it into the database for production. That's what Vertica allows, and our customers are all over it. Of course, you know, they are the ones that are sort of the trailblazers; they've always been, and this is the next step in blazing the ML trail.
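For a flavor of what speaking SQL to the machine learning pipeline looks like, here is a hedged sketch: train and score a logistic regression entirely in-database, then import an externally trained PMML model. The function names follow Vertica's documented in-database ML API, but the exact signatures, table names, and file paths below are assumptions to verify against your version's documentation.

```python
# Sketch of Vertica's in-database ML flow, driven from Python.
# Table names and file paths are hypothetical; verify the function
# signatures against your Vertica version's documentation.
import vertica_python

conn_info = {"host": "vertica.example.com", "port": 5433,
             "user": "dbadmin", "password": "...", "database": "analytics"}

with vertica_python.connect(**conn_info) as conn:
    cur = conn.cursor()

    # 1. Train in-database: no data leaves Vertica, so no down-sampling.
    cur.execute("""
        SELECT LOGISTIC_REG('churn_model', 'customer_features',
                            'churned', 'tenure, monthly_spend, tickets');
    """)

    # 2. Score with plain SQL; every analyst queries the same model,
    #    which is what makes its decisions replicable.
    cur.execute("""
        SELECT customer_id,
               PREDICT_LOGISTIC_REG(tenure, monthly_spend, tickets
                                    USING PARAMETERS model_name='churn_model')
          FROM customer_features;
    """)
    scores = cur.fetchall()

    # 3. Or import a model trained outside and exported as PMML
    #    (Vertica 10 added PMML and TensorFlow import).
    cur.execute("""
        SELECT IMPORT_MODELS('/models/external_churn.pmml'
                             USING PARAMETERS category='PMML');
    """)
```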
>> Joy, customers want analytics. They want full-function analytics. What are they pushing you for now? What are you delivering? What's your thought on that?
>> Well, I would say the number one thing that our customers are demanding right now is deployment flexibility. Whatever the CEO or the CFO mandated six months ago, whatever that "thou shalt" was, is now different, and what I tell them is that it is impossible to know what you're going to be commanded to do, or what options you might have, in the future. The key is not having to choose, and they are very, very committed to that. We have a large telco customer whose commitment is multi-cloud. Why multi-cloud? Well, because they see innovation available in different public clouds and they want to take advantage of all of them. They also, admittedly, see the risk of lock-in; like with any vendor, they don't want that either, so they want multi-cloud. We have other customers who say, "We have some workloads that make sense for the cloud and some that we absolutely cannot put in the cloud, but we want a unified analytics strategy," so they are adamant in focusing on deployment flexibility. That's what I'd say is first. Second, I would say the interest in operationalizing machine learning, but not necessarily forcing the analytics team to hammer the data science team about which tools are the best tools. That's probably number two. And then I'd say number three: when you look at companies like Uber, or The Trade Desk, or AT&T, or Cerner, it's performance at scale. When they say milliseconds, they think that's slow. When they say petabytes, they're like, "Yeah, that was yesterday." So, performance at scale: good enough, for Vertica, is never good enough, and it's why we're constantly building at the core, the next-generation execution engine, database designer, optimization engine, all that stuff.
>> I want to also ask you: when I first started following Vertica, and theCUBE was covering the BDC, one of the things I noticed in talking to customers and people in the community is that you have a Community Edition, a free edition, and it's not neutered. Have you maintained that ethos through the transition into Micro Focus? Can you talk about that a little bit?
>> Absolutely. Vertica Community Edition is Vertica. It's all of the Vertica functionality: geospatial, time series, pattern matching, machine learning, all of it, Vertica in Eon mode, Vertica in Enterprise mode. All of Vertica is the Community Edition. The only limitation is one terabyte of data and three nodes, and it's free. Now, if you want commercial support, where you can file a support ticket and things like that, you do have to buy the license. But it's free, and people say, "Well, free for how long?" Our field has asked that, and I say, "Forever," and they say, "What do you mean, forever?" Because we want people to use Vertica for use cases that are small, where they want to learn, where they want to try, and we see no reason to limit that. What we look for is when they're ready to grow: when they need the next set of data that goes beyond a terabyte, or they need more compute than three nodes, then we're here for them. And it also brings up an important thing that I should remind you of, or tell you about if you haven't heard it, Dave, and that's the Vertica Academy, at academy.vertica.com. Well, what is that? It is self-paced, on-demand training, as well as Vertica Essentials certification, and certification means you have seven days with your hands on a Vertica cluster, hosted in the cloud, to go through all the certification. And guess what? All of that is free. Why would you give it away for free? Because, for us, empowering the market, giving the market the expertise and the learning they need to take advantage of Vertica, just like with Community Edition, is fundamental to our mission, because we see the advantage that Vertica can bring, and we want to make it possible for every company, all around the world, to take advantage of it.
>> I love that ethos of Vertica. I mean, obviously it's a great product, but it's not just the product; it's the business practices, the really progressive pricing, and the embracing of all these trends: not running away from the waves but really leaning in. Joy, thanks so much. Great interview; I really appreciate it. And, ah, I wish we could have been face-to-face in Boston, but I think this was the prudent thing to do.
>> I promise you, Dave, we will, because the Vertica BDC in 2021 is already booked. So I will see you there.
>> All right, Joy King, thanks so much for coming on theCUBE. And thank you for watching.
Remember, the Cube is running this program in conjunction with the virtual vertical BDC goto vertical dot com slash BBC 2020 for all the coverage and keep it right there. This is Dave Vellante with the Cube. We'll be right back. >>Yeah, >>yeah, yeah.
Jeff Healey, Vertica at Micro Focus | CUBEConversations, March 2020
>> Narrator: From theCUBE studios in Palo Alto and Boston, connecting with thought leaders all around the world, this is a CUBE Conversation.
>> Hi everybody, I'm Dave Vellante, and welcome to the Vertica Big Data Conference virtual. This is our digital presentation, wall-to-wall coverage, actually, of the Vertica Big Data Conference. And with me is Jeff Healey, who directs product marketing at Vertica. Jeff, good to see you.
>> Good to see you, Dave. Thanks for the opportunity to chat.
>> You're very welcome. Now, I'm excited about the products that you guys announced, and you're hardcore into product marketing, but we're going to talk about the Vertica Big Data Conference. It's been a while since you guys had this. Obviously, new owner, new company, some changes, but that new company, Micro Focus, has announced that it's investing, I think the number was $70 million, into two areas. One was security, and the other, of course, was Vertica. So we're really excited to be back at the virtual Big Data Conference. And let's hear it from you: what are your thoughts?
>> Yeah, Dave, thanks. And we love having theCUBE at all of these events. We're thrilled to have the next Vertica Big Data Conference. It was actually a physical event, and we're moving it online. We know it's going to be a big hit, because we've been doing this for some time, particularly with two of the webcast series we run every month. One is the Under the Hood webcast series, which is led by our engineers, and the other is what we call the Data Disruptors webcast series, which is led entirely by customers. So we're really confident this is going to be a big hit; we've seen the registration spike. We just hit 1,000, and we were planning on having about 1,000 at the physical event. It's growing and growing. We're going to see those big numbers, and it's not going to be a one-time thing. We're going to keep the conversation going and make sure there's plenty of best-practices learning throughout the year.
>> We've been at all the big BDCs, and the first ones were really in the heart of the big data movement, a really exciting time. And the interesting thing about this event is that it was always customers talking to customers. There weren't a lot of commercials; it was an intimate event. Of course I loved it, because it was in our hometown. But I think you're trying to carry that theme, obviously, into the digital sphere. Maybe you can talk about that a little bit.
>> Yeah, Dave, absolutely right. Of course, nothing replaces face-to-face, but everything that you just mentioned that makes the Big Data Conference special, and you guys have been there throughout and shown great support in talking to so many customers and leaders, we're doing the same thing, all right? So we had about 40-plus sessions planned for the physical event. We're going to run half of those, but we're not going to lose anything; that's the key point. What makes the Vertica Big Data Conference really special is that the only presenters allowed to present are either engineers, Vertica engineers or best-practices engineers, and then customers, customers that actually use the product. There are no sales or marketing pitches or anything like that. And I'll tell you, as far as the customer lineup that we have, we've got five or six already lined up as part of those 20 sessions: customers like Uber, customers like The Trade Desk, customers like Philips talking about predictive maintenance. The list goes on and on.
You won't want to miss it if you're on the fence or trying to figure out whether to register for this event. Best part about it: it's all free, and if you can't attend live, there will be live Q&A chat on every single one of those sessions; we promise we'll answer every question, if not live then afterward, as we always do, and they'll all be available on demand. So there's no reason not to register and attend, or watch later.
>> Thinking about the content over the years: in the early days of the Big Data Conference, of course, Vertica started before the whole big data meme really took off, and then as it took off, plugged right into it. But back then the discussion was a lot of "what do I do with big data?" Gartner's three Vs, and how do I wrangle it all, and what's the best approach, and this Hadoop stuff is really complicated. Of course, Vertica was an alternative to RDBMSs that really couldn't scale or give that type of performance for analytical workloads, so you had your foot in that door. But now the conversation is interesting: your theme is "win big with data." Of course, the physical event was going to be at the Encore, which is the new casino in Boston. But my point is, the conversation is no longer about how to wrangle all this data, how to lower the cost of storing it, how to make it go faster and actually make it work. It's really about how to turn data into insights, transform your organization, and, quote unquote, win with big data.
>> That's right. Yeah, that's a great point, Dave. And that's why we chose the title, really: it's about our customers and what they're able to do with our platform. And we know it's not just one platform; it's the whole ecosystem, all of our incredible partners. It's funny: when I started with the organization about seven years ago, we were closing lots of deals, and I was following up on case studies, and it was like, "Okay, why did you choose Vertica?" "Well, the queries went fast." "Okay, so what does that mean for your business?" We knew we were kind of in the early-adopter stage, and we were disrupting the data warehouse market. Now we talk to our customers, and their volumes are growing and growing, and they really have these analytical use cases; they can talk about the value the entire organization is gaining from them. That's the difference between now and a few years ago, just like you were saying, when Vertica disrupted the database market, and the data warehouse market too. You can speak to our customers, and they can tell you exactly what's happening, how it's moving the needle or really advancing the entire organization, regardless of the analytical use case, whether it's Internet of Things around predictive maintenance or customer behavior analytics. They can speak confidently of it, more than just "hey, our queries went faster."
>> You know, I've mentioned before the Micro Focus investment; I want to drill into that a bit, because the Vertica brand stands alone. It's a Micro Focus company, but Vertica has its own brand awareness. The reason I mention that is because if you go back to the early days of MPP databases, there was a spate of companies, startups, that formed, and many, if not all, of those got acquired. Some lived on, with the code base going into the cloud, but generally speaking, many of those brands have gone away. Vertica stays.
And so my point is, we've seen Vertica have staying power throughout. I think it's a function of the architecture that Stonebraker originally envisioned; you guys were early to the market, had a lot of good customer traction, and you've been very responsive to a lot of the trends. Colin Mahony will talk about how you adopted and really embraced cloud, for example, and different data formats. So you've really been able to participate in a lot of the new emerging waves that have come to market. And I would imagine some of that's cultural. I wonder if you could just address that, in the context of the BDC.
>> Oh, yeah, absolutely. You hit on all the key points here, Dave. So, a lot of changes in the industry; we're in the hottest industry, the tech industry, right now, and there's lots of competition. But one of the things we'll say in terms of "hey, who do you compete with?" is this: you compete with players in the cloud, open-source alternatives, traditional enterprise data warehouses. That's true, right? And one of the things we've stayed true to, and Colin knows it and has really led the charge for the organization on this, is that we know who we are. We're an analytical database platform, and we're constantly working on that one sole source code base, to make sure that we don't provide a bunch of different technologies and databases that you need to stitch together. This platform just has unbelievable universal capabilities, everything from running analytics at scale, to in-database machine learning with a different approach, to all the different types of deployment models that are supported. We don't go to companies and say, "Yeah, we take care of all your problems, but you have to stitch together all these different types of technologies." It's all based on that core Vertica engine, and we've expanded it to meet all these market needs. So what Colin believes, and what he tells the team we lead with, is that one core platform that can address all these analytical initiatives. We know who we are, and we continue to improve on it, regardless of the pivots and the drastic measures that some of the other competitors have taken.
>> You know, I've got to ask you: we're in the middle of this global pandemic with coronavirus and COVID-19, and things change daily, by the hour, sometimes by the minute. I mean, every day you get up to something new. So you see a lot of forecasts, a lot of probability models: best case, worst case, likely case, even though nobody really knows what that likely case looks like. So there's a lot of analytics going on, and a lot of data that people are crunching, with new data sources coming in every day. Are you guys participating directly in that, specifically your customers? Are they using your technology? You can't use a traditional data warehouse for this; it's just too slow, too asynchronous, the process is cumbersome. What are you seeing in the customer base as it relates to this crisis?
>> Sure. Well, naturally, we have a lot of customers that are healthcare technology companies, companies like Cerner, companies like Philips, that are kind of leading the charge here. And of course, our whole motto has always been: don't throw away any of the data. There's value in that data, and you don't have to throw it away with Vertica. So you've got petabyte-scale analytics across many of our customers. Just a few years ago, we called those customers the petabyte club.
Now a majority of our large enterprise software companies are approaching those petabyte volumes, so it's important to be able to run those analytics at that scale and that volume. The other thing we've been seeing from some of our partners is really putting that analytics to use with visualizations. So one of the customers that's going to be presenting as part of the Vertica Big Data Conference is Domo. Domo has a really nice demo around being able to track the coronavirus outbreak and how care is being delivered, all in a visual manner, and you're seeing more of those. Well, Domo embeds Vertica, right? So that's another customer of ours. Think of Vertica as that embedded analytical engine supporting those visualizations, so that anyone in the world can track this and, hopefully, as we see over time, cases go down and we overcome this. >> Talk a little bit more about that, because again, the BDC has always been engineers presenting to audiences. You just mentioned the demo by Domo, and you have a lot of brand names that we've interviewed on theCUBE before, but maybe you could talk a little bit more about some of the customers that are going to be speaking at the virtual event and what people can expect. >> Sure, yeah, absolutely. So we've got Uber presenting. Just a quick fact around Uber: really, the analytical data warehouse is all Vertica, and it works very closely with open source. A quick stat on Uber: 14 million rides per day. What Uber is able to do is connect the riders with the drivers so that they can determine the appropriate pricing. So Uber is going to be a great session that everyone will want to tune in on. Others like The Trade Desk, a massive ad-tech company: 10 billion ad auctions daily, and it may even be more frequent than that. The amount of scale and analytical volume they're running queries across can really only be accomplished with a few platforms in the world, and Vertica is one of them. That's another hot one, The Trade Desk. Philips is going to be presenting IoT analytical workloads. We're seeing more and more of those, not only in telematics, which you would expect within automotive, but in predictive maintenance, which cuts across all the original equipment manufacturers. Philips has a long history of being able to handle sensor data and apply it to business cases where you can improve customer satisfaction and lower costs related to services. So around their MRI machines and their predictive maintenance initiative, again, Vertica is kind of that heartbeat, that analytical platform driving those initiatives. The list goes on and on. And the conversation is going to continue with the Data Disruptors and Under the Hood webcast series. Any customers that weren't able to present, and we had a few that just weren't able to do it, have already signed up for future months. So we're already booked out six months or more, and you're going to hear more and more customer stories from Vertica.com. >> Awesome, and we're going to be sharing some of those on theCUBE as well. The BDC has always been an intimate event, one of my favorites, with a lot of substance, and I'm sure the online version, the virtual digital version, is going to be the same. Jeff Healey, thanks so much for coming on theCUBE and giving us a little preview of what we can expect at the Vertica BDC 2020. >> You bet. >> Thank you. >> Yeah, Dave, thanks to you and the whole CUBE team.
Appreciate it. >> All right, and thank you for watching, everybody. Keep it right here for all the coverage of the Virtual Big Data Conference 2020. You're watching theCUBE. I'm Dave Vellante, and we'll see you soon.
Shuyi Chen, Uber | Flink Forward 2018
>> Announcer: Live from San Francisco, it's theCUBE covering Flink Forward, brought to you by data Artisans. (upbeat music) >> This is George Gilbert. We are at Flink Forward, the user conference for the Apache Flink community, sponsored by data Artisans, the company behind Flink. And we are here with Shuyi Chen from Uber. Shuyi works on a very important project, the Calcite SQL query optimizer, which is used in Apache Flink as well as several other projects. Why don't we start with this: Shuyi, tell us where Calcite's used and what its role is. >> Calcite is basically used in the Flink Table and SQL API, as the SQL parser, query optimizer, and planner for Flink. >> OK. >> Yeah. >> So now let's go to Uber and talk about the pipeline or pipelines you guys have been building, and then how you've been using Flink and Calcite to enable the SQL API and the Table API. What workloads are you putting on that platform, or on that pipeline? >> Yeah, so basically I'm the technical lead of the stream processing platform at Uber, and we use Apache Flink as the stream processing engine. We've built two different platforms on it. One is called AthenaX, which uses Flink SQL, so it basically enables users to compose their stream processing logic in SQL. And we have a UI, and with one click they can deploy the stream processing job to production. >> When you say UI, did you build a custom UI to essentially turn it into a business intelligence tool, so you have a visual way of constructing your queries? Is that what you're describing? >> Yeah, it's similar to how you write a SQL query against a database. We have a UI for you to write your SQL query, with all the syntax highlighting and hints, so that even data scientists, and non-engineers in general, can use that UI to compose stream processing jobs. >> Okay, give us an example of some applications, because this sounds like a high-level API, so it makes it more accessible to a wider audience. What are some of the things they build? >> So for example, our Uber Eats team used the SQL API as the stream processing tool to build their Restaurant Manager dashboard. >> Okay. >> So basically, the data lives in Kafka and gets streamed in real time into the Flink job, which is composed using the SQL API, and the results get stored in our OLAP database, Pinot. Then when the restaurant owners open Restaurant Manager, they see the dashboard of their real-time earnings and everything. And with the SQL API, they no longer need to write the Flink job; they don't need to write Java or Scala code, or do any testing or debugging. It's all SQL. >> And then what's the SQL coverage, the SQL semantics that are implemented in the current Calcite engine? >> It's the basic transformations: projection, hopping and tumbling windows, joins, GROUP BY, HAVING, and, not to mention, both event-time and processing-time support. >> And you can shuffle from anywhere; you don't have to have two partitions with the same join key on one node. The data placement can be arbitrary for the partitions? >> Well, the SQL is declarative, right? And so once the user composes the logic, the underlying planner will actually take care of the key-by and group-by, everything.
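To make the AthenaX discussion concrete, here is a minimal sketch of the kind of windowed SQL job Shuyi describes, written against Flink's Java Table API. The table and column names (orders, restaurant_id, order_total, order_time) are hypothetical, the datagen source is a stand-in for the Kafka topic he mentions, and the exact bridge classes have shifted across Flink versions; treat this as an illustration of the shape of such a job, not Uber's actual code.

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

public class RestaurantEarningsSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

        // Hypothetical source table; in the platform described, this would be a
        // Kafka topic with an event-time attribute. A datagen connector keeps the
        // sketch self-contained, with a processing-time attribute standing in.
        tEnv.executeSql(
            "CREATE TABLE orders (" +
            "  restaurant_id STRING," +
            "  order_total   DOUBLE," +
            "  order_time AS PROCTIME()" +
            ") WITH ('connector' = 'datagen')");

        // One-minute tumbling-window earnings per restaurant: the sort of query
        // an analyst could compose in the AthenaX UI without writing Java/Scala.
        Table earnings = tEnv.sqlQuery(
            "SELECT restaurant_id, " +
            "       TUMBLE_START(order_time, INTERVAL '1' MINUTE) AS window_start, " +
            "       SUM(order_total) AS earnings " +
            "FROM orders " +
            "GROUP BY restaurant_id, TUMBLE(order_time, INTERVAL '1' MINUTE)");

        // In production the result would be written to a serving store (e.g. an
        // OLAP database) behind the dashboard; here we just print it.
        tEnv.toDataStream(earnings).print();

        env.execute("restaurant-earnings-sketch");
    }
}
```

The SQL body is exactly what gets deployed per Shuyi's description; the Java scaffolding around it is what the AthenaX platform would generate and run on the user's behalf.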
>> Okay, because the reason I ask is that many of the early Hadoop-based MPP SQL engines had the limitation where you had to co-locate the partitions that you were going to join. >> It's the same thing for Flink. >> Oh. >> But the SQL layer just takes care of that. >> Okay. >> So you describe what you want, and underneath it gets translated into a Flink program that actually does all the co-location. >> Oh, it does it for you, okay. >> Yeah, yeah. So now they don't even need to learn Flink; they just need to learn SQL. >> Now, you said there's a second platform that Uber is building on top of Flink. >> Yeah, the second platform is what we call the Flink-as-a-service platform. The motivation is that we found SQL cannot satisfy all the advanced needs at Uber for building stream processing. For example, teams need to call RPC services from within their stream processing application, or even chain the RPC calls, which is hard to express in SQL. Also, when they have a complicated DAG, like a workflow, it's very difficult to debug individual stages, so they want the control of using the native Flink DataStream API or DataSet API to build their streaming or batch jobs. >> Is the DataSet API the lowest-level one? >> No, it's on the same level as the DataStream API; one is for streaming, one for batch. >> Okay, DataStream, and then the other was Table? >> DataSet. >> Oh: DataSet, DataStream, Table. >> Yeah. >> And there's one lower than that, right? >> Yeah, there's one lower API, but most people don't use it. >> So that's for systems programmers? >> Yeah, yeah. >> So then tell me, what type of programmer uses the DataStream or DataSet API, and what do they build at Uber? >> So for example, in one of the talks later, there's the marketplace dynamics team, which is actually using the platform to do online model updates, machine-learning model updates, using Flink. Basically they take in a model that was trained offline, do a few group-bys by time and location, apply the model, and then incrementally update it. >> And so are they taking a window of updates, updating the model, and then somehow promoting it as the candidate, or... >> Yeah, yeah, something similar. >> Okay, that's interesting. And so is it data scientists who are using this API? >> Well, not really; it's not designed for data scientists. >> Oh, so the models are trained offline, and then they're being updated online on the stream processing platform. >> Yes. >> And so it's maybe data engineers who are essentially updating the features that get fed in and are continually training, or updating, the models. >> Basically it's an online model update: as Kafka events come in, it continues to refine the model. >> Okay. And so as Uber looks out a couple of years, what sorts of things do you see adding to either of these pipelines, and do you see a shift away from batch and request-response-type workloads toward more continuous processing? >> Yes, actually, we do see that trend. Before joining the stream processing platform team at Uber, I was in marketplace as well, and at that point we always saw a shift: people would love to use stream processing technology to replace some of the normal backend service applications. >> Tell me some examples.
>> Yeah, for example, in our dispatch platform, we have the need to shard the workload, by riders for example, to different hosts for processing: say, to compute ETAs or some time-windowed averages. Before, this was done in backend services, using our internal distributed-systems tooling to do the sharding. But with Flink, this can be done very easily. So there's a shift: those people also want to adopt stream processing technology, as long as it's not a request-response-style application. >> So the key thing, just to make sure I understand, is that Flink can take care of the distributed joins, whereas with a database-based workload a DBA had to set up the sharding, and now it's more transparent, more automated? >> It's more about the support. Before, people writing backend services had to build everything themselves: the state management, the sharding, everything. >> Oh, so it's not even database-based... >> Right, it's not a database; it's real time. >> So they had to do the physical data management, and Flink takes care of that now? >> Yeah, yeah. >> Oh, got it. >> For some of the applications it's real time, so we don't really need to store the data in a database all the time; it's usually kept in memory and periodically snapshotted. A normal backend service author would have to build all of that by hand. Flink has built-in support for state management, and all the sharding, partitioning, and time-window aggregation primitives are built in, so they don't need to worry about re-implementing the logic and re-architecting the system again and again. >> So it's a new platform for real time; it gives you a whole lot of services, a higher abstraction, for real-time applications. >> Yeah, yeah. >> Okay. All right, with that, Shuyi, we're going to have to call it a day. This was Shuyi Chen from Uber, talking about how they're building more and more of their real-time platforms on Apache Flink and using a whole bunch of services to complement it. We are at Flink Forward, the user conference of data Artisans for the Apache Flink community, in San Francisco. This is the second Flink Forward conference, and we'll be back in a couple minutes. Thanks. (upbeat music)
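To ground Shuyi's closing point about what the DataStream API gives a former backend-service author for free (key-based sharding, managed fault-tolerant state, time windows), here is a minimal sketch of a keyed, windowed average-ETA computation. The event shape and names (cityId, etaSeconds) are hypothetical, and the toy bounded source is only there to make the snippet self-contained; this illustrates the primitives, not Uber's marketplace code.

```java
import org.apache.flink.api.common.functions.AggregateFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class AverageEtaSketch {

    // (sum, count) accumulator producing a per-window mean of ETA values.
    public static class AvgEta
            implements AggregateFunction<Tuple2<String, Double>, Tuple2<Double, Long>, Double> {
        @Override public Tuple2<Double, Long> createAccumulator() { return Tuple2.of(0.0, 0L); }
        @Override public Tuple2<Double, Long> add(Tuple2<String, Double> e, Tuple2<Double, Long> acc) {
            return Tuple2.of(acc.f0 + e.f1, acc.f1 + 1);
        }
        @Override public Double getResult(Tuple2<Double, Long> acc) { return acc.f0 / acc.f1; }
        @Override public Tuple2<Double, Long> merge(Tuple2<Double, Long> a, Tuple2<Double, Long> b) {
            return Tuple2.of(a.f0 + b.f0, a.f1 + b.f1);
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Hypothetical (cityId, etaSeconds) events; in practice this would be an
        // unbounded Kafka source. With this toy bounded source the processing-time
        // window may not fire before the job finishes; it is here for completeness.
        DataStream<Tuple2<String, Double>> etas = env.fromElements(
                Tuple2.of("sf", 240.0), Tuple2.of("sf", 300.0), Tuple2.of("nyc", 180.0));

        etas.keyBy(e -> e.f0)  // Flink shards the keyed state across hosts; no hand-rolled sharding
            .window(TumblingProcessingTimeWindows.of(Time.minutes(5)))
            .aggregate(new AvgEta())  // managed per-key, per-window state, checkpointed by Flink
            .print();

        env.execute("average-eta-sketch");
    }
}
```

The three lines of pipeline code replace what Shuyi describes as hand-written sharding, state management, and snapshotting in a conventional backend service; that substitution is the design point of the Flink-as-a-service platform.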