Joe Gonzalez, MassMutual | Virtual Vertica BDC 2020
(bright music)

>> Announcer: It's theCUBE. Covering the Virtual Vertica Big Data Conference 2020, brought to you by Vertica.

>> Dave: Hello everybody, welcome back to theCUBE's coverage of the Vertica Big Data Conference, the Virtual BDC. My name is Dave Vellante, and you're watching theCUBE. And we're here with Joe Gonzalez, who is a Vertica DBA at MassMutual Financial. Joe, thanks so much for coming on theCUBE. I'm sorry that we can't be face to face in Boston, but at least we're being responsible. So thank you for coming on.

>> Joe: (laughs) Thank you for having me. It's nice to be here.

>> Dave: Yeah, so let's set it up. We'll talk a little bit about MassMutual. Everybody knows it's a big financial firm, but what's your role there, and kind of your mission?

>> Joe: So my role is Vertica DBA. I was hired in January of last year to come on and manage their Vertica cluster. They'd been on Vertica for probably about a year and a half before that, started out on an on-prem cluster and then moved to AWS, running Enterprise mode in the cloud, and brought me on just as they were considering transitioning over to Vertica's EON mode. They didn't really have anybody dedicated to Vertica, nobody who really knew and understood the product, and I'd been working with Vertica for probably six, seven years at that point. I was looking for something new and landed a really good opportunity here with a great company.

>> Dave: Yeah, you have a lot of experience in Vertica. You had a role in market research, so you're a data guy, right? I mean, that's really what you've been doing your entire career.

>> Joe: I am. I worked with Pitney Bowes in the postage industry, and I worked in healthcare auditing, after seven years in market research. And then I've been with MassMutual for a little over a year now, so yeah, quite a lot.

>> Dave: So tell us a little bit about your objectives at MassMutual, what you're doing with the platform, what applications you're supporting. Paint a picture for us, if you would.

>> Joe: Certainly. MassMutual just decided to make Vertica its enterprise data warehouse, so they've really bought into Vertica. We're moving all of our data there; probably a good 80 to 90% of MassMutual's data is going to be on the Vertica platform, in EON mode. And we have wide usage of that data across the corporation. Right now we're at about 50 terabytes and growing quickly, with a wide variety of users. There are a lot of ETLs coming in overnight, loading a lot of data and transforming a lot of data, and a lot of reporting tools are using it, currently Tableau and MicroStrategy. We have Alteryx using it, and we also have APIs running against it throughout the day, 24/7, with people coming in, especially these days with, you know, some financial uncertainty going on. A lot of people are coming in and checking their 401(k)s, checking their insurance and status and whatnot. So we have to handle a lot of concurrent traffic on top of the normal big queries. It's quite a diverse cluster, and I'm glad they're really investing in Vertica as their overall solution for this.

>> Dave: Yeah, I mean, these days your 401(k) is like this, right? (laughing) Afraid to look. So I wonder, Joe, if you could share with our audience.
I mean, for those who might not be as familiar with the history of Vertica, and specifically about MPP: historically you had traditional RDBMSs, whether it's Db2 or Oracle, and then you had a spate of companies that came out with this notion of MPP. Vertica is, I think, one of the few, if not the only, brands to have survived. But what did that bring to the industry, and why is it important for people to understand, in terms of scale, performance, cost? Can you explain that?

>> Joe: To me, it actually brought scale at a good cost, and that's why I've been a big proponent of Vertica ever since I started using it. There are a number, like you said, of different platforms where you can load and store and house big data. But the purpose of having that big data is not just for it to sit there, but to be used, and used in a variety of ways. And that ranges from something small, like the first installation I was on, which was about 10 terabytes, to data warehouses of up to 100 terabytes, and there are Vertica installations with hundreds of petabytes on them. You want to be able to use that data, so you need a platform that's going to be able to access that data and get it to the clients, get it to the customers, as quickly as possible, without paying an arm and a leg for the privilege to do so. And Vertica allows companies to do that: not only get their data to clients and in-company users quickly, but save money while doing so.

>> Dave: So why couldn't I just use a traditional RDBMS? Why not just throw it all into Oracle?

>> Joe: One, cost. Oracle is very expensive, while Vertica's a lot more affordable. But the column-store structure of Vertica also allows for much more optimized queries. Some of the queries that you can run in Vertica in 2, 3, 4 seconds will take minutes and sometimes hours in an RDBMS like Oracle or SQL Server. They have the capability to store that amount of data, no question, but the usability really lacks when you start querying tables that are 180 billion rows, or tables in Vertica that are over 1,000 columns. Those will take hours to run on a traditional RDBMS, while running them in Vertica, I get my queries back in seconds.

>> Dave: You know what's interesting to me, Joe, and I wonder if you could comment: it seems that Vertica has done a good job of embracing, you know, riding the waves, whether it was HDFS and the early part of the big data era, or machine learning and machine intelligence, whether it's TensorFlow and other data science tools. And the cloud is the other one, right? A lot of times the cloud is super disruptive, particularly to companies that started on-prem, but it seems like Vertica somehow has been able to adapt and embrace some of these trends. First of all, from your standpoint as a customer, is that true? And why do you think that is? Is it architectural? Is it mindset, engineering? I wonder if you could comment on that.

>> Joe: It's absolutely true. I started out, again, on an on-prem Vertica data warehouse, and we kind of rolled along with them, you know; more and more people have been using data, and they want to make it accessible to people on the web now.
And you know, having the option to provide that data from an on-prem solution or from AWS is key, and now Vertica is offering even a hybrid solution, if you want to keep some of your data behind a firewall, on-prem, and put some in the cloud as well. So Vertica has absolutely evolved along with the industry in ways that no other company I've seen really has. And I think the reason for it, and the reason I've stayed with Vertica, and specifically have remained a Vertica DBA for the last seven years, is the way Vertica stays in touch with its people. I've been working with the same people for the seven, eight years I've been using Vertica; they're family. I'm part of their family, and I'm good friends with some of these people. And they really are in tune not only with the customer but with what the customer is doing. They really sit down with you and have those conversations about, you know, what are your needs? How can we make Vertica better? And they listen to their clients. Just having access to the data engineers who develop Vertica, arranged through a phone call or whatnot, I've never had that with any other company. Vertica makes that available to their customers when they need it. So the personal touch is huge for them.

>> Dave: That's good. It's always good to get the confirmation from the practitioners, not just hear it from the vendor. I want to ask you about the EON transition. You mentioned that MassMutual brought you in to help with that. What were some of the challenges that you faced, and how did you get over them? And why EON? What was the goal, the outcome, and some of the challenges maybe that you had to overcome?

>> Joe: Right. So MassMutual had an interesting setup when I first came in. They had three different Vertica clusters to accommodate three different portions of their business. One was for the data scientists, who use the data quite extensively in very large, very intense queries for their predictive analytics and whatnot. There was a separate one for the APIs, which needed sub-second query response times; in the enterprise solution, they weren't always able to get the performance they needed, because the fast queries were being overrun by the larger queries that needed more resources. And then they had a third for starting to develop this enterprise data platform, looking into their future. The first challenge was, first of all, bringing all those three together, back into a single cluster, and allowing our users to have both the heavy queries and the API queries running at the same time, on the same platform, without having to completely separate them out onto different clusters. EON really helps with that because it allows you to store that data in S3 communal storage, have the main cluster set up to run the heavy queries, and then set up subclusters that still point to that S3 data but separate out the compute, so that the APIs really have their own resources to run on and aren't interfered with by the other processes.

>> Dave: Okay, so I'm hearing a couple of things. One is you're sort of busting down data silos, so you're able to have a much more coherent view of your data, which I would imagine is critical, certainly. Companies like MassMutual have been around for 100 years, and so you've got all kinds of data dispersed.
So to the extent that you can break down those silos, that's important. But also, being able to, I guess, have granular increments of compute and storage is what I'm hearing. What does that do for you? Does it make things more efficient? Are there other business benefits? Maybe you could elucidate.

>> Joe: Well, one is cost, again, a huge benefit. The cost of running three different clusters, even in AWS, in the Enterprise solution was a little costly; you had to have your dedicated servers here and there, so you're paying for, you know, 12, 15 different servers, for example. Whereas bringing them all back into EON, I can run everything on a six-node production cluster, and when things are busy, I can spin up the three-node subcluster for the APIs, only pay for it when I need it, and then bring things back into the main cluster when things slow down a bit, and they can still get the performance they need. [A sketch of this subcluster pattern appears at the end of this interview.] So that saves a ton on resource costs. You're not paying extra for the storage, you're paying for one S3 bucket; you're only paying for the nodes, the EC2 instances, that are up and running when you need them, and that is huge. And again, like you said, it gives us the ability to silo our data without having to completely separate it into different storage areas, which is a big benefit. It gives us the ability to query everything from one single cluster without having to synchronize it across three different ones, rather than this group having theirs and that group having theirs; everyone's still looking at the same data. And we replicate that in QA and dev so that people can work outside of production and do some testing as well.

>> Dave: So EON, obviously a very important innovation. And of course Vertica touts the separation of compute and storage, and you know, they're not the only one that does that, but they are, I think, the only one that does it for on-prem and virtually across clouds. So my question is, and I think you're doing a breakout session at the Virtual BDC; we were going to be in Boston, now we're doing it online. If I'm in the audience, I'm imagining I'm a junior DBA at an organization that maybe doesn't have a Joe, who hasn't been an expert for seven years. How hard is it for me, what do I need to do, to get up to speed on EON? It sounds great, I want it. I'm going to save my company money, but I'm nervous, 'cause I've only been a Vertica DBA for, you know, a year, and I'm not as experienced as you. What are the things that I should be thinking about? Do I need to hire somebody? Do I need to bring in a consultant? Can I learn it myself? What would you advise?

>> Joe: It's definitely easy enough that if you have at least a little bit of Vertica experience, you can learn it yourself, okay? 'Cause the concepts are still there. There are some little nuances where you do need to be aware of certain changes between the Enterprise and EON editions. But I would also say consult with your Vertica account manager; let them bring in the right people from Vertica to help you get up to speed, and if you need to, there are also resources available as far as consultants go that will help you get up to speed very quickly. We worked together with Vertica and with one of their partners, Clarity, in helping us to understand EON better and set it up the right way, you know, how do we pick the number of shards for our data warehouse?
They helped us evaluate all that and pick the right number of shards and the right number of nodes to get set up and going. And they helped us figure out the best ways to get our data over from the Enterprise Edition into EON very quickly and very efficiently. So you don't have to do it all by yourself.

>> Dave: I wanted to ask you about organizational issues, because, you know, practitioners like you always tell me, "Look, the technology comes and goes; that's kind of the easy part, we're good at that. It's the people, the processes, the skill sets." What does your team look like? Do you have any sort of ideal team makeup, or ideal advice? Is it two-pizza teams? What kind of skills? What kind of interaction and communication with senior leadership? I wonder if you could just give us some color on that.

>> Joe: One of the things that makes me extremely proud to be working for MassMutual right now is that they do what a lot of companies have not been doing, and that is investing in IT. They have put a lot of thought, a lot of money, and a lot of support into setting up their enterprise data platform and putting Vertica at the center. And not only did they put the money into getting the software that they needed, like Vertica, MicroStrategy, and all the other tools that we're using, they put the money into the people. Our managers are extremely supportive of us. We hired about 40 to 45 different people within a four-month time frame: data engineers, data analysts, data modelers, a nice mix of people who can help shape the data, bring the data in, and help the users use the data properly, which allows me, as the database administrator, to make sure they're doing what they're doing most efficiently, and to focus on my job. So you have to have that diversity among the different data skills in order to make your team successful.

>> Dave: That's awesome. Kind of a side question, and it's really not Vertica's wheelhouse, but I'm curious. You know, in the early days of the big data movement, a lot of the data scientists would complain, and they still do, that "80% of my time is spent wrangling data." The tools for the data engineer, the data scientist, the database experts, they're all different. Is that changing, and to what degree is that changing? What inning are we in, in terms of a more facile environment for all those roles?

>> Joe: Again, I think it differs from company to company, depending on what resources they make available to the data scientists. We have a lot of them at MassMutual, and they're very much into doing a lot of machine learning, model training, and predictive analytics. And they're used to doing it outside of Vertica too, you know, pulling that data out into Python and Scala, R, and tools like that. They're also now just getting into using Vertica's in-database analytics and machine learning, which is a skill set that, you know, definitely not everybody out there has. So having somebody who understands Vertica like myself, and being able to train other people to use Vertica in the way that's most efficient for them, is key. But also, just having people who understand not only the tools that you're using, but how to model data, how to architect your tables, your schemas, the interaction between your tables and schemas and whatnot, you need to have that diversity in order to make this work.
And our data scientists have benefited immensely from the structure that MassMutual put in place through our data management and delivery team.

>> Dave: That's great. I think I saw somewhere in your background that you've trained about 100 people on Vertica. Did I get that right?

>> Joe: Yes. Since I started here, I've gone to our Boston location, our Springfield location, and our New York City location and trained, probably at this point, about 120 to 140 of our Vertica users. And I'm trying to do, you know, a couple of follow-up sessions per year.

>> Dave: So adoption, obviously, is a big goal of yours: getting people to adopt the platform, but then more importantly, I guess, delivering business value and outcomes.

>> Joe: Absolutely.

>> Dave: Yeah, I wanted to ask you about encryption. You know, in a perfect world, everything would be encrypted, but there are trade-offs. Are you using encryption? What are you doing in that regard?

>> Joe: We're actually just getting into that now, due to the New York regulations and the CCPA regulations that are now in place. We do have a lot of personally identifiable information in our data store that does require encryption. So we're going through a months-long process, which started in December, actually a bit earlier than that, of identifying all the columns, not only in our Vertica database, but in the other databases that we use; we have Postgres databases, SQL Server, and Teradata for the time being, until that moves into Vertica. We're identifying where that data sits, which downstream applications pull that data from the data sources and store it locally as well, and starting to encrypt that data. And because of the tight relationship between Voltage and Vertica, we settled on Voltage as the major platform to do that encryption. So we're going to be implementing that in Vertica probably within the next month or two, and rolling it out to all the teams that have data requiring encryption. We're going to start rolling it out to the downstream application owners to make sure that they're encrypting the data as they pull it over. And we're also using another product for several other applications that don't mesh as well with both.

>> Dave: Voltage being Micro Focus's encryption solution, correct?

>> Joe: Right, yes.

>> Dave: Yes, and of course, Micro Focus, for the audience, is the company that owns Vertica, with Vertica as a separate brand. So I want to kind of close on what success looks like. You've been at this for a number of years, coming into MassMutual, which was great to hear about. I've had some past experience with MassMutual; it's an awesome company. I've been to the Springfield facility and to Boston as well, and I have great respect for them, and they've really always been a leader. So it's great to hear that they're investing in technology as a differentiator. What does success look like for you? Let's say you're at MassMutual for a few years and you're looking back: what does success look like? Go.

>> Joe: A good question. It's changing every day, just, you know, with more and more applications coming onboard, more and more data being pulled in, more uses being found for the data that we have. I think success for me is making sure that Vertica, first of all, is always up, always running at its most optimal, to keep our users happy.
I think when I started, you know, we had a lot of processes that were running six, seven hours, some of them taking almost a full day, because they were so complicated. We've got those running in under an hour now, some of them in a matter of minutes. I want to keep that optimization going for all of our processes. Like I said, there are a lot of users using this data, and it's been hard, over my first year here, to get to all of them. Thankfully, I'm getting a bit of help now; I have a couple of system DBAs that I'm training up to help out with these optimizations, you know, fixing queries, fixing projections, to make sure that queries do run as quickly as possible. So getting that to its optimal stage is one. Two, getting our data encrypted and protected, so that even if, for whatever reason, somebody somehow breaks into our data, they're not going to be able to get anything at all, because our data is 100% protected, and I think more companies need to be focusing on that as well. And third, I want to see our data science teams using more and more of Vertica's in-database predictive analytics and in-database machine learning products, and really making their jobs more efficient by doing so.

>> Dave: Joe, you're an awesome guest. I mean, as I said, we always love having the practitioners on and getting the straight skinny from the pros. You're welcome back anytime, and as I say, I wish we could have met in Boston; maybe next year at the BDC. But it's great to have you online, and thanks for coming on theCUBE.

>> Joe: And thank you for having me, and hopefully we'll meet next year.

>> Dave: Yeah, I hope so. And thank you everybody for watching. Remember, theCUBE is running concurrently with the Vertica Virtual BDC; it's vertica.com/bdc2020 if you want to check out all the keynotes and all the breakout sessions. I'm Dave Vellante for theCUBE. We'll be back with more interviews. Thanks for watching. (bright music)
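[Editor's note: below is a minimal sketch of the Eon mode subcluster pattern Joe describes, for readers who want to see what it looks like in practice. The database, host, and subcluster names are hypothetical, and the admintools invocation is an assumption about the tooling available in Eon mode releases of this era; treat it as an illustration, not a prescribed procedure.]

```sql
-- Provisioning a secondary subcluster happens outside SQL, e.g. via admintools
-- (names are hypothetical and the flags are an assumption -- check your
-- version's admintools help):
--   admintools -t db_add_subcluster -d massmutual_db -s host4,host5,host6 -c api_subcluster
--
-- Once it exists, you can confirm the layout from SQL via the Eon mode
-- subclusters system table:
SELECT subcluster_name, node_name, is_primary
FROM subclusters
ORDER BY subcluster_name, node_name;

-- The heavy ETL and analytics load stays on the primary subcluster; the API
-- subcluster reads the same S3 communal data with its own compute, so
-- sub-second API queries aren't starved by the big overnight jobs, and the
-- extra nodes only cost money while they're running.
```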
End-to-End Security in Vertica | Virtual Vertica BDC 2020
>> Paige: Hello everybody, and thank you for joining us today for the Virtual Vertica BDC 2020. Today's breakout session is entitled End-to-End Security in Vertica. I'm Paige Roberts, Open Source Relations Manager at Vertica, and I'll be your host for this session. Joining me are Vertica software engineers Fenic Fawkes and Chris Morris. Before we begin, I encourage you to submit your questions or comments during the virtual session. You don't have to wait until the end; just type your question or comment in the question box below the slide as it occurs to you and click Submit. There will be a Q&A session at the end of the presentation, and we'll answer as many questions as we're able to during that time. Any questions that we don't address, we'll do our best to answer offline. Also, you can visit the Vertica forums to post your questions there after the session. Our team is planning to join the forums to keep the conversation going, so it'll be just like being at a conference and talking to the engineers after the presentation. Also, a reminder that you can maximize your screen by clicking the double-arrow button in the lower right corner of the slide. And before you ask: yes, this whole session is being recorded, and it will be available to view on demand this week. We'll send you a notification as soon as it's ready. I think we're ready to get started. Over to you, Fen.

>> Fenic: Hi, welcome everyone. My name is Fen, my pronouns are fae/faer, and Chris will be presenting the second half; his pronouns are he/him. So to get started, let's go over what the goals of this presentation are. First off, no deployment is the same, so we can't give you an exact "here's the right way to secure Vertica," because how you set up your deployment is a factor, and the biggest one is: what is your threat model? If you don't know what a threat model is, let's take an example. We're all working from home because of the coronavirus, and that introduces certain new risks. Our source code is on our laptops at home, that kind of thing. But really, our threat model isn't that people will read our code over our shoulders and copy it, so we've encrypted our hard disks and that kind of thing to make sure that no one can get them. So basically, what we're going to give you are building blocks, and you can pick and choose the pieces that you need to secure your Vertica deployment. We hope that this gives you a good foundation for how to secure Vertica. And now, what we're going to talk about. We're going to start off by going over encryption, just how to secure your data from attackers. Then authentication, which is kind of how to log in. Identity, which is: who are you? Authorization, which is: now that we know who you are, what can you do? Delegation is about how Vertica talks to other systems. And then auditing and monitoring. So, how do you protect your data in transit? Vertica makes a lot of network connections; here are the important ones, basically. There are clients that talk to the Vertica cluster; the Vertica cluster talks to itself; it can also talk to other Vertica clusters; and it can make connections to a bunch of external services. So first off, let's talk about client-server TLS. This is how you secure data between Vertica and clients. It prevents an attacker from sniffing network traffic and, say, picking out sensitive data. Clients have a way to configure how strict the authentication of the server cert is.
It's called the client SSLMode, and we'll talk about this more in a bit, but authentication methods can also disable non-TLS connections, which is a pretty cool feature. Okay, so Vertica also makes a lot of network connections within itself. If Vertica is running behind a strict firewall and you have really good network security, both physical and software, then it's probably not super important that you encrypt all traffic between nodes. But if you're on a public cloud, you can set up AWS' firewall to prevent connections, but if there's a vulnerability in that, then your data is totally vulnerable. So it's a good idea to set up inter-node encryption in less secure situations. Next, import/export is a good way to move data between clusters. So for instance, say you have an on-premises cluster and you're looking to move to AWS. Import/export is a great way to move your data from your on-prem cluster to AWS, but that means the data is going over the open internet. That's another case where an attacker could try to sniff network traffic and pull out credit card numbers, or whatever sensitive data you have stored in Vertica, so it's a good idea to secure data in that case as well. And then we also connect to a lot of external services: Kafka, Hadoop, and S3 are three of them, and Voltage SecureData, which we'll talk about more in a sec, is another. Because each service deals with authentication differently, how you configure your authentication to them differs, so see our docs. And then I'd like to talk a little bit about where we're going next. Our main goal at this point is making Vertica easier to use. Our first objective was security, to make sure everything could be secure, so we built relatively low-level building blocks. Now that we've done that, we can identify common use cases and automate them, and that's where our attention is going. Okay, so we've talked about how to secure your data over the network, but what about when it's on disk? There are several different encryption approaches; each depends on what your use case is. RAID controllers and disk encryption are mostly for on-prem clusters, and they protect against media theft; they're invisible to Vertica. S3 and GCP encryption are kind of the equivalent in the cloud, and they're also invisible to Vertica. And then there's field-level encryption, which we accomplish using Voltage SecureData, which is format-preserving encryption. So how does Voltage work? Well, it encrypts values to things that look like the same format. So for instance, you can see a date of birth encrypted to something that looks like a date of birth but is not, in fact, the same thing. You can do cool stuff, like with a credit card number: you can encrypt only the first 12 digits, allowing the user to, you know, validate the last four. The benefits of format-preserving encryption are that it doesn't increase database size and you don't need to alter your schema or anything. And because referential integrity is preserved, you can do analytics without unencrypting the data. So again, here's a little diagram of how you could work Voltage into your use case. You could even combine it with Vertica's row and column access policies, which Chris will talk about a bit later, for even more customized access control, depending on your use case and your Voltage integration. We are enhancing our Voltage integration in several ways in 10.0, and if you're interested in Voltage, you can go see their virtual BDC talk.
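To make the field-level idea concrete, here is a minimal sketch of what format-preserving protection can look like from Vertica SQL, using the VoltageSecureProtect function from the SecureData integration. The table and column names are placeholders, the one-time appliance connection setup is omitted, and the format names are defined on your SecureData appliance, so check the integration docs for what yours actually supports.

```sql
-- Protect fields in place, preserving their formats (a sketch, assuming
-- 'ssn' and 'cc' format definitions exist on the SecureData appliance):
SELECT
    VoltageSecureProtect(ssn USING PARAMETERS format='ssn')       AS ssn_protected,
    VoltageSecureProtect(cc_number USING PARAMETERS format='cc')  AS cc_protected
FROM customers;
```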
And then again, talking about the roadmap a little: we're working on in-database encryption at rest. What this means is kind of a Vertica solution to encryption at rest that doesn't depend on the platform you're running on. Encryption at rest is hard. (laughs) Encrypting, say, 10 petabytes of data is a lot of work. And once again, the theme of this talk is that everyone has a different key management strategy and a different threat model, so we're working on designing a solution that fits everyone. If you're interested, we'd love to hear from you; contact us on the Vertica forums. All right, next up we're going to talk a little bit about access control. So first off: how do I prove who I am? How do I log in? Vertica has several authentication methods, and which one is best depends on your deployment size and use case. Again, the theme of this talk is that what you should use depends on your use case. You can order authentication methods by priority and origin. So for instance, you can only allow connections from within your internal network, or you can enforce TLS on connections from external networks but relax that for connections from your internal network, that kind of thing. We have a bunch of built-in authentication methods. They're all password-based; user profiles allow you to set complexity requirements for passwords, and you can even reject non-TLS connections, or reject certain kinds of connections. They should only be used by small deployments, because if you're a larger deployment, you probably have an LDAP server where you manage users, and rather than duplicating users and passwords, you should use LDAP auth. There, Vertica still has to keep track of users, but each user can then use LDAP authentication, so Vertica doesn't store the password at all. The client gives Vertica a username and password, and Vertica then asks the LDAP server whether that is a correct username and password. The benefits of this are manifold, but for example, if you delete a user from LDAP, you don't need to remember to also delete their Vertica credentials; they just won't be able to log in anymore, because they're not in LDAP anymore. If you like LDAP but you want something a little bit more secure, Kerberos is a good idea. Similar to LDAP, Vertica doesn't keep track of who's allowed to log in; it just keeps track of the Kerberos credentials, and Vertica never even touches the user's password. Users log in to Kerberos and then pass Vertica a ticket that says "I can log in." It is more complex to set up, so if you're just getting started with security, LDAP is probably a better option, but Kerberos is, again, a little bit more secure. If you're looking for something that works well for applications, certificate auth is probably what you want. Rather than hardcoding a password, or storing a password in a script that you use to run an application, you can instead use a certificate. If you ever need to change it, you can just replace the certificate on disk, and the next time the application starts, it picks that up and logs in. And then, multi-factor auth is a feature request we've gotten in the past. It's not built in to Vertica, but you can do it using Kerberos. Security is a whole-application concern, and fitting MFA into your workflow is all about fitting it in at the right layer, and we believe that layer is above Vertica. If you're interested in more about how MFA works and how to set it up, we wrote a blog on how to do it.
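As a rough sketch of how these pieces wire up in SQL, here is what an LDAP authentication record might look like, alongside a more relaxed internal-network method and priority ordering. The hostnames, base DN, network ranges, and user names are placeholders; the statement forms follow Vertica's CREATE/ALTER/GRANT AUTHENTICATION syntax, but verify the parameter names against your version's docs.

```sql
-- LDAP auth for connections from anywhere, with TLS required:
CREATE AUTHENTICATION v_ldap METHOD 'ldap' HOST TLS '0.0.0.0/0';
ALTER AUTHENTICATION v_ldap SET
    host = 'ldaps://ldap.example.com',
    basedn = 'dc=example,dc=com',
    binddn_prefix = 'cn=',
    binddn_suffix = ',ou=users,dc=example,dc=com';

-- A built-in password (hash) method, allowed only from the internal network:
CREATE AUTHENTICATION v_internal METHOD 'hash' HOST '10.0.0.0/8';

-- Higher-priority methods are tried first:
ALTER AUTHENTICATION v_ldap PRIORITY 10;
ALTER AUTHENTICATION v_internal PRIORITY 5;

-- Assign methods to principals:
GRANT AUTHENTICATION v_ldap TO PUBLIC;
GRANT AUTHENTICATION v_internal TO etl_service;
```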
And now, over to Chris, for more on identity and authorization.

>> Chris: Thanks, Fen. Hi everyone, I'm Chris. So, we're a Vertica user and we've connected to Vertica, but once we're in the database, who are we? What are we? In Vertica, the answer to those questions is principals: users and roles, which are like groups in other systems. Since roles can be enabled and disabled at will, and multiple roles can be active, they're a flexible way to use only the privileges you need in the moment. For example, here you've got Alice, who has DBADMIN as a role, and those are some elevated privileges. She probably doesn't want them active all the time, so she can set the role and add it to her identity set. All of this information is stored in the catalog, which is basically Vertica's metadata storage. How do we manage these principals? Well, it depends on your use case, right? If you're a small organization, or maybe only some people or services need Vertica access, the solution is just to manage it with Vertica. You can see some commands here that will let you do that. But what if we're a big organization and we want Vertica to reflect what's in our centralized user management system? It's a similar motivating use case to LDAP authentication, right? We want to avoid duplication hassles; we just want to centralize our management. In that case, we can use Vertica's LDAPLink feature. With LDAPLink, principals are mirrored from LDAP; they're synced in a configurable fashion from LDAP into Vertica's catalog. It manages creating and dropping users and roles for you, and then mapping the users to the roles. Once that's done, you can do any Vertica-specific configuration on the Vertica side. It's important to note that principals created in Vertica this way support multiple forms of authentication, not just LDAP. This is a separate feature from LDAP authentication, and if you created a user via LDAPLink, you could have them use a different form of authentication, Kerberos, for example. Up to you. Now, of course, this kind of system is pretty mission-critical, right? You want to make sure you get the right roles and the right users and the right mappings in Vertica, so you probably want to test it. And for that, we've got new and improved dry-run functionality as of 9.3.1. What this feature offers you is new metafunctions that let you test various parameters without breaking your real LDAPLink configuration. You can mess around with parameters and the configuration as much as you want, and you can be sure that all of it is strictly isolated from the live system; everything's separated. And when you use this, you get some really nice output through a Data Collector table. You can see some example output here: it runs the same logic as the real LDAPLink and provides detailed information about what would happen. You can check the documentation for specifics. All right, so we've connected to the database, and we know who we are, but now, what can we do? For any given action, you want to control who can do that, right? So what's the question you have to ask? Sometimes the question is just: who are you? It's a simple yes-or-no question. For example, if I want to upgrade a user, the question I have to ask is: am I the superuser? If I'm the superuser, I can do it; if I'm not, I can't. But sometimes the actions are more complex, and the question you have to ask is more complex: does the principal have the required privileges?
If you're familiar with SQL privileges, there are things like SELECT and INSERT, and Vertica has a few of its own, but the key thing here is that an action can require specific, and maybe even multiple, privileges on multiple objects. For example, when selecting from a table, you need USAGE on the schema and SELECT on the table, and there are some other examples here. So where do these privileges come from? Well, if the action requires a privilege, these are the only places privileges can come from. The first source is implicit privileges, which can come from owning the object or from special roles, which we'll talk about in a sec. Explicit privileges are basically the SQL-standard GRANT system: you can grant privileges to users or roles, and optionally, those users and roles can grant them downstream. That's discretionary access control. So those are explicit, and they come from the user and the active roles, that is, the whole identity set. And then we've got Vertica-specific inherited privileges, which come from the schema, and we'll talk about those in a sec as well. So these are the special roles in Vertica. First role: DBADMIN. This isn't the dbadmin user, it's a role, and it has specific elevated privileges. You can check the documentation for the exact privileges, but it's less than the superuser. The PSEUDOSUPERUSER can do anything the real superuser can do, and you can grant this role to whomever. The DBDUSER role can run Database Designer functions. SYSMONITOR gives you some elevated auditing permissions, and we'll talk about that later as well. And finally, PUBLIC is a role that everyone has all the time, so anything you want to be allowed for everyone, attach to PUBLIC. Imagine this scenario: I've got a really big schema with lots of relations, and those relations might be changing all the time. But for each principal that uses this schema, I want the privileges for all the tables and views there to be roughly the same. Even though the tables and views come and go, an analyst, for example, might need full access to all of them, no matter how many there are or what they are at any given time. To manage this, the first approach I could use is to remember to run grants every time a new table or view is created, and not just me, but everyone using this schema. Not only is that a pain, it's hard to enforce. The second approach is to use schema-inherited privileges. In Vertica, schema grants can include relational privileges, for example SELECT or INSERT, which normally don't mean anything for a schema, but they do for a table. If a relation is marked as inheriting, then the schema grants to a principal, for example, salespeople, also apply to the relation. You can see on the diagram here how USAGE and SELECT are granted on the schema, and in the Sales.foo table, SELECT applies as well. So now, instead of lots of GRANT statements from multiple object owners, we only have to run one ALTER SCHEMA statement and three GRANT statements, and from then on, any time you grant or revoke privileges on the schema, to or from a principal, all your new tables and views will get them automatically; it's dynamically calculated. Now, of course, part of setting this up securely is that you want to know what's happened and what's going on. To monitor the privileges, there are three system tables you want to look at. The first is grants, which will show you privileges that are active for you.
That is, your user and active roles, and theirs, and so on down the chain. Grants will show you the explicit privileges, and inherited_privileges will show you the inherited ones. And then there's one more, inheriting_objects, which will show all tables and views that inherit privileges; that's useful not so much for seeing privileges themselves as for managing inherited privileges in general. And finally, how do you see all privileges from all these sources together, in one go? Well, there's a metafunction added in 9.3.1, get_privileges_description, which, given an object, will sum up all the privileges the current user has on that object. I'll refer you to the documentation for usage and supported types. Now, the problem with SELECT: SELECT lets you see everything or nothing. You can either read the table or you can't. But what if you want some principals to see a subset, or a transformed version, of the data? For example, I have a table with personnel data, and different principals, as you can see here, need different access levels to sensitive information: social security numbers. One thing I could do is make a view for each principal, but I could also use access policies, and access policies can do this without introducing any new objects or dependencies. It centralizes your restriction logic and makes it easier to manage. So what do access policies do? Well, we've got row and column access policies. Row access policies will hide rows, and column access policies will transform data in the column, depending on who's doing the SELECTing. So it transforms the data, as we saw on the previous slide, to look as requested. Now, if access policies let you see the raw data, you can still modify the data, and the implication is that when you're crafting access policies, you should only use them to refine access for principals that need read-only access. That is, if you want a principal to be able to modify the data, the access policies you craft should let through the raw data for that principal. So in our previous example, the loader service should be able to see every row, and it should be able to see untransformed data in every column; as long as that's true, it can continue to load into this table. All of this is, of course, monitorable via a system table, in this case access_policy. Check the docs for more information on how to implement these. All right, that's it for access control. Now on to delegation and impersonation. So what's the question here? Well, the question is: who is Vertica? And that might seem like a silly question, but here's what I mean. When Vertica is connecting to a downstream service, for example cloud storage, how should Vertica identify itself? Most of the time, we do the permissions check ourselves and then connect as Vertica, like in this diagram here. But sometimes we can do better, and instead of connecting as Vertica, we connect with some kind of upstream user identity. When we do that, we let the service decide who can do what, so Vertica isn't the only line of defense. And in addition to the defense-in-depth benefit, there are also benefits for auditing, because the external system can see who is really doing something. It's no longer just Vertica showing up in that external service's logs; it's somebody like Alice or Bob trying to do something. One system where this comes into play is Voltage SecureData. So, let's look at a couple of use cases.
The first one: I'm just encrypting for compliance or anti-theft reasons. In this case, I'll just use one global identity to encrypt or decrypt with Voltage. But imagine another use case: I want to control which users can decrypt which data. Now I'm using Voltage for access control, so in this case, we want to delegate. The solution here is, on the Voltage side, to give Voltage users access to appropriate identities, and these identities control encryption for sets of data. A Voltage user can access multiple identities, like groups. Then, on the Vertica side, a Vertica user can set their Voltage username and password in a session, and Vertica will talk to Voltage as that Voltage user. In the diagram here, you can see an example of how this is leveraged so that Alice can decrypt something but Bob cannot. Another place the delegation paradigm shows up is with storage. Vertica can store and interact with data on non-local file systems, for example HDFS or S3. Sometimes Vertica is storing Vertica-managed data there; for example, in Eon mode, you might store your projections in communal storage in S3. But sometimes Vertica is interacting with external data. This usually maps to a user storage location on the Vertica side, and on the external storage side it might be something like Parquet files on Hadoop, and in that case, it's not really Vertica's data. We don't want to give Vertica more power than it needs, so let's request the data on behalf of whoever needs it. Let's say I'm an analyst and I want to copy from, or export to, Parquet, using my own bucket. It's not Vertica's bucket, it's my data, but I want Vertica to manipulate data in it. The first option I have is to give Vertica as a whole access to the bucket, and that's problematic, because in that case Vertica becomes kind of an AWS god: it can see any bucket that any Vertica user might want to push or pull data to or from, any time Vertica wants. So it's not good for the principles of least access and zero trust, and we can do better than that. The second option is to use an ID and secret key pair for an AWS IAM principal (if you're familiar) that does have access to the bucket. So I might use my own analyst credentials, or I might use credentials for an AWS role that has even fewer privileges than I do, sort of a restricted subset of my privileges. I set that in Vertica at the session level, and Vertica will use those credentials for the copy and export commands, which gives more isolation. Something that's in the works is support for keyless delegation, using assumable IAM roles: similar benefits to option two, but without having to manage keys at the user level. We can do basically the same thing with Hadoop and HDFS, with three different methods. The first option is Kerberos delegation. I think it's the most secure; if access control is your primary concern, this will give you the tightest access control. The downside is that it requires the most configuration outside of Vertica, with Kerberos and HDFS, but with this, you can really determine which Vertica users can talk to which HDFS locations. Then you've got secure impersonation. If you've got a highly trusted Vertica userbase, or at least some subset of it is, and you're not worried about them doing things wrong, but you want the auditing on the HDFS side, if that's your primary concern, you can use this option. The diagram here gives you a visual overview of how that works, but I'll refer you to the docs for details.
And then finally, option three: bringing your own delegation token. It's similar to what we do with AWS: we set something at the session level, so it's very flexible. The user can do it on an ad hoc basis, but it is manual, so that's the third option. Now on to auditing and monitoring. Of course, we want to know: what's happening in our database? That's important in general, and important for incident response. So your first stop to answer this question should be system tables. They're a collection of information about events, system state, performance, et cetera. They're SELECT-only tables, but they work in queries as usual; the data is just loaded differently. There are two types, generally. There are metadata tables, which reflect persistent information stored in the catalog, for example users or schemata. Then there are monitoring tables, which reflect more transient information, like events and system resources. Here you can see example output from the resource_pools table; despite looking like system statistics, these are actually configurable parameters. If you're interested in resource pools, a way to handle resource allocation for users and various principals, again, check that out in the docs. Then, of course, there's the follow-up question: who can see all of this? Well, some system information is sensitive, and we should only show it to those who need it. Principle of least privilege, right? Of course the superuser can see everything, but what about non-superusers? How do we give access to people who might need additional information about the system without giving them too much power? One option is SYSMONITOR; as I mentioned before, it's a special role, and this role can always read system tables but not change things the way a superuser could. Just reading. Another option is the RESTRICT and RELEASE metafunctions, which grant and revoke access to a certain set of system tables, to and from the PUBLIC role. But the downside of those approaches is that they're inflexible: they're all or nothing, for a specific preset of tables, and you can't really configure them per table. So if you're willing to do a little more setup, I'd recommend using your own grants and roles. System tables support GRANT and REVOKE statements, just like regular relations, and in that case I wouldn't even bother with SYSMONITOR or the metafunctions. To do this, just grant whatever privileges you see fit to roles that you create, then grant those roles to the users that you want, and revoke access to the system tables of your choice from PUBLIC. If you need even finer-grained access than this, you can create views on top of system tables. For example, you can create a view on top of the users system table that only shows the current user's information, using a built-in function as part of the view definition. Then you can actually grant this view to PUBLIC, so that each user in Vertica can see their own information, and never give access to the users system table as a whole, just that view. Now, if you're a superuser, or if you have direct access to nodes in the cluster (filesystem, OS, et cetera), then you have more ways to see events. Vertica supports various methods of logging.
You can see a few methods here, which generally sit outside of running Vertica, so you'd interact with them in a different way, with the exception of active_events, which is a system table. We've also got the Data Collector, which organizes events by subject. What the Data Collector does is extend the logging and system table functionality, by component, as it's called in the documentation, and it logs these events and information to rotating files. For example, AnalyzeStatistics is a function that users might run, and as a database administrator, you might want to monitor that, so you can use the Data Collector component for AnalyzeStatistics. The files these create can be exported into a monitoring database; one example of that is the Management Console's Extended Monitoring, so check out their virtual BDC talk, the one on the Management Console. And that's it for the key points of security in Vertica. Many of these slides could spawn a talk on their own, so we encourage you to check out our blog, the documentation, and the forum for further investigation and collaboration. Hopefully the information we provided today will inform your choices in securing your deployment of Vertica. Thanks for your time today. That concludes our presentation. Now, we're ready for Q&A.
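Pulling a few of these threads together, here is a small consolidated sketch, in Vertica SQL, of the schema-inherited privileges, column access policy, and monitoring queries discussed above. Schema, table, and role names are invented for illustration, and the masking expression is just one way to write it; the statement forms follow the features named in the talk, but check the docs for your version.

```sql
-- Schema-inherited privileges: one ALTER SCHEMA plus grants on the schema
-- cover every current and future inheriting table in it.
ALTER SCHEMA sales DEFAULT INCLUDE SCHEMA PRIVILEGES;
GRANT USAGE ON SCHEMA sales TO salespeople;
GRANT SELECT ON SCHEMA sales TO salespeople;

-- Column access policy: the hr role sees raw SSNs, everyone else a mask.
CREATE ACCESS POLICY ON sales.personnel FOR COLUMN ssn
    CASE WHEN enabled_role('hr') THEN ssn
         ELSE '***-**-' || RIGHT(ssn, 4)
    END
ENABLE;

-- Monitoring: explicit grants, inherited privileges, and the summed-up view.
SELECT * FROM grants WHERE grantee = CURRENT_USER;
SELECT * FROM inherited_privileges;
SELECT GET_PRIVILEGES_DESCRIPTION('table', 'sales.personnel');
```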
Keep Data Private: Prepare and Analyze Without Unencrypting | Virtual Vertica BDC 2020
>> Paige: Hello everybody, and thank you for joining us today for the Virtual Vertica BDC 2020. Today's breakout session is entitled "Keep Data Private: Prepare and Analyze Without Unencrypting, With Voltage SecureData for Vertica." I'm Paige Roberts, Open Source Relations Manager at Vertica, and I'll be your host for this session. Joining me is Rich Gaston, Global Solutions Architect for Security, Risk, and Government at Voltage. Before we begin, I encourage you to submit your questions or comments during the virtual session; you don't have to wait till the end. Just type your question or comment, as it occurs to you, in the question box below the slide, and then click Submit. There will be a Q&A session at the end of the presentation, where we'll try to answer as many of your questions as we're able to get to during the time; any questions that we don't address, we'll do our best to answer offline. If you want, you can visit the Vertica Forum to post your questions there after the session. That's going to take the place of the Developer Lounge, and our engineering team is planning to join the Forum to keep the conversation going. As a reminder, you can also maximize your screen by clicking the double-arrow button in the lower-right corner of the slides; that'll allow you to see the slides better. And before you ask: yes, this virtual session is being recorded, and it will be available to view on demand this week. We'll send you a notification as soon as it's ready. All right, let's get started. Over to you, Rich.

>> Rich: Hey, thank you very much, Paige, and I appreciate the opportunity to discuss this topic with the audience. My name is Rich Gaston, and I'm a Global Solutions Architect within the Micro Focus team. I work on global data privacy and protection efforts for many different organizations looking to take that journey toward breach defense and regulatory compliance, on platforms ranging from mobile to mainframe, everything in between, cloud, you name it; we're there in terms of our solution sets. Vertica is one of our major partners in this space, and I'm very excited to talk with you today about our solutions on the Vertica platform. First, let's talk a little bit about what you're not going to learn today, and that is, on screen you'll see just part of the mathematics that goes into the format-preserving encryption algorithm. We are the originators, authors, and patent holders of that algorithm. It came out of research at Stanford University back in the '90s, and we are very proud to have taken it out into the market through the NIST standards process and licensed it to others. So we are the originators and maintainers of the standards, and a thought leader in the industry. We try to make this easy, and you don't have to learn any of this tough math. Behind this, there are also many other layers of technology that are part of the security of the platform, such as stateless key management. That's a really complex area, and we make it very simple for you; we have very mature and powerful products in that space that really make your job quite easy when you want to implement our technology within Vertica. So today, our goal is to make data protection easy for you: to help you understand the basics of Voltage SecureData, to show how the Vertica UDx can help you get started quickly, and to see some examples of how Vertica plus Voltage SecureData are working together in our customer cases out in the field.
First, let's take you through a quick introduction to Voltage SecureData: the business drivers, and what this is all about. First of all, we started off with breach defense. We see that despite continued investments in perimeter and platform security, data breaches continue to occur. Voltage SecureData plus Vertica provides defense in depth for sensitive data, and that's a key concept that we're going to be referring to. In the security field, defense in depth is a standard approach for providing more layers of protection around sensitive assets, such as your data, and that's exactly what SecureData is designed to do. Now that we've come through many of these breach examples, and big-ticket items getting the news around breaches and their impact, regulators have stepped up, and regulatory compliance is now a hot topic in data privacy. Regulations such as GDPR came online in 2018 for the EU. CCPA came online just this year, a couple months ago, for California, and is the de facto standard for the United States now, as organizations look at best practices for providing regulatory compliance around data privacy and protection. These give massive new rights to consumers, but also obligations to organizations to protect that personal data. SecureData plus Vertica provides fine-grained authorization around sensitive data, and we're going to show you exactly how that works within the Vertica platform. At the bottom, you'll see some snippets of the news articles that just keep racking up, and our goal is to keep you off the news, to keep your company safe, so that you can have the assurance that even if there is an unintentional or intentional breach of data out of the corporation, if it is protected by Voltage SecureData, it will be of no value to those hackers, and then you have no impact in terms of risk to the organization. What do we mean by defense in depth? Let's take a look first at the encryption types and the benefits they provide, because we see our customers implementing all kinds of different protection mechanisms within the organization. You could be looking at disk-level protection, file system protection, or protection on the files themselves. You could protect the entire database, or you could protect transmissions as they go from the client to the server via TLS or other protected tunnels. And then we look at field-level encryption, and that's what we're talking about today. That's all of the above protections at the perimeter level and the platform level, plus we're giving you granular access control to your sensitive data. Our main message is: keep the data protected from the earliest possible point, and only access it when you have a valid business need to do so. That's a really critical aspect, as we see Vertica customers loading terabytes, even petabytes, of data into Vertica clusters, with the Vertica database giving access to that data out to a wide variety of end users. We started off with organizations having four people in an office doing data science, or analytics, or data warehousing, or whatever it's called within an organization, and that's now ballooned out to a new customer coming in and telling us they're going to have 1,000 people accessing it, plus service accounts accessing Vertica. We need to be able to provide fine-level access control, and to understand what folks are doing with that sensitive data and how we can secure it with the best practices possible.
In a very simple statement: Voltage protects data at rest and in motion. The encryption of data facilitates compliance, and it reduces your risk of breach. So if you take a look at what we mean by field level, we could take a name, and that name might not just be in US ASCII. Here we have a Latin-1 extended example, Harold Potter, and we could take a look at the example protected data. Notice that we're taking a character-set approach to protecting it, meaning I've got an alphanumeric option here for the format that I'm applying to that name. That gives me a mix of alpha and numeric, and plus I've got some of that Latin-1 extended alphabet in there as well, and that's really controllable by the end customer. They can have this be just US ASCII, they can have it be numbers for numbers, and you can have a wide variety of different protection mechanisms, including ignoring some characters in the alphabet in case you want to maintain formatting. We've got all the bells and whistles that you would ever want to put on top of format-preserving encryption, and we continue to add more to that platform as we go forward. Taking a look at tax ID, there's an example of numbers for numbers: pretty basic, but it gives us the idea that we can very quickly and easily keep the data protected while maintaining the format. No schema changes are going to be required when you want to protect that data. If you look at credit card number, a really popular example, the same concept can be applied to tax ID: often the last four digits will be used, in a tax ID, to verify someone's identity. That could be on an automated telephone system, or it could be a customer service representative just trying to validate the identity of the customer, and we can keep that part of the data in the clear for that purpose, while protecting the entire string from breach. Dates are another critical area of concern for a lot of medical use cases, and we're seeing date of birth included in a lot of data privacy conversations. We can protect dates with dates: the result is going to be a valid date, and we have some really nifty tools to maintain offsets between dates. So again, we've got real depth of capability within our encryption; it's not a one-size-fits-all approach. GPS location, customer ID, IP address, all of those kinds of data strings can be protected by Voltage SecureData within Vertica. Let's take a look at the UDx basics. So what are we doing when we add Voltage to Vertica? Vertica stays as-is in the center. In fact, if you get the Vertica distribution, you're getting the SecureData UDx on board; you just need to enable it and have the SecureData virtual appliance, that's the box there on the middle right. That's what we come in and add to the mix, as we start to add those capabilities to Vertica. On the left-hand side, you'll see that your users, your service accounts, and your analytics are still typically doing Select, Update, Insert, Delete type functionality within Vertica. They're going to come into Vertica's access control layer, they're going to access those services via SQL, and we simply extend SQL for Vertica. So when you add the UDx, you get additional syntax that we provide, and we're going to show you examples of that; a minimal sketch of the two core calls follows.
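(The function names below are the protect and access UDx calls discussed in this session; the sample value and the 'ssn' format name are illustrative assumptions, since formats are defined in your own SecureData configuration.)

```sql
-- Protect a value with format-preserving encryption. The 'ssn'
-- format is defined on the SecureData appliance; the input value
-- here is just a sample.
SELECT VoltageSecureProtect('123-45-6789' USING PARAMETERS format='ssn');
-- Returns format-preserved ciphertext that still looks like an SSN.

-- An authorized user decrypts with the companion call, passing the
-- stored ciphertext column and the same format name. The table and
-- column names are hypothetical.
SELECT VoltageSecureAccess(ssn_protected USING PARAMETERS format='ssn')
FROM customer;
```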
You can also integrate that with concepts like views within Vertica, so that we can say: let's give a view of the data that presents the data in the clear, using the UDx to decrypt it, and let's give everybody else access to the raw data, which is protected. Third parties could be brought in, folks like contractors, or folks who aren't vetted as closely as a security team might do for internal sensitive-data access, and they could be given access to the Vertica cluster without risk of them breaching and going into some area they're not supposed to look at. Vertica has excellent access control, down even to the column level, which is phenomenal and really provides you with world-class security around the Vertica solution itself. SecureData adds another layer of protection, like we're mentioning, so that we can have data protected in use, data protected at rest, and then we can have the ability to share that protected data throughout the organization. And that's really where SecureData shines: the ability to protect that data on mainframe, on mobile, on open systems, and in the cloud. Everywhere you want to have that data move to and from Vertica, you can have SecureData integrated with those endpoints as well. That's an additional solution on top of the SecureData plus Vertica solution that is bundled together today for sales purposes, but we can also have a conversation with you about those wider SecureData use cases, and we'd be happy to do so. The SecureData virtual appliance is a lightweight appliance: it sits on something like eight cores, 16 gigs of RAM, and 100 or 200 gigs of disk, and you can have one or many. Most customers have four in production, just for redundancy; they don't need them for scale. But we have some customers with 16 or more in production because they're running such high volumes of transaction load: they're running a lot of web service transactions, and they're running Vertica as well. So we're going to have those virtual appliances co-located around the globe, hooked up to all kinds of systems, like Syslog, LDAP, and load balancers; we've got a lot of capability within the appliance to fit into your enterprise IT landscape. So let me get you directly into the meat of what the UDx does. If you're technical and you know SQL, this is probably going to be pretty straightforward for you. You'll see the COPY command, used widely in Vertica to get data into Vertica. So let's protect that data while we're ingesting it: grab it from, say, a CSV file and put it straight into Vertica, but protected on the way in, and that's what the UDx does. We have VoltageSecureProtect, an added syntax, like I mentioned, on top of the Vertica SQL, and that allows us to say: we're going to protect the customer first name using the parameter of a format such as hyper-alphanumeric. That's our internal lingo for a format within SecureData; it's part of our API, and the API requires very few inputs. The format is the one that you as a developer will be supplying, and you'll have different formats for, say, SSN and for street address, but you can reuse a lot of your formats across a lot of your PII and PHI data types. A sketch of that protect-on-ingest pattern follows.
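(A minimal sketch of protect-on-ingest for a simple CSV load; the table name, file path, and format names are hypothetical. The FILLER-column pattern is standard Vertica COPY syntax for computing a column during load.)

```sql
-- Load a CSV and protect sensitive columns on the way into Vertica.
COPY customer (
    first_name_raw FILLER VARCHAR(64),
    first_name AS VoltageSecureProtect(first_name_raw
                                       USING PARAMETERS format='alphanumeric'),
    ssn_raw FILLER VARCHAR(11),
    ssn AS VoltageSecureProtect(ssn_raw USING PARAMETERS format='ssn')
)
FROM '/data/landing/customers.csv' DELIMITER ',';
```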
Protecting after ingest is also common. So I've got some data that's already been put into a staging area, perhaps a landing zone, a sandbox of some sort, and now I want to move it into a different zone in Vertica, a different area of the schema, and I want to have that data protected. We can do that with the UPDATE command, and again, you'll simply see VoltageSecureProtect; nothing too wild there, basically the same syntax. Next: how do we query protected data? How do we search once I've encrypted all my data? Well, actually, there's a pretty nifty trick to do so. If you want to query protected data and you have a search string, like the phone number in this example, simply call VoltageSecureProtect on that string; now you have the ciphertext, and you can search against the stored ciphertext. Again, we're just format-preserving encrypting the data, and it's just a string, and we can always compare those strings using standard SQL syntax. Using views to decrypt data is another powerful concept, in terms of how to make this work within the Vertica landscape when you have a lot of different groups of users. Views are very powerful things to point a BI tool at: business intelligence tools like Cognos, Tableau, etc., might be accessing data from Vertica with simple queries. Well, let's point them to a view that does the hard work, using the Vertica nodes, and their horsepower of CPU and RAM, to actually run that UDx and do the decryption of the data in use, temporarily in memory, and then throw it away so that it can't be breached. That's a nice way to keep your users active, working, and moving forward with their data access and data analytics, while also keeping the data secure in the process. And then we might want to export some data and push it out to someone in a cleartext manner. Say we've got a third party that needs to take the tax ID, along with some data, to do some processing. All we need to do is call VoltageSecureAccess, again very similar to the protect call, passing the format parameter again, and boom, we have decrypted the data, using again the Vertica resources of RAM and CPU to do the work. All we're doing with the Voltage SecureData appliance is a really simple little key fetch, across a protected tunnel; it's a tiny atomic transaction that gets done very quickly, and you're good to go. This is it in terms of the UDx: you have a couple of calls and one parameter to pass, everything else is config-driven, and really, you're up and running very quickly. We can even do demos and samples of this Vertica UDx using hosted appliances that we put up for pre-sales purposes. So if folks want to get a demo going, we can take that UDx, configure it to point to our appliance sitting on the internet, and within a couple of minutes we're up and running with some simple use cases. Of course, for on-prem deployment, or deployment in the cloud, you'll want your own appliance in your own crypto district, with your own security, but it just shows that we can easily connect to any appliance and get this working in a matter of minutes. A short sketch of the update, query, view, and export patterns follows.
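(A sketch of those four patterns, with hypothetical table, column, and format names; the formats again come from your own SecureData configuration.)

```sql
-- 1. Protect a staged column in place after ingest.
UPDATE staging.customer
SET ssn = VoltageSecureProtect(ssn USING PARAMETERS format='ssn');

-- 2. Search protected data: encrypt the search string, then compare
--    ciphertext to ciphertext with ordinary SQL.
SELECT *
FROM customer
WHERE phone = VoltageSecureProtect('617-555-0123'
                                   USING PARAMETERS format='phone');

-- 3. A view that decrypts for authorized users; BI tools query the
--    view while the base table stays protected.
CREATE VIEW customer_clear AS
SELECT cust_id,
       VoltageSecureAccess(ssn USING PARAMETERS format='ssn') AS ssn
FROM customer;

-- 4. Export in the clear for a vetted third party.
SELECT VoltageSecureAccess(tax_id USING PARAMETERS format='ssn') AS tax_id
FROM customer;
```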
We've got other customers that say, Rich, we have a very complex database environment, with multiple databases, multiple schemas, thousands of tables, and hundreds of thousands of columns; it's really, really complex, help, and we don't know exactly what people have been doing with some of that data, because we've got various teams that share this resource. There, we do have additional tools, and I want to give a shout-out to another Micro Focus product, which is called Structured Data Manager. It's a great tool that helps you identify sensitive data, with some really amazing technology under the hood that can go into a Vertica repository, scan those tables, take a sample of rows or do a full table scan, and give you back some really good reports of: we think this is sensitive, let's go confirm it and move forward with data protection. So if you need help on that, we've got the tools to do it. Once you identify that sensitive data, you're going to want to understand your data flows and your use cases. Take a look at what analytics you're doing today, and what analytics you want to do on sensitive data in the future. Let's start designing our analytics to work with sensitive data, and there are some tips and tricks that we can provide to help you mitigate any concerns around performance, or any concerns around rewriting your SQL. As you've noted, you can simply insert our SQL additions into your code and you're off and running. Next, you want to install and configure the UDx and the SecureData software appliance. Well, the UDx is pretty darn simple. The documentation on Vertica is publicly available; you can see how it works and what you need to configure it, one file here, and you're ready to go. So that's a pretty straightforward process. Then you grant some access to the UDx, and that's really up to the customer, because there are many different ways to handle access control in Vertica; we're going to be flexible to fit within your model of access control when adding the UDx to your mix. Each customer is a little different there, so you might want to talk with us about the best practices for your use cases, but in general, that's going to be up and running in just a minute; a sketch of what a grant can look like follows this section. The SecureData software appliance is a hardened Linux appliance today, and it sits on-prem or in the cloud, and you can deploy it. I've seen it done in 15 minutes, but that's when the right technical people had the access to do everything at once: setting the firewall rules and all the DNS entries, the basic blocking and tackling of standing up a software appliance. More typically, corporations take care of that in just a couple of weeks, because they're waiting on other teams, but the software appliances themselves are really fast to get stood up, and they're very simple to administer with our web-based GUI. Then finally, you're going to implement your UDx use cases. Once the software appliance is up and running, we can set authentication methods, we can set up the formats that you're going to use in Vertica, and then those two start talking together. It should be going in dev and test in about half a day, and then you're running toward production in just a matter of days, in most cases.
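(A sketch of one way to handle that grant step, under the assumption of role-based access control; the role names are hypothetical, and the exact function signature to grant on may differ in your deployment.)

```sql
-- Allow a broad analytics role to protect data, while reserving
-- decryption for a smaller, vetted role. Role names are hypothetical.
GRANT EXECUTE ON FUNCTION VoltageSecureProtect(VARCHAR) TO analyst_role;
GRANT EXECUTE ON FUNCTION VoltageSecureAccess(VARCHAR) TO compliance_role;
```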
We've got other customers that say, hey, this is going to be a bigger migration project for us, and we might want to split it up into chunks. Let's do the really sensitive and scary data, like tax ID, first, as our sort of toe-in-the-water approach, and then we'll come back and protect other data elements. That's one way to slice and dice and implement your solution in a planned manner. Another way is schema-based: let's take a look at this section of the schema and implement protection on these data elements; now let's take a look at a different schema and repeat the process, so you can iteratively move forward with your deployment. So what's the added value when you add Voltage to Vertica? I want to highlight this distinction, because Vertica contains world-class security controls around its database. I'm an old-time DBA from a different product, competing against Vertica in the past, and I'm really aware of the granular access controls that are provided within various platforms. Vertica would rank at the very top of the list in terms of being able to give me very tight control, and a lot of different access methods for protecting the data in a lot of different use cases. So Vertica can handle a lot of your data protection needs right out of the box. Voltage SecureData, as we keep mentioning, adds that defense in depth, and it's going to enable those enterprise-wide use cases as well. So first off, I mentioned this: the standard of FF1, that is, format-preserving encryption. We're the authors of it, and we continue to maintain it, and we want to emphasize that customers really ought to be very, very careful to choose a NIST standard when implementing any kind of encryption within the organization. AES was one of the first hallmark, benchmark encryption algorithms, and in 2016 we were added to that mix, as FF1. If you search NIST and Voltage Security, you'll see us right there as the author of the standard, and all the processes that went along with that approval. We have centralized policy for key management, authentication, audit, and compliance. We can now see that Vertica selected or fetched the key to protect some data at this date and time; we can track that and give you audit and compliance reporting against that data. You can move protected data into and out of Vertica. Whether you ingest via Kafka, via NiFi, or via StreamSets, there are a variety of different ingestion and streaming methods that can get data into Vertica, and we can integrate SecureData with all of those components. We're very well suited to integrate with any Hadoop technology or any big data technology, as we have APIs in a variety of languages, bitnesses, and platforms. So we've got all of that out of the box, ready to go for you if you need it. When you're moving data out of Vertica, you might move it onto an open-systems platform, or you might move it to the cloud; we can also operate and do the decryption there, and you're going to get the same plaintext back. And if you protect data over in the cloud and move it into Vertica, you're going to be able to decrypt it in Vertica. That's our cross-platform promise. We've been delivering on that for many, many years, and we now have many, many endpoints that do that in production for the world's largest organizations. We're going to preserve your data format and referential integrity.
So if I protect my social security number today, and I protect another batch of data tomorrow, that same ciphertext will be generated when I put it into Vertica. I can have absolute referential integrity on that data, to allow analytics to occur without even decrypting the data in many cases (there's a sketch of that pattern after this section). And we have decrypt access for authorized users only, with the ability to add LDAP authentication and authorization for UDx users. So you can really have a number of different approaches and flavors of how you implement Voltage within Vertica, but what you're getting is the additional ability to have the confidence that we've got the data protected at rest, even if I have a DBA who's not vetted, or someone new, or someone from a third party being granted DBA-level privilege. They could SELECT * all day long, and they're going to get ciphertext; they're going to have nothing of any value, and if they want to use the UDx to decrypt it, they're going to be tracked and traced as to their utilization of it. So it allows us to have that control, and an additional layer of security on your sensitive data. This may be required by regulatory agencies, and we're seeing compliance audits get more and more strict every year. GDPR was kind of funny, because they said in 2016, hey, this is coming; they said in 2018, it's here; and now they're saying in 2020, hey, we're serious about this, and the fines are mounting. And let's give you some examples to help you understand that these regulations are real, the fines are real, and your reputational damage can be significant if you were to be in breach of regulatory compliance requirements. We're finding so many different use cases now popping up around regional protection of data: I need to protect this data so that it cannot go offshore, or I need to protect this data so that people from another region cannot see it. That's all capability that we have within SecureData that we can add to Vertica. We have that broad platform support, and I mentioned NiFi and Kafka; those would be on the left-hand side, as we start to ingest data from applications into Vertica. We can have landing-zone approaches, where we provide some automated scripting at an OS level to protect ETL batch transactions coming in. And we can protect within the Vertica UDx, as I mentioned, with the COPY command, directly using Vertica. Everything inside that dot-dash line is the Vertica plus Voltage SecureData combo that's sold together as a single package. Additionally, we'd love to talk with you about the stuff that's outside the dashed box, because we have dozens and dozens of endpoints that can protect and access data on many different platforms. And this is where you really start to leverage some of the extensive power of SecureData: to go across platforms to handle your web-based apps, to handle apps in the cloud, and to handle all of this at scale, with hundreds of thousands of transactions per second of format-preserving encryption. That may not sound like much, but when you take a look at the algorithm, what we're doing on the mathematics side, and everything that goes into that transaction, to me that's an amazing accomplishment, that we're reaching those kinds of levels of scale. And with Vertica, it scales horizontally: the more nodes you add, the more power you get, and the more throughput you're going to get from Voltage SecureData.
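(A sketch of analytics on protected data, relying on the referential-integrity property just described: equal plaintexts protect to equal ciphertexts under the same format, so joins and aggregations work without decrypting. The table and column names are hypothetical.)

```sql
-- Join two tables on a protected SSN column. Because the same
-- plaintext always protects to the same ciphertext (same format),
-- the join matches correctly without ever decrypting the data.
SELECT c.cust_id, COUNT(o.order_id) AS order_count
FROM customer c
JOIN orders o
  ON o.cust_ssn = c.ssn          -- both columns hold ciphertext
GROUP BY c.cust_id;
```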
I want to highlight the next steps and how we can continue to move forward. Our SecureData team is available to you to talk about the landscape, your use cases, and your data. We really love that we've got so many different organizations out there using SecureData in so many different and unique ways. We have vehicle manufacturers who are protecting not just the VIN, not just their customer data, but in fact the sensor data from the vehicles, which is sent over the network down to the home base every 15 minutes for every vehicle that's on the road, and every vehicle of this customer of ours since 2017 has included that capability. So now we're talking about additional millions and millions of units coming online as those cars are sold, distributed, and used by customers. That sensor data is critical to the customer, and they cannot let it be exfiltrated in the clear, so they protect it with SecureData. We have a great track record of being able to meet a variety of different, unique requirements, whether it's IoT, web-based apps, e-commerce, or healthcare, all kinds of different industries. We would love to help move the conversations forward, and we do find that it's really a three-party discussion: the customer, the SecureData experts in some cases, and the Vertica team. We have great enablement within the Vertica team, to be able to explain and present our SecureData solution to you, but we also have the ability to bring other experts in, to take that conversation into the broader perspective of: how can I protect my data across all my platforms, not just in Vertica? I want to give a shout-out to our friends at Vertica Academy. They're building out great demo and training facilities to help you learn more about these UDx's and how they're implemented. The Academy is a terrific reference and resource for your teams to learn more about the solution in a self-guided way, and then we'd love to have your feedback on that. How can we help you more? What are the topics you'd like to learn more about? How can we look to the future in protecting unstructured data? How can we look to the future of protecting data at scale? What are the requirements that we need to be meeting? Help us through the learning process, and through feedback to the team, to get better, and then we'll help you deliver more solutions out to those endpoints and protect that data, so that we're not having data breaches, and we're not having regulatory compliance concerns. And then lastly, learn more about the UDx. Like I mentioned, all of our content there is online and available to the public. So at vertica.com/secureData you're going to be able to walk through the basics of the UDx: you'll see how simple it is to set up, what the UDx syntax looks like, how to grant access to it, and then you'll start to figure out, hey, how can I put this into a POC in my own environment? Like I mentioned before, we have a publicly available hosted appliance for demo purposes that we can make available to you if you want to POC this. Reach out to us, let's get a conversation going, and we'll get you the address and some instructions, and we can have a quick enablement session.
We really want to make this accessible to you and help demystify the concept of encryption, because when you see it as a developer, and you start to get your hands on it and put it to use, you can very quickly see: huh, I could use this in a variety of different cases, and I could use this to protect my data without impacting my analytics. Those are some of the really big concerns that folks have, and once we get through that learning process, and play around with it in a POC way, we can start to really put it into practice and into production, to say with confidence: we're going to move forward toward data encryption, and have a very good result at the end of the day. This is one of the things I find with customers that's really interesting. Their biggest stress is not around the timeframe or the resources; it's really around: this is my data, I have been working on collecting this data and making it available in a very high-quality way for many years, this is my job and I'm responsible for this data, and now you're telling me you're going to encrypt that data? It makes me nervous. And that's common; everybody feels that. So we want to have that conversation, and that sort of trial-and-error process, to say: hey, let's get your feet wet with it and see how you like it in a sandbox environment. Let's then take that into analytics, and take a look at how we can make this go for a quick 1.0 release, and let's then take a look at future expansions to that, where we start adding Kafka on the ingest side, or we start sending data off into other machine learning and analytics platforms that we might want to utilize outside of Vertica for certain purposes in certain industries. Let's take a look at those use cases together, and through that journey we can really chart a path toward the future, where we can help you protect that data at rest and in use, and keep you safe from both the hackers and the regulators. And that, I think, at the end of the day is really what it's all about in terms of protecting our data within Vertica. We're going to have a couple of minutes for Q&A, and we would encourage you to submit any questions you have here, and we'd love to follow up with you more about any questions you might have about Vertica plus Voltage SecureData. Thank you very much for your time today.