Barak Schoster, Palo Alto Networks | CUBE Conversation 2022
>> Hello, everyone. Welcome to this CUBE Conversation. I'm here in Palo Alto, California. I'm John Furrier, host of theCUBE, and we have a great guest here: Barak Schoster, who's in Tel Aviv, senior director and chief architect at Bridgecrew, a part of Palo Alto Networks. He was formerly the co-founder of the company, which was then sold to Palo Alto Networks. Barak, thanks for coming on this CUBE Conversation.
>> Thanks, John. Great to be here.
>> So one of the things I love about open source, and you're seeing a lot more of the trend now, people are talking about, you know, incubators all over the world, open source builders, people who are starting companies; it's happening more and more, and you're one of them. And you've been part of this open source security and cloud infrastructure, infrastructure as code, going back a while, and you guys had a lot of success. Now, open source infrastructure as code has moved up the stack; there's certainly a lot going on down at the network layer, but developers just want to build security in from day one, right? They don't want to get into the waiting game of slowing down their pipelining of code in the CI/CD; they want to move faster. And this has been one of the core conversations this year: how to make developers more productive, and not just as a cliche, but actually more productive, and not have to wait to implement cloud native. Right? So you're in the middle of it, and you're in Tel Aviv. Tell us what you guys are dealing with there.
>> Right, yeah. So I hear this need to work fast, to have a large velocity of releases, from many of my friends, the SREs, the DevOps, and the security practitioners in different companies. And the thing that we asked ourselves three years ago was: how can we simplify the process and make the security teams an enabler instead of a gatekeeper that blocks the releases? And the thing that we understood we should do is not only do runtime scanning of the cloud infrastructure and the cloud native clusters, but also shift left the findings, the fixings, the remediation of security issues, to the level of the code. So we started doing infrastructure as code: we scan Terraform, Kubernetes manifests, CloudFormation, serverless, and the list goes on, and we created an open source product around it named Checkov, which has an amazing community of hundreds of contributors. Not all of them are Palo Alto employees; most of them are community users from various companies. And we tried, and succeeded, to democratize the creation of policy as code: the ability to inspect your infrastructure as code and tell you, hey, this is the best practice that you should use, consider using it before applying a misconfigured S3 bucket into production, or before applying a misconfigured Kubernetes cluster into your production or dev environment.
>> And the goal?
>> The goal is to do that from the IDE, from the moment that you write code, and also to inspect your configuration in CI and CD and in runtime, and also to understand if there is any drift out there and have the ability to fix that in the source code, in the blueprint itself.
>> So what I hear you saying is there are really two problems you're solving. One is the organizational policies around how things were done in an environment before, the old way: you know, the security teams do a review, you send a ticket, things are waiting, stop, wait, hurry up and wait kind of thing. And then there's the technical piece of it, right?
Is it that there are two pieces to that?
>> Yeah, I think that one thing is the change of methodologies. We understood that we should just work differently than what we used to do. Tickets are slow, they have priorities, and you have a bottleneck, which is a small team of security practitioners. And honestly, a lot of the work is repetitive and can be democratized into the engineering teams. They should be able to understand: hey, I wrote the code piece that provisions this instance, I am the most suitable person, as a developer, to fix that piece of code and reapply it to the runtime environment.
>> And then it also sets the table for automation. It sets the table for policies, things that make things more efficient at scale. 'Cause you mentioned SREs are a big part of this, too, DevOps and SRE. Those folks are trying to move as fast as possible at scale, a huge scale challenge. How does the scale piece come into this?
>> So both teams, the SREs and the security teams, are involved in getting new application releases into the production environment. And the thing that you can do is inspect all kinds of best practices, not only security best practices, but also make sure that you have provisioned concurrency on your serverless functions, or that the amount of auto-scaling groups is what you expect it to be. And you can scan all of those things at the level of your code before applying it to production.
>> That's awesome. So good benefits: it scales the security team, it sounds like, too. You could get that policy out there. So, great stuff. I want to really quickly ask you about the event you're hosting, the Code to Cloud Summit. What are we going to see there? I'm going to host a panel, of course, I'm looking forward to that as well. You've got a lot of experts coming in there. Why are you having this event, and what topics will be covered?
>> So we wanted to talk about all of the shift-left movement and all of the changes that have happened in the cloud security market since inception till today. And we brought in great people and great practitioners from both the DevOps side, the chaos engineering side, and the security practitioners, and everybody is giving their opinion on the current state, how things should be implemented in a mature environment, and what the future might hold for the code and cloud security markets. The thing that we're going to focus on is all of the supply chain: from securing the CI/CD itself, making sure your actions are not vulnerable to a shell injection, making sure your version control systems are configured correctly with single sign-on, MFA, and branch protection rules, but also open source security like SCA, software composition analysis, infrastructure as code security, and obviously runtime security, drifts, and Kubernetes security. So we're going to talk about all of those different aspects and how each and every team is mitigating the different risks that come with them.
>> You know, one of the things that comes up when I hear you talking is the range of infrastructure as code. How has infrastructure as code changed? 'Cause, you know, there's DevOps and SREs, and now application developers; you still have to have programmable infrastructure. I mean, if infrastructure as code is realized up and down the stack, all aspects need to be programmable, which means you've got to have the data, you've got to have the ability to automate. How would you summarize kind of the state of infrastructure as code?
>> So a few years ago we started with physical servers, where we carried the infrastructure on our backs. I mounted them on the rack myself a few years ago and connected all of the different cables. Then came the revolution of VMs: we didn't do that anymore, we had one beefy appliance and we had 60 virtual servers running on that one appliance, so we didn't have to carry new servers into the data center every time. Then came the cloud, which made everything API-first and billable, and enabled us to write Bash scripts to provision those resources. But it was not enough, because we wanted to have a reproducible environment, written either in a declarative language like Terraform or CloudFormation, or an imperative one like CDK or Pulumi, but with a consistent way to deploy your application to multiple environments. And the stage after that is having some kind of a service catalog that will allow an application developer to get new releases up and running.
And the way that it has evolved, mass adoption of infrastructure as code is already happening. That introduces the ability for velocity in deployment, but also new kinds of risks that we haven't thought about before as security practitioners. For example, you should vet all of the open source Terraform modules that you're using, because you might have a leakage: Terraform has a lot of access to secrets in your environment, and the state file really contains sensitive objects like passwords. The other thing that has changed is that today we rely a lot on cloud infrastructure, and in the past year we've seen the Log4Shell attack, for example, and cloud providers have also disclosed that they were vulnerable to the Log4Shell attack. So we understand today that when we talk about cloud security, it's not only about the infrastructure itself. Is the infrastructure that we're using relying on an open source package that is vulnerable? Are we using an open source package that is vulnerable? Is our development pipeline configured correctly? And the list goes on. So it's really a new approach of analyzing the entire software bill of materials, also called SBOM, and understanding the different risks there.
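To make the policy-as-code and shift-left ideas Barak describes concrete, here is a minimal sketch of what a pre-apply misconfiguration check can look like. It is deliberately a toy, not Checkov itself: it walks a parsed, Terraform-style resource map and flags an S3 bucket that allows public access. The resource names, the parsed structure, and the policy are assumptions made only for illustration.

```python
# Toy policy-as-code check: fail the build if an S3 bucket is publicly readable.

def check_s3_public_acl(resources):
    """Return (resource_name, finding) pairs for buckets that violate the policy."""
    findings = []
    for name, conf in resources.get("aws_s3_bucket", {}).items():
        acl = conf.get("acl", "private")
        if acl in ("public-read", "public-read-write"):
            findings.append((name, f"ACL is '{acl}'; expected 'private'"))
    return findings

# Hypothetical configuration, as it might come out of an HCL or plan parser.
parsed = {
    "aws_s3_bucket": {
        "logs": {"acl": "private"},
        "marketing_assets": {"acl": "public-read"},  # would block the pipeline
    }
}

for resource, finding in check_s3_public_acl(parsed):
    print(f"FAILED aws_s3_bucket.{resource}: {finding}")
```

In the flow Barak outlines, a check like this runs in the IDE as you type, again in CI and CD before anything is applied, and again against the runtime environment so that drift can be traced back to, and fixed in, the source blueprint.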
>> You know, I think this is a really great point and great insight, because new solutions for new problems are new opportunities, right? So open source growth has been phenomenal, and you mentioned some of those — Terraform, and one of the projects you started, Checkov. They're all good, but there are some holes in there, and it's open source, it's free, everyone's building on it; that's what it's for. And I think now open source goes to the next level again, another generational inflection point: there are more contributors, there are companies involved, people are using it more. It becomes a really strong integration opportunity. So it's all free, and it's how you use it. So this is a new kind of extension of how open source is used, and if you can factor in some of the things like threat vectors, you have to know the code. There's no way to know it all, so you guys are scanning it, doing things, but it's also a huge system. It's not just one piece of code; you're talking about cloud becoming an operating system, a distributed computing environment, so it's a whole new area of problem space to solve. So I love that, love that piece. Where are you guys at on this now? How do you feel in terms of where you are on the progress bar of the solution? Because the supply chain is usually a hardware concept people can relate to, but when you bring in software, how you source software is like sourcing a chip or a piece of hardware: you've got to watch where it came from, and you've got to track that, or scan it and validate it, right? So these are new, new things. Where are we with that?
>> So you're right, we have a lot of moving parts. And really, the supply chain term came from the automobile industry. You have a car, you have an engine; the engine might be created by a different vendor. You have the wheels; they might be created by a different vendor. So when you buy your next Chevy or Ford, you might have wheels from Continental or another vendor. And actually, software is very similar. When we build software, we host it on a cloud provider like AWS, GCP, or Azure, not on our own infrastructure anymore. And when we're building software, we're using open source packages that are maintained in the other half of the world, and we don't always know in person the people who've created that piece. And we do not always have a vetting process, even a human vetting process, to know that everything we've used was really made by us or by a trusted source.
And this is where we come in. We help empower you, the engineer, with tools to analyze the entire dependency tree of your software bill of materials. We will scan your infrastructure code and the application packages that you're using from package managers like npm or PyPI, and we scan those open source dependencies. We will verify that your CI/CD is secure, that your version control system is secure. And the thing that we will always focus on is making a fix accessible to you. So let's say that you're using a misconfigured bucket: we have a bot that will fix the code for you. And let's say that you have a vulnerable open source package, and it was fixed in a later version: we will bump the version for you to make your code secure. And we will also have the same process on your runtime environment. So we will understand whether your environment is secure from code to cloud, or whether there are any threats out there that your engineering team should look at.
>> That's a great service, and I think this is cutting edge from a technology perspective. What are some of the new cloud native technologies that you see emerging fast, that are getting traction and ultimately having product-market fit in this area? Because I've seen Kubernetes — and you mentioned Kubernetes — that's one of the areas that has a lot more work to do, or being worked on now, that customers are paying attention to.
>> Yeah, so definitely Kubernetes started in growth companies and now it exists in every Fortune 100 company, so you can find it in every large-scale organization, and serverless functions are also getting to a higher adoption rate. But I think the thing that we're seeing the most massive adoption of is actually infrastructure as code. During COVID, a lot of organizations went through a digital transformation, and in that process they started to work remotely and agreed on migrating to a new infrastructure — not the data center, but the cloud provider. So other teams that were not experienced with those clouds are now getting familiar with them and getting exposed to new capabilities, and with that, also new risks.
>> Well, great stuff. Great to chat with you. I want to ask you while you're here: you mentioned infrastructure as code.
For the folks that get it right, there are some significant benefits; when we don't get it right, we know what that looks like. What are some of the benefits? Can you share, personally or for the folks watching out there, what it looks like if you get infrastructure as code right? What does the future look like? What does success look like? What does that path look like when you get it right, versus not doing it or getting it wrong?
>> I think that every engineer's dream is to be impactful, to work fast and learn new things, and not to get a PagerDuty page on a Friday night. So if you get infrastructure as code right, you have a process where everything is declarative and is peer reviewed, both by humans and by automated frameworks like Bridgecrew and Checkov. And you also have the ability to understand that, hey, once I've written it once, from that point forward it's reproducible, and it also has a state, so only changes will be applied, and it will enable myself and my team to work faster and collaborate in a better way on the cloud infrastructure. Let's say that you're not doing infrastructure as code: you have one resource changed by one team member and another resource changed by another team member, and the different dependencies between those resources are getting fragmented and broken. You cannot change your database without your application being aware of that; you cannot change your load balancer without the application being aware of that. So infrastructure as code enables you to do those changes in a mature fashion that will cause a lot fewer outages.
>> Yeah, a lot of people are getting PagerDuty pages on Friday, Saturday, and Sunday in the old way. In the new way, you don't want to break up your Friday night after a nice dinner either, Barak, you know? Well, thanks for coming on all the way from Tel Aviv, really appreciate it. I wish you guys everything the best over there in Tel Aviv, and we will see you at the event that's coming up. We're looking forward to the Code to Cloud Summit and all the great insight you guys will have. Thanks for coming on and sharing the story. Looking forward to talking more with you, Barak. Thanks for all the insight on security, infrastructure as code, and all the cool things you're doing at Bridgecrew.
>> Thank you, John.
>> Okay, this is the CUBE Conversation here in Palo Alto, California. I'm John Furrier, host of theCUBE. Thanks for watching.
SUMMARY :
host of the cube, and we have a great guest here. So one of the things I love about open source, and you're seeing a lot more of the trend now that talking about, And the thing that we asked ourselves The goal is to do that from the ID from the moment that you write code and also You know, the security teams do a review, you send a ticket, things are waiting, stop, wait, hurry up and wait kind of thing. And honestly, a lot of the work is repetitive and can How does that impact the scale piece become into here? And the thing that you can do is you can inspect all kinds of best practices, I want to really quickly ask you about the event. all of the supply chain from securing the CCD itself, You know, one of the things that you bring up when you hear you talking is that's the range of, of infrastructure as code. And the stage after that is having some kind of And the way that it has evolved mass adoption of infrastructure as code And if you can factor in some of the things like, like threat vectors, So you guys are scanning it doing things, but it's also huge system. So when you buy your next Chevy And the thing that we will And you mentioned Kubernetes, that's one of the areas that have a lot more work to do or being worked So you can find anything, every large growler scale What are some of the benefits that can you share personally, or for the folks watching And the different dependencies between and all the great insight you guys will have. I'm John furrier hosted the cube.
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Barack Shuster | PERSON | 0.99+ |
John | PERSON | 0.99+ |
Delhi | LOCATION | 0.99+ |
Barak Schoster | PERSON | 0.99+ |
Brock | PERSON | 0.99+ |
two pieces | QUANTITY | 0.99+ |
Ford | ORGANIZATION | 0.99+ |
Ron | PERSON | 0.99+ |
Tel-Aviv | LOCATION | 0.99+ |
Sunday | DATE | 0.99+ |
Saturday | DATE | 0.99+ |
Palo Alto, California | LOCATION | 0.99+ |
Friday night | DATE | 0.99+ |
two problems | QUANTITY | 0.99+ |
60 virtual servers | QUANTITY | 0.99+ |
Friday | DATE | 0.99+ |
hundreds of contributors | QUANTITY | 0.99+ |
Palo Alto Networks | ORGANIZATION | 0.99+ |
Chevy | ORGANIZATION | 0.99+ |
both | QUANTITY | 0.99+ |
both themes | QUANTITY | 0.99+ |
One | QUANTITY | 0.98+ |
100 companies | QUANTITY | 0.98+ |
one | QUANTITY | 0.98+ |
Friday night | DATE | 0.98+ |
one appliance | QUANTITY | 0.98+ |
today | DATE | 0.98+ |
Brock | ORGANIZATION | 0.98+ |
three | QUANTITY | 0.98+ |
AWS | ORGANIZATION | 0.98+ |
three years ago | DATE | 0.97+ |
this year | DATE | 0.97+ |
first | QUANTITY | 0.97+ |
John furrier | PERSON | 0.97+ |
one thing | QUANTITY | 0.96+ |
past year | DATE | 0.95+ |
Kubernetes | ORGANIZATION | 0.94+ |
single | QUANTITY | 0.94+ |
one resource | QUANTITY | 0.91+ |
few years ago | DATE | 0.91+ |
Terraform | ORGANIZATION | 0.91+ |
one piece of code | QUANTITY | 0.86+ |
day one | QUANTITY | 0.86+ |
one team member | QUANTITY | 0.83+ |
PagerDuty | ORGANIZATION | 0.83+ |
once | QUANTITY | 0.8+ |
GCP | ORGANIZATION | 0.78+ |
Azure | ORGANIZATION | 0.76+ |
each | QUANTITY | 0.72+ |
Palo Alto | LOCATION | 0.71+ |
Palo | LOCATION | 0.71+ |
SRS | TITLE | 0.71+ |
beefy | ORGANIZATION | 0.7+ |
CDK | ORGANIZATION | 0.68+ |
2022 | DATE | 0.68+ |
Kubernetes | TITLE | 0.67+ |
D | EVENT | 0.58+ |
CloudFormation | TITLE | 0.58+ |
Alto | ORGANIZATION | 0.55+ |
two cloud | EVENT | 0.55+ |
every team | QUANTITY | 0.54+ |
Asbell | TITLE | 0.53+ |
S3 | TITLE | 0.52+ |
CGI | TITLE | 0.5+ |
Cooper | ORGANIZATION | 0.5+ |
Esri | PERSON | 0.5+ |
bridge | ORGANIZATION | 0.49+ |
Conversation | EVENT | 0.42+ |
COVID | TITLE | 0.39+ |
Nimrod Vax, BigID | AWS re:Invent 2020 Partner Network Day
>> Announcer: From around the globe, it's theCUBE. With digital coverage of AWS re:Invent 2020. Special coverage sponsored by the AWS Global Partner Network.
>> Okay, welcome back everyone to theCUBE's virtual coverage of re:Invent 2020 virtual. Normally we're in person; this year, because of the pandemic, we're doing remote interviews, and we've got great coverage here of the APN, the Amazon Partner Network experience. I'm your host John Furrier, we are theCUBE virtual. Got a great guest from Tel Aviv remotely calling in and videoing, Nimrod Vax, who is the chief product officer and co-founder of BigID. This is the beautiful thing about remote: you're in Tel Aviv, I'm in Palo Alto. Great to see you. We're not in person, but thanks for coming on.
>> Thank you. Great to see you as well.
>> So you guys have had a lot of success at BigID. I've noticed a lot of awards: startup to watch, company to watch. Kind of a good market opportunity: data, data at scale, identification. As the web evolves beyond web presence, identification and authentication are super important. You guys are called BigID. What's the purpose of the company? Why do you exist? What's the value proposition?
>> So first of all, best startup to work at based on Glassdoor worldwide, so that's a big achievement too. So look, four years ago we started BigID when we realized that there is a gap in the market between the new demands on organizations in terms of how to protect the personal and sensitive information that they collect about their customers and their employees, and the tools available to them. The regulations were becoming more strict, but the tools that were out there — and to a large extent still are out there — were not catering to those requirements, and organizations had to deal with some of those challenges in manual processes, right? For example, the right to be forgotten. Organizations need to be able to find and delete a person's data if that person wants to be deleted. That's based on GDPR, and later on even CCPA. And organizations have no way of doing it, because the tools that were available could not tell them whose data it is that they found. The tools were very siloed: they were looking at either unstructured data and file shares on Windows and so forth, or they were looking at databases; there was nothing for Big Data, there was nothing for cloud business applications. And so we identified that there is a gap here, and we addressed it by building BigID, basically to address those challenges.
>> That's great, great stuff. And I remember four years ago when I was banging on the table and saying, you know, regulation can stunt innovation, because you had the confluence of massive platform shifts combined with the business pressure from society. That's not stopping, and it's continuing today. You're seeing it globally, whether it's fake news in journalism or privacy concerns with modern applications; this is not going away. You guys have a great market opportunity. What is the product? What is SmallID? What do you guys have right now? How do customers maintain the success as the ground continues to shift under them, as platforms become more prevalent, more tools, more platforms, more everything?
>> So, I'll start with BigID. What is BigID? BigID really helps organizations better manage and protect the data that they own. And it does that by connecting to everything you have around structured databases and unstructured file shares, big data, cloud storage, business applications, and then providing very deep insight into that data.
Cataloging all the data, so you know what data you have and where, and classifying it, so you know what type of data you have. Plus, you're analyzing the data to find similar and duplicate data, and then correlating it to an identity. It's a very strong, very broad solution, a fit for the IT organization. We have some of the largest organizations out there: the biggest retailers, the biggest financial services organizations, manufacturing, et cetera. What we are seeing is that, with the adoption of cloud and the business success obviously of AWS, there are a lot of organizations that are not as big, that don't have an IT organization, that have a very well-functioning DevOps organization but still have a very big footprint in Amazon and in other kinds of cloud services. And they want to get visibility, and they want to do it quickly. And SmallID is really built for that. SmallID is a lightweight version of BigID that is cloud-native, built for your AWS environment. What that means is that you can quickly install it using CloudFormation templates straight from the AWS Marketplace, quickly stand up an environment that can scan and discover the assets in your account automatically, and give you immediate visibility into your S3 buckets, into your DynamoDB environments, into your EMR clusters, into your Athena databases, and immediately build a full catalog of all the data, so you know what files you have where, what tables, what technical metadata, operational metadata, business metadata, and also classified data information. So you know where you have sensitive information, and you can immediately address that and apply controls to that information.
>> So this is data discovery. So the use case is: I'm an Amazon partner — I mean, we use theCUBE virtuals on Amazon — but let's just say hypothetically we're growing like crazy. Got S3 buckets over here, secure, encrypted and the rest, all that stuff. Things are happening, we're growing like a weed. Do we just deploy SmallID, and how does it work? Is that the use case — SmallID is for AWS and BigID for everything else, or?
>> You can start small with SmallID. You get the visibility you need, and you can leverage the automation of AWS so that you automatically discover those data sources, connect to them, and get visibility. And you can grow into BigID using the same deployment inside AWS. You don't have to switch or migrate; you use the same container cluster that is running inside your account and automatically scale it up, and then connect to other systems or benefit from the more advanced capabilities that BigID can offer, such as correlation — by connecting to, maybe, your Salesforce CRM system and getting the ability to correlate to your customer data and understand also whose data it is that you're storing; connecting to your on-premise mainframe with the same deployment; connecting to your Google Drive or Office 365. But the point is that with SmallID you can really start quickly and small, with a very small team, and get that visibility very quickly.
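To picture the automated discovery step Nimrod is describing — connect to the account, enumerate the data stores, start a catalog — here is a rough boto3 sketch. It is not SmallID or BigID; it only lists a few AWS data services in one region to show the kind of inventory such a product builds on, and the region and service choices are assumptions made for the example.

```python
import boto3

def quick_data_inventory(region="us-east-1"):
    """Toy inventory of a few AWS data stores (first page only; real discovery would paginate)."""
    inventory = {}

    s3 = boto3.client("s3", region_name=region)
    inventory["s3_buckets"] = [b["Name"] for b in s3.list_buckets()["Buckets"]]

    dynamodb = boto3.client("dynamodb", region_name=region)
    inventory["dynamodb_tables"] = dynamodb.list_tables()["TableNames"]

    glue = boto3.client("glue", region_name=region)
    inventory["glue_databases"] = [d["Name"] for d in glue.get_databases()["DatabaseList"]]

    return inventory

if __name__ == "__main__":
    for kind, names in quick_data_inventory().items():
        print(f"{kind}: {len(names)} found")
```

A real discovery product goes much further — scanning contents, classifying them, and correlating records to identities — but the starting point is the same: an automatically refreshed list of where data can live.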
>> Nimrod, I want to ask you a question. What is the definition of cloud-native data discovery? What does that mean to you?
>> So cloud native means that it leverages all the benefits of the cloud: it gets all of the automation and visibility that you get in a cloud environment, versus any traditional on-prem environment. So one thing is that BigID is installed directly from your marketplace — you can browse, find the solution on the AWS Marketplace, and purchase it. It gets deployed using CloudFormation templates very easily and very quickly. It runs on an elastic container service, so that once it runs you can automatically scale it up and down to increase the scan and scale capabilities of the solution. It connects automatically, behind the scenes, into the Security Hub of AWS, so you get the policy alerts fed into your Security Hub. It also has integration directly into the native logging capabilities of AWS, so your existing Datadog, or whatever you're using for monitoring, can plug into it automatically. That's what we mean by cloud native.
>> And if you're cloud native, you've got to be positioned to take advantage of the data, and machine learning in particular. Can you expand on the role of machine learning in your solution? Customers are leaning in heavily this year; you're seeing more uptake on machine learning, which is basically AI — AI is machine learning — but it's all tied together. ML is big on all the deployments. Can you share your thoughts?
>> Yeah, absolutely. So data discovery is a very tough problem, and it has been around for 20 years. And the traditional method of classifying the data, or understanding what type of data you have, has been to look at the pattern of the data — typically regular expressions, or other kinds of pattern-matching techniques that look at the data. But sometimes, in order to know what is personal or what is sensitive, it's not enough to look at the pattern of the data. How do you distinguish between a date of birth and any other date? A date of birth is much more sensitive. How do you find country of residency, or how do you identify even a first name from a last name? For that, you need more advanced, more sophisticated capabilities that go beyond just pattern matching. And BigID has a variety of those techniques; we call it discovery-in-depth. What it means is that, very similar to security-in-depth, where you cannot rely on a single security control to protect your environment, you cannot rely on a single discovery method to truly classify the data. So yes, we have regular expressions; that's the table-stakes, basic capability of data classification. But if you want to find data that is more contextual, like a first name or last name, or even a phone number — and distinguish between a phone number and just a sequence of numbers — you need more contextual, NLP-based discovery, named entity recognition. We're using (indistinct) to extract and find data contextually. We also apply deep learning, CNNs — convolutional neural networks — in order to identify and classify document types: being able to distinguish between a resume and an application form, finding financial records, finding medical records. So our advanced NLP classifiers can find that type of data. The more advanced capabilities that go beyond SmallID into BigID also include cluster analysis, which is an unsupervised machine learning method of finding duplicate and similar data, correlation, and other techniques that are more contextual and need machine learning.
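Nimrod's example — a regex can find a date, but only context tells you it is a date of birth — is easy to see in a toy sketch. The snippet below is not BigID's classifier; it just pairs a date regex with a crude context window, and the keyword list and window size are arbitrary assumptions for illustration.

```python
import re

DATE_RE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")
DOB_HINTS = ("date of birth", "dob", "born on", "birthday")  # assumed context keywords

def classify_dates(text, window=25):
    """Label each date as 'date_of_birth' or 'generic_date' based on nearby words."""
    lowered = text.lower()
    results = []
    for match in DATE_RE.finditer(text):
        start = max(0, match.start() - window)
        context = lowered[start:match.end() + window]
        label = "date_of_birth" if any(hint in context for hint in DOB_HINTS) else "generic_date"
        results.append((match.group(), label))
    return results

sample = "Order shipped on 03/14/2021 from the warehouse. Patient DOB: 07/02/1988."
print(classify_dates(sample))
# [('03/14/2021', 'generic_date'), ('07/02/1988', 'date_of_birth')]
```

Production classifiers replace the keyword window with trained NER and document models, but the underlying point is the same: the pattern alone is not enough, and the surrounding context is what makes the data sensitive.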
>> Yeah, and unsupervised, that's a lot harder than supervised. You need to have the ability to get at what you can't see; you've got to get the blind spots identified, and that's really the key observational data you need. This brings up the operational side: you heard cluster, I hear governance and security, and you mentioned earlier GDPR. This is an operational impact. Can you talk about how it impacts, specifically, the privacy protection and governance side? Because certainly I get the clustering side of it — operationally, just great, everyone needs to get that. But now on the business model side, this is where people are spending a lot of time scared and worried, actually: what the hell to do?
>> One of the things that we realized very early on when we started with BigID is that everybody needs discovery. You need discovery — and we actually started with privacy. You need discovery in order to map your data and apply the privacy controls. You need discovery for security, like we said, right? Find and identify sensitive data and apply controls. And you also need discovery for data enablement: you want to discover the data, to enable it, to govern it, to make it accessible to the other parts of your business. So discovery is really a foundation and a starting point, and you get there with SmallID. How do you operationalize that? BigID has the concept of an application framework. Think about it like an App Store for data discovery, where you can run applications inside your kind of discovery iPhone in order to run specific (indistinct) use cases. So, how do you operationalize privacy use cases? We have applications for privacy use cases like subject access requests and data rights fulfillment, right? Under the CCPA, you have the right to request your data, what data is being stored about you. BigID can help you find all that data in the catalog; after we scan and find that information, we can find any individual's data. We also have an application in the privacy space for consent governance, right? Under CCPA, you have the right to opt out. If you opt out, your data cannot be sold, cannot be used. How do you enforce that? How do you make sure that if someone opted out, that person's data is not being pumped into Glue, into some other system for analytics, into Redshift or Snowflake? BigID can identify a specific person's data, make sure that it's not being used for analytics, and alert if there is a violation. So that's just an example of how you operationalize this knowledge for privacy. And we have more examples also for data enablement and data management.
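The enforcement step Nimrod sketches — making sure an opted-out person's records never reach the analytics pipeline, and alerting if they do — comes down to a small gate in code. The sketch below is only an illustration of that idea, not BigID's application framework; the record fields and the opt-out registry are invented for the example.

```python
# Toy consent gate: split a batch into records safe to forward and opt-out violations.

OPTED_OUT = {"user-1029", "user-2201"}  # hypothetical opt-out registry

def enforce_opt_out(records):
    allowed, violations = [], []
    for record in records:
        if record["user_id"] in OPTED_OUT:
            violations.append(record["user_id"])  # alert/audit instead of loading
        else:
            allowed.append(record)
    return allowed, violations

batch = [
    {"user_id": "user-1029", "email": "a@example.com"},
    {"user_id": "user-3318", "email": "b@example.com"},
]
allowed, violations = enforce_opt_out(batch)
print(len(allowed), "records forwarded;", violations, "blocked and flagged")
```

The hard part in practice is the other half of the problem — knowing which records belong to which person across every store — which is exactly what the discovery and correlation work described above provides.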
>> There's so much headroom opportunity to build out new functionality, make it programmable. I really appreciate what you guys are doing; it's totally needed in the industry. I can just see endless opportunities to make this operationally scalable, more programmable, once you kind of get the foundation out there. So congratulations, Nimrod, and the whole team. The question I want to ask you — we're here at re:Invent virtual, three weeks we're here covering Cube action, check out theCUBE experience zone, the partner experience — what is the difference between BigID and, say, Amazon's Macie? Let's think about that. So how do you compare and contrast? In Amazon they say, we love partnering, but we promote our ecosystem. You guys sure have a similar thing. What's the difference?
>> There's a big difference. Yes, there is some overlap, because both SmallID and Macie can classify data in S3 buckets, and Macie does a pretty good job at it, right? I'm not arguing about it. But SmallID is not only about scanning for sensitive data in S3. It also scans anything else you have in your AWS environment: DynamoDB, EMR, Athena — and we're also adding Redshift soon, Glue, and other data sources as well. And it's not only about identifying and alerting on sensitive data; it's about building a full catalog (indistinct). It's about giving you almost like a full registry of your data in AWS, where you can look up any type of data and see where it's found across the structured, unstructured, and big data repositories that you're handling inside your AWS environment. So it's broader than just security. Apart from the fact that it's used for privacy, I would say the biggest value of it is building that catalog and making it accessible for data enablement: enabling your data across the board for other use cases, for analytics in Redshift, for Glue, for data integrations, for various other purposes. We also have integration into Kinesis, to be able to scan and let you know which topics use what type of data. So it's really a very robust, full-blown catalog of the data across the board that is dynamic, and also, like you mentioned, accessible through APIs. Very much like the AWS tradition.
>> Yeah, great stuff. I've got to ask you a question while you're here. You're the co-founder — and again, congratulations on your success — and also the chief product officer of BigID. What's your advice to your colleagues and potentially new friends out there that are watching here? And let's take it from the entrepreneurial perspective. I have an application and I start growing, and maybe I have funding, maybe I take a more pragmatic approach versus raising billions of dollars. But as you grow, the pressure for AppSec reviews, having all the table-stakes features — how do you advise developers or entrepreneurs, or even business people, small and medium-sized enterprises, to prepare? Is there a playbook, so that rather than looking back and saying, oh, I didn't do all the things and I've got to go back and retrofit, get BigID — is there a playbook that you see that will help companies so they don't get killed with AppSec reviews and privacy compliance reviews? It could be a waste of time. What are your thoughts on all this?
>> Well, I think that very early on when we started BigID — and that was our perspective — we knew that we are a security and privacy company, so we had to take that very seriously upfront and be prepared. Security cannot be an afterthought; it's something that needs to be built in. And from day one we have taken all of the steps that were needed in order to make sure that what we're building is robust and secure. And that includes, obviously, applying all of the code and CI/CD tools that are available for testing your code, whether it's (indistinct), these types of tools; applying and providing penetration testing, and working with best-in-class pen testing companies and white hat hackers that would look at your code. These are kind of the things that — that's what you get funding for, right?
>> Yeah.
>> And you need to take advantage of that and use them. And then as soon as we got bigger, we also invested in a very strong CSO who comes from the industry, who has a lot of expertise and a lot of credibility. We also have kind of a CSO group. So each step of funding, we've also used extensively to make our kind of security posture a lot more robust and visible.
>> Final question for you. When should someone buy BigID? When should they engage? Is it something that people can just download immediately and integrate? Do you have to have — is the go-to-market kind of a new target, the VP level, or is it the... How does someone know when to buy you, download it, and use the software?
Take us through the use case of how customers engage with you.
>> Yeah, so customers directly have those requirements when they start having to comply with regulations around privacy and security. So very early on, especially organizations that deal with consumer information get to a point where they need to be accountable for the data that they store about their customers, and they want to be able to know their data and provide the privacy controls they need to their consumers. For our BigID product, this is typically a medium-size-and-up company with an IT organization. SmallID is a good fit for companies that are much smaller, that operate mostly out of their — their IT is basically their DevOps teams. And once they have more than 10, 20 data sources in AWS, that's where they start losing count of the data that they have, and they need to get more visibility and be able to control what data is being stored there. Because very quickly you start losing count of data information, even for an organization like BigID, which isn't a bigger organization, right? We have 200 employees. We are at the point where it's hard to keep track and keep control of all the data that is being stored in all of the different data sources, right? In AWS, in Google Drive, in some of our other sources, right? And that's the point where you need to start thinking about having that visibility.
>> Yeah, like all growth plans: dream big, start small, and get big. And I think that's a nice pathway. So small gets you going, and you lead right into the BigID. Great stuff. Final, final question for you while I've got you here: why the awards? Someone's like, hey, BigID is this cool company, love the founder, love the team, love the value proposition, makes a lot of sense. Why all the awards?
>> Look, I think one of the things that was compelling about BigID from the beginning is that we did things differently. Our whole approach to personal data discovery is unique. Instead of looking at the data, we started by looking at the identities, the people, and finally looking at their data, learning what their data looks like and then searching for that information. So that was a very different approach to the traditional approach of data discovery. And we continue to innovate and to look at those problems from a different perspective, so we can offer our customers an alternative to what was done in the past. It's not to say that we don't do the basic stuff — the regexes, the connectivity that is needed — but we always took a slightly different approach, to diversify, to offer something slightly different and more comprehensive. And I think that was the thing that really attracted attention from the beginning, with the RSA Innovation Sandbox award that we won in 2018, the Gartner Cool Vendor award that we received, and later on also the other awards. And I think that's the unique aspect of BigID.
>> You know, you solve big problems that are certainly needed. We saw this early on, and again, I don't think the problem is going to go away anytime soon. Platforms are emerging, more tools than ever before that converge into platforms, and as the logic changes at the top, all of that's moving underneath. So, congratulations, great insight.
>> Thank you very much.
>> Thank you. Thank you for coming on theCUBE. Appreciate it, Nimrod. Okay, I'm John Furrier. We are theCUBE virtual, here for the partner experience, APN virtual. Thanks for watching. (gentle music)
SUMMARY :
Announcer: From around the globe, of the APN, Amazon Partner Great to see you as well. So you guys have had a For example, the right to be forgotten. What is the product? of all the data, so you know and the rest, all that stuff. and you use the same container cluster What is the definition of Like it gets all of the automation of the data and machine and need to use machine learning for that. and that's really the key and that you get there with smallID. Nimrod and the whole team. of the data that across the things I got to go back These are kind of the things that, and a lot of credibility. is the go-to-market kind of And that's the point where you need and you lead right into the BigID. And instead of looking at the data, and as the logic changes at the top for the partner experience APN virtual.
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
AWS | ORGANIZATION | 0.99+ |
Nimrod Vax | PERSON | 0.99+ |
Nimrod | PERSON | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
John Furrier | PERSON | 0.99+ |
Palo Alto | LOCATION | 0.99+ |
Tel Aviv | LOCATION | 0.99+ |
2018 | DATE | 0.99+ |
Glassdoor | ORGANIZATION | 0.99+ |
BigID | TITLE | 0.99+ |
200 employees | QUANTITY | 0.99+ |
iPhone | COMMERCIAL_ITEM | 0.99+ |
BigID | ORGANIZATION | 0.99+ |
Apple | ORGANIZATION | 0.99+ |
SmallID | TITLE | 0.99+ |
GDPR | TITLE | 0.99+ |
four years ago | DATE | 0.98+ |
billions of dollars | QUANTITY | 0.98+ |
Redshift | TITLE | 0.98+ |
CloudFormation | TITLE | 0.97+ |
both | QUANTITY | 0.97+ |
DynamoDB | TITLE | 0.97+ |
single | QUANTITY | 0.97+ |
CNN | ORGANIZATION | 0.97+ |
this year | DATE | 0.97+ |
EMR | TITLE | 0.97+ |
one thing | QUANTITY | 0.97+ |
One | QUANTITY | 0.96+ |
one | QUANTITY | 0.96+ |
each step | QUANTITY | 0.95+ |
Amazon Partner Network | ORGANIZATION | 0.95+ |
three weeks | QUANTITY | 0.95+ |
APN | ORGANIZATION | 0.95+ |
20 years | QUANTITY | 0.95+ |
S3 | TITLE | 0.94+ |
Athena | TITLE | 0.94+ |
office 365 | TITLE | 0.94+ |
today | DATE | 0.93+ |
first name | QUANTITY | 0.92+ |
smallIDs | TITLE | 0.91+ |
Gartner Cool Vendor | TITLE | 0.91+ |
Kinesis | TITLE | 0.91+ |
20 data sources | QUANTITY | 0.9+ |
RSA Innovation Sandbox | TITLE | 0.88+ |
CCP | TITLE | 0.88+ |
Invent 2020 Partner Network Day | EVENT | 0.88+ |
smallID | TITLE | 0.88+ |
more than 10, | QUANTITY | 0.88+ |
Macy | ORGANIZATION | 0.86+ |
Patrick Hetherton, Jobcase | CUBE Conversation, May 2020
>> Narrator: From theCUBE studios in Palo Alto and Boston, connecting with thought leaders all around the world, this is a CUBE Conversation.
>> Hi, I'm Stu Miniman, coming to you from our Boston area studio. theCUBE is happy to participate in the CloudHealth CloudLIVE event — Corey Quinn and myself going head to head with The Great Cloud Debate — but of course, one of the things we always love is to talk to the practitioners, and so thank you to the CloudHealth team for bringing us the guest that I'm about to speak with, Patrick Hetherton. He is the vice president of tech ops at Jobcase, also a Boston area company. Patrick, thanks so much for joining us.
>> Thanks for having me, I appreciate it.
>> All right, so let's start. If you could just give our audience a little bit about Jobcase, what the company does, and your role in the org?
>> Sure. So you know, Jobcase, we like to position ourselves as the company that is the people-first social platform for, you know, empowering America's workers. So we've been working with frontline workers for a number of years, helping them secure jobs. Where companies like LinkedIn or other companies cater towards more advanced degrees, we're doing more for the frontline workers, the blue-collar workers. About 80 to 85% of our members don't have advanced degrees, and we are, you know, currently at about 110 million members right now. We get 25 million unique visitors a month, but, you know, we're basically trying to help those frontline workers navigate through these challenging times right now.
>> Well, yeah, Patrick, I have to imagine that right now, with the global pandemic going on and jobs in a bit of flux, your team must be really busy, especially if you talk about frontline — you know, there are some very large manufacturing and service companies that are doing massive hiring. I know, I poked around the Jobcase site quite a bit and saw plenty in the Boston area. So if you could, you know, architecturally, is there anything you need to do differently? You know, how are you thinking about scale, and adjusting to manage the kind of spike in traffic that I expect you're seeing?
>> Yeah, so it's been interesting. You know, we've seen a lot of different peaks and valleys throughout, but right now what we're doing is trying to help a lot of folks; there are certain folks who aren't comfortable going back to the workforce at this time, or can't because of daycare situations. So we've done a lot of things about filtering for jobs that are remote only. We've done a lot of things about navigating the unemployment lines and things like that, on how to make sure that you're focused on getting what you need. And for those who do want to go back to work, we've been working with some partners to make sure that those opportunities are presented to our user base.
>> Excellent. Well, your session for the CloudHealth CloudLIVE event is about security. Before we get into the security piece, just your role as tech ops: can you give us a little bit of how that fits into the landscape at Jobcase? You know, how do you look at tech ops? You know, my understanding is tech ops is very similar to SRE, a big buzz job lately, site reliability engineer. So what's your responsibility? How does that fit with the rest of your work?
>> Sure, so I joined Jobcase about four years ago, and you know, I was given the role of technical operations, or tech ops, which basically meant everything that the developers weren't doing from the technology side.
So it was more of IT: onboarding and security of the laptops and systems there, a little bit of facility work as far as making sure the office was set up properly and things like that, but also the DevOps or SRE team reported to me. When I first started four years ago, it was one IT person and one DevOps person, and now we have six DevOps engineers and three IT people.
>> Excellent. Well, security, of course, you know, in general has been a very important topic, something you're speaking on. You know, I've been hearing for years the discussion that security can't be a bolt-on, it can't be an afterthought, it is everyone's responsibility. You know, the DevOps movement, of course, has put that fully front and center. So tell us a little bit about, you know, how cybersecurity fits into your role, and give us a little taste of what you're going to be sharing at CloudHealth CloudLIVE.
>> Sure, so you know, it's gone through so many iterations. I mean, you've got DevSecOps, you've got the SREs, you've got risk ops. You know, we don't tend to get caught up in the buzzwords too much, but more about roles and responsibilities. So, we started off as traditional Dev and Ops teams, where basically dev wrote the code and we deployed the code. We found that we didn't scale very well at that, and we wanted to make sure that we could get a little bit more velocity in place, so we rolled out the DevOps model and things like that and started giving more responsibility to the development team. That freed up a lot of my team's time to basically go out and start looking at more secure ways of letting our software go out. So that whole shift-left mentality, where we wanted to find things a little bit quicker, make sure we were doing some baseline examples of secure practices and things like that. So that was really where we started focusing in, and what we've been doing for the past year in trying to roll this process out.
>> Okay, I did a little poking around online; I understand you're also involved in the Kubernetes rollout in your company. In the early days, container security was, you know, a hot-button topic. Feels like we've made some good progress on that, but maybe if you could connect the dots between what you're doing on the security side and containers in general, we'd love to hear more about your Kubernetes deployment, too.
>> Sure, so we do everything through templates, through CloudFormation, so we kind of lock them down to certain security groups and things like that, but we also have a rollout process for making sure things are patched in a cohesive manner. So we have a rollout process for, you know, running the latest versions of code, updating everything. And now that's what my team really focuses on: making sure that we have a clear, concise process for the development team to focus in on and roll out, so that they're comfortable with it. Whether it goes through all the environments — our dev environment, our integration environment, our staging environment, all the way up to production — it's the same process.
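The "same process in every environment" idea Patrick describes can be pictured with a few lines of boto3: push one CloudFormation template through dev, integration, staging, and production, with only the parameters changing per stage. This is an illustrative outline, not Jobcase's actual pipeline; the environment names, template URL, and parameter keys are assumptions made for the sketch.

```python
import boto3

ENVIRONMENTS = ["dev", "integration", "staging", "production"]        # assumed stage names
TEMPLATE_URL = "https://s3.amazonaws.com/example-bucket/service.yaml"  # hypothetical template

def roll_out(service="jobs-api"):
    """Apply the same template to each environment, varying only the parameters."""
    cfn = boto3.client("cloudformation")
    for env in ENVIRONMENTS:
        # A real pipeline would create-or-update, wait for completion, and gate each stage.
        cfn.create_stack(
            StackName=f"{service}-{env}",
            TemplateURL=TEMPLATE_URL,
            Parameters=[
                {"ParameterKey": "Environment", "ParameterValue": env},
                {"ParameterKey": "MinInstances", "ParameterValue": "1" if env == "dev" else "3"},
            ],
            Capabilities=["CAPABILITY_NAMED_IAM"],
        )
        print(f"submitted {service}-{env}")

if __name__ == "__main__":
    roll_out()
```

Because every stage runs from the same template, the security-group lockdowns and patch levels Patrick mentions travel with the stack instead of being hand-applied per environment.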
>> Yeah, and how do you look at that kind of line between the developers and the infrastructure? Tech ops usually is building the place, not actually building the new products; how does Kubernetes fit into that overall discussion?
>> So we have a bunch of different teams that we work with, three primarily, and each one's at a different phase, and I think that's the thing that you have to realize: you have to do what works well for your company. So certain teams do more on the infrastructure side, where we kind of give them base guidelines as to what sizes and things like that for the infrastructure they should be using, and for others we have to do a little more hand-holding and make sure that they understand — you know, okay, let's take a step back and understand what you're trying to accomplish, what kind of traffic patterns you want to roll out — and get a little bit deeper understanding and work with them. So each team's a little different, but we really blur the line a lot between Dev and Ops. I mean, it's the only way you really develop fast and secure.
>> Excellent. What about automation? How does that fit into everything we've been talking about here?
>> Yeah, so we've spent a ton of time on that. So again, with the CloudFormation templates, you could basically blow up an account and just rerun the scripts and recreate the account from scratch, with a bunch of auto-scaling groups, so if nodes go down, they get replaced automatically. So there's all sorts of automation built in. I think we've cut down on our alerts tenfold over the past year just by all these automation scripts, and we get notifications that things have happened, but there's usually no human interaction anymore, you know, for simple hardware failures. We're mostly getting more of a software problem now, as far as some incompatibilities or differences that may have come with an upgrade.
>> All right, so Patrick, how does your organization look at cloud? Are you all in a public cloud, or using multiple clouds? You know, what's that environment?
>> Yeah, so we went all in on AWS and we've stayed in AWS. We're not multicloud, but we do our DR plan in there and everything — 100% in the cloud.
>> Okay, excellent. So you know, you're obviously using CloudHealth as part of your overall solution. How do they fit into that discussion? And give us a little bit about how long you've been using them and what you've been seeing.
>> Sure, so we started with their cost program, CloudHealth, and we wanted to get a better understanding of all of our costs, especially when we're going into this more distributed model where developers have the ability to roll out infrastructure. We wanted to make sure, not so much that they had budgets, but that they had an understanding of how much they were spending. So when you go from that centralized control to releasing control to individuals, with that control comes responsibility, and you know, we wanted to make sure we're making good business decisions, and so we rolled out CloudHealth to all the users, to be able to see what each program was costing. We did that about two years ago, and we've really just finalized it over the last couple of months, making sure that everything was tagged appropriately so engineers can see how much each application costs to run. Then last year, we decided to look at some security programs to kind of help us launch that. We were doing a lot of stuff by hand and using some of the AWS services, but we wanted something we could roll out more to the executive team, to be able to see how we're doing as far as, you know, benchmarks and things like that. So we looked at a couple of different programs, but we had such a really good experience with CloudHealth on the onboarding that we decided to use VMware Secure State, and we've been rolling that out, using it with my team primarily right now, and have started rolling it out to all the dev teams.
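Patrick's point about tagging everything so each application's cost is visible lends itself to a simple automated check. Below is a rough boto3 sketch, not Jobcase's actual tooling, that reports running EC2 instances missing a cost-allocation tag; the tag key name and region are assumptions for the example.

```python
import boto3

REQUIRED_TAG = "application"  # assumed cost-allocation tag key

def untagged_instances(region="us-east-1"):
    """Return IDs of running EC2 instances missing the required cost tag."""
    ec2 = boto3.client("ec2", region_name=region)
    missing = []
    paginator = ec2.get_paginator("describe_instances")
    for page in paginator.paginate(
        Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
    ):
        for reservation in page["Reservations"]:
            for instance in reservation["Instances"]:
                tags = {t["Key"] for t in instance.get("Tags", [])}
                if REQUIRED_TAG not in tags:
                    missing.append(instance["InstanceId"])
    return missing

if __name__ == "__main__":
    print("Instances missing a cost tag:", untagged_instances())
```

A report like this, run on a schedule, is one way the "with control comes responsibility" model stays honest once developers can provision their own infrastructure.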
>> It's really interesting, Patrick. You know, you've been around long enough, I'm sure, that there have been times where security or the billing or all those other things were something that somebody else took care of. If I'm kind of a typical business person, what you're laying out sounds like there's communication, collaboration, you know, the business and the technical side working together. You know, are we getting closer to that — you know, we're all pulling in the same direction and have clear visibility as to what the business needs and what the kind of technical and financial pieces are?
>> Yeah, absolutely. I mean, it's definitely been a joint effort. I work with the finance team on a regular basis to kind of give forecasts and things like that, especially during these challenging times; you have to know how much you're spending on these bills. I mean, the cloud is one of our biggest bills, obviously, for Jobcase. So we wanted to have a good understanding there, but we also want to drive the business forward. We're working with partners right now, during these times, to make sure we're getting, you know, even some free services, as far as doing some trials and things like that, to ensure that we're being cost conscious for the company but also driving initiatives forward.
>> Yeah, Patrick, is there anything out there, you know, in the ecosystem that is on your wish list, that would make your company and your job even easier?
>> Great question. You know, I think better integration between all the programs. I mean, you've got a lot of best-of-breed programs out there, so you worry about technology sprawl — you know, from application monitoring, to system monitoring, to cost monitoring and things like that; there is no silver bullet. So you know, if there was, that would be great, but you have to kind of pick the best of breed in all cases. We kind of go with the 80/20 rule: if a program does 80% but it integrates with other programs, we're going to use that over one that's maybe 90, 95%, just for ease of use and computation.
>> Great. Well, Patrick, I want to give you the final word. Any other final takeaways that you would share with your peers as to things they should be looking at, or things they should prepare their teams for, to be more effective and more secure?
>> Yeah, I'd say don't be afraid of change, but also work with your dev teams. If you make it too difficult for them, or it becomes an us-versus-them, it's just never going to work. It has to be a partnership; they have to buy into the things that you're trying to do, and in most cases they will — they want to do the right things — but you've got to kind of eliminate the noise for them and make sure that they're only getting the things that are important to the company.
>> Well, Patrick, thank you so much for sharing, and absolutely, a very important service Jobcase is performing, especially right now when, you know, jobs and, as you said, flexible work environments are critically important. Thanks so much.
>> Thank you.
>> All right, be sure to check out the CloudHealth CloudLIVE event. I'm Stu Miniman; you'll see me and Corey Quinn in The Great Cloud Debate, and thank you for watching theCUBE. (gentle music)
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Patrick | PERSON | 0.99+ |
Corey Quinn | PERSON | 0.99+ |
Patrick Hetherton | PERSON | 0.99+ |
Stu Miniman | PERSON | 0.99+ |
Palo Alto | LOCATION | 0.99+ |
Boston | LOCATION | 0.99+ |
100% | QUANTITY | 0.99+ |
last year | DATE | 0.99+ |
AWS | ORGANIZATION | 0.99+ |
80% | QUANTITY | 0.99+ |
May 2020 | DATE | 0.99+ |
each program | QUANTITY | 0.99+ |
CloudHealth | ORGANIZATION | 0.99+ |
three | QUANTITY | 0.99+ |
one | QUANTITY | 0.99+ |
four years ago | DATE | 0.99+ |
theCUBE | ORGANIZATION | 0.98+ |
each team | QUANTITY | 0.98+ |
first | QUANTITY | 0.98+ |
each application | QUANTITY | 0.97+ |
CloudHealth | TITLE | 0.96+ |
Jobcase | ORGANIZATION | 0.96+ |
about 110 million members | QUANTITY | 0.96+ |
each one | QUANTITY | 0.95+ |
America | LOCATION | 0.94+ |
85% | QUANTITY | 0.94+ |
last couple of months | DATE | 0.93+ |
about two years ago | DATE | 0.91+ |
SRE | ORGANIZATION | 0.91+ |
Kubernetes | TITLE | 0.9+ |
DevSecOps | TITLE | 0.9+ |
CloudFormation | TITLE | 0.9+ |
CloudHealth CloudLIVE | EVENT | 0.88+ |
About 80 | QUANTITY | 0.88+ |
25 million unique visitors | QUANTITY | 0.86+ |
six DevOps | QUANTITY | 0.86+ |
90, 95% | QUANTITY | 0.85+ |
three IT | QUANTITY | 0.84+ |
past year | DATE | 0.83+ |
one IT | QUANTITY | 0.81+ |
about | DATE | 0.79+ |
one DevOps | QUANTITY | 0.73+ |
The Great Cloud Debate | EVENT | 0.73+ |
years | QUANTITY | 0.71+ |
VMware Secure State | TITLE | 0.71+ |
global pandemic | EVENT | 0.65+ |
DevOps | TITLE | 0.64+ |
a month | QUANTITY | 0.62+ |
80/20 | OTHER | 0.58+ |
couple | QUANTITY | 0.54+ |
CUBE | EVENT | 0.52+ |
Cloud Debate | EVENT | 0.43+ |
Migrating Your Vertica Cluster to the Cloud
>> Jeff: Hello everybody, and thank you for joining us today for the virtual Vertica BDC 2020. Today's break-out session has been titled, "Migrating Your Vertica Cluster to the Cloud." I'm Jeff Healey, and I'm in Vertica marketing. I'll be your host for this break-out session. Joining me here are Sumeet Keswani and Chris Daly, Vertica product technology engineers and key members of our customer success team. Before we begin, I encourage you to submit questions and comments during the virtual session. You don't have to wait, just type your question or comment in the question box below the slides and click Submit. As always, there will be a Q&A session at the end of the presentation. We'll answer as many questions as we're able to during that time. Any questions that we don't address, we'll do our best to answer offline. Alternatively, you can visit the Vertica forums at forum.vertica.com to post your questions there after the session. Our engineering team is planning to join the forums to keep the conversation going. Also, as a reminder, you can maximize your screen by clicking the double-arrow button in the lower right corner of the slides. And yes, this virtual session is being recorded and will be available to view on demand this week. We'll send you a notification as soon as it's ready. Now let's get started. Over to you, Sumeet.

>> Sumeet: Thank you, Jeff. Hello everyone, my name is Sumeet Keswani, and I will be talking about planning to deploy or migrate your Vertica cluster to the Cloud. You may be moving an on-prem cluster or setting up a new cluster in the Cloud, and there are several design and operational considerations that will come into play. Some of these are cost, which industry you are in, and which expertise you have in which Cloud platform. And there may be a personal preference too. After that, there will be some operational considerations, like VM and cluster sizing, and which Vertica mode you want to deploy, Eon or Enterprise; it depends on your use case. What DevOps skills are available, what elasticity and workload separation you need, what your backup and DR strategy is, and what you want in terms of high availability. And you will have to think about how much data you have and where it's going to live. In order to understand the cost and the benefit of a deployment, you will have to understand the access patterns and how you are moving data to and from the Cloud. So, things to consider before you move a Vertica deployment to the Cloud: one thing to keep in mind is that virtual CPUs, or vCPUs, in the Cloud are not the same as the CPUs that you've been familiar with in your data center. A vCPU is half of a CPU because of hyperthreading. There is definitely the noisy neighbor effect: depending on what other things are hosted in the Cloud environment, you may occasionally see performance issues. There are I/O limitations on the instance that you provision, and what that really means is you can't always scale up; you may have to scale out instead, adding more instances rather than getting bigger or right-sized instances. Finally, there is an important distinction here. Virtualization is not free. There can be significant overhead to virtualization, as much as 30%, so when you size and scale your clusters, you must keep that in mind.
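To put the sizing guidance above into numbers, here is a small worked sketch. The two-threads-per-core behavior and the roughly 30% virtualization overhead figure come from the talk; the 16-core example node is an assumption.

```python
import math

def vcpus_needed(on_prem_cores: int, threads_per_core: int = 2,
                 virtualization_overhead: float = 0.30) -> int:
    """Rough vCPU estimate for matching an on-prem core count in the cloud.

    Treats a vCPU as a hardware thread (about 2 vCPUs per physical core) and
    pads for up to ~30% virtualization overhead, per the guidance above.
    """
    raw = on_prem_cores * threads_per_core
    return math.ceil(raw * (1 + virtualization_overhead))

# Example: replacing an on-prem node that has 16 physical cores
print(vcpus_needed(16))  # 42, so round up to the next available instance size
```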
Now, the other important aspect is where you put your Vertica cluster: the choice of the region, how far it is from your various office locations, and where the data will live with respect to the cluster. And remember, popular locations can fill up, so if you want to scale out, additional capacity may or may not be available. These are things you have to keep in mind when choosing your Cloud platform and your deployment. At this point, I want to make a plug for Eon mode. Eon mode is the latest mode, a Cloud mode from Vertica. It has been designed with Cloud economics in mind. It uses shared storage, which is durable, available, and very cheap, like S3 storage or Google Cloud storage. It has been designed for quick scaling, like scale out, and highly elastic deployments. It has also been designed for high workload isolation, where each application or user group can be isolated from the other ones, so that they can be paid for and monitored separately, without affecting each other. But there are some disadvantages, or perhaps a cost, to using Eon mode. Storage access in S3 is neither cheap nor efficient: there is high I/O latency when accessing data from S3, and there are API and data access costs associated with accessing your data in S3. Vertica in Eon mode has a pay-as-you-go model, which works for some people and does not work for others, and so it is important to keep that in mind. And performance can be a little bit variable here, because it depends on the cache, the local depot, and it is not as predictable as EE mode, so that's another trade-off. So let's spend about a minute and see what a Vertica cluster in Eon mode looks like. A Vertica cluster in Eon mode has S3 as the durability layer where all the data sits. There are subclusters, which are essentially just aggregation groups of separate compute, which will service different workloads. So in this example, you may have two subclusters, one servicing the ETL workload and the other one servicing (mic interference obscures speaking). These subclusters are isolated and do not affect each other's performance. This allows you to scale them independently and isolate workloads. So this is the new Vertica Eon mode, which has been specifically designed by us for use in the Cloud. But beyond this, you can use EE mode or Eon mode in the Cloud, it really depends on what your use case is. Both of these are possible, and we highly recommend Eon mode wherever possible. Okay, let's talk a little bit about what we mean by Vertica support in the Cloud. Now as you know, a Cloud is a shared data center. Performance in the Cloud can vary: between regions, availability zones, time of day, choice of instance type, what concurrency you use, and of course the noisy neighbor effect. We in Vertica performance, load, and stress test our product before every release. We have a bunch of use cases, we go through all of them, make sure that we haven't regressed any performance, and make sure that it works up to standards and gives you the high performance that you've come to expect. However, your solution or your workload is unique to you, and it is still your responsibility to make sure that it is tuned appropriately. To do this, one of the easiest things you can do is pick a tested operating system and allocate the virtual machine with enough resources.
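As a back-of-the-envelope sketch of the Eon mode cost trade-off just described, the snippet below adds up communal storage and API request charges for a hypothetical month. Every price and workload number here is a placeholder assumption for illustration, not an actual AWS list price.

```python
# Placeholder unit prices (assumptions, not real list prices)
STORAGE_PER_GB_MONTH = 0.023   # $ per GB-month of communal storage
GET_PER_1000 = 0.0004          # $ per 1,000 GET requests
PUT_PER_1000 = 0.005           # $ per 1,000 PUT requests

def monthly_communal_cost(data_gb: float, gets: int, puts: int) -> float:
    """Storage cost plus API request cost for one month."""
    storage = data_gb * STORAGE_PER_GB_MONTH
    requests = (gets / 1000) * GET_PER_1000 + (puts / 1000) * PUT_PER_1000
    return storage + requests

# Assumed workload: 50 TB of communal data, 200M GETs and 5M PUTs in a month
print(f"${monthly_communal_cost(50_000, 200_000_000, 5_000_000):,.2f}")
```

The point of the exercise is simply that the depot hit rate and the query access pattern, not just the stored bytes, drive the monthly bill in Eon mode.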
It's something that we recommend, because we have tested it thoroughly, and it goes a long way in giving you predictability. After this, I would now like to go into the various Cloud platforms that Vertica has worked on. I'll start with AWS, and my colleague Chris will speak about Azure and GCP, and our thoughts going forward. So without further ado, let's start with the Amazon Web Services platform. So this is Vertica running on the Amazon Web Services platform. As you probably are all aware, Amazon Web Services is the market leader in this space, and indeed really our biggest provider by far, and has been for a very long time. Vertica has a deep integration with Amazon Web Services. We provide a marketplace offering which has both a pay-as-you-go and a bring-your-own-license model. We have many knowledge base articles, best practices, scripts, and resources that help you configure and use a Vertica database in the Cloud. We have had customers in the Cloud for many, many years now, and we have managed and console-based point-and-click deployments for ease of use in the Cloud. So Vertica has a deep integration in the Amazon space, has been there for quite a bit now, and we bring a lot of experience here. So let's talk about sizing on AWS. Sizing on any platform comes down to four or five different things: picking the right instance type, picking the right disk volume and type, tuning and optimizing your networking, and finally some operational concerns like security, maintainability, and backup. So let's go into each one of these in the AWS ecosystem. The choice of instance type is one of the important choices that you will make. In Eon mode, you don't really need persistent disk; you should probably choose ephemeral disks, because they give you extra speed and they come with the instance type. We highly recommend the i3.4x instance types, which are very economical and have a big, 4 terabyte depot, or cache, per node. The i3.metal is similar to the i3.4x, but has significantly better performance, for those subclusters that need the extra oomph. The i3.2x is good for scale out of small ad hoc clusters; they have a smaller cache and lower performance, but are cheap enough to use very indiscriminately. If you are in EE mode, we don't use S3 as the layer of durability; your local volumes are where we persist the data. Hence you do need an EBS volume in EE mode. In order to make sure that the instance or the deployment is manageable, you might have to use some sort of a software RAID array over the EBS volumes. The most common instance types you see in EE mode are the r4.4x, the c4, or the m4. And then of course for temp space and the depot we always recommend instance volumes; they're just much faster. Okay. So let's talk about optimizing, or tuning, your network. The best thing you can do to tune your network, especially in Eon mode but in other modes too, is to get a VPC S3 endpoint. This is essentially a route table that makes sure that all traffic between your cluster and S3 goes over an internal fabric. This makes it much faster and you don't pay for egress cost, especially if you're doing external tables or using communal storage, but you do need to create it. People often forget to create it, so you really do have to. And best of all, it's free; a minimal sketch of creating one follows.
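Here is a minimal boto3 sketch of creating that gateway endpoint, assuming placeholder VPC and route table IDs and the us-east-1 region; adjust the service name to your own region.

```python
import boto3

# Create an S3 gateway endpoint so cluster-to-S3 traffic stays on the
# internal AWS fabric instead of going over the public internet.
ec2 = boto3.client("ec2", region_name="us-east-1")

endpoint = ec2.create_vpc_endpoint(
    VpcEndpointType="Gateway",
    VpcId="vpc-0123456789abcdef0",                 # placeholder VPC ID
    ServiceName="com.amazonaws.us-east-1.s3",
    RouteTableIds=["rtb-0123456789abcdef0"],       # placeholder route table
)
print(endpoint["VpcEndpoint"]["VpcEndpointId"])
```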
It doesn't cost you anything extra. You just have to create it during cluster creation time, and there's a significant performance difference when using it. The next thing about tuning your network is sizing it correctly. Pick the geographical region closest to where you'll consume the data, and pick the right availability zone. We highly recommend using cluster placement groups; in fact, they are required for the stability of the cluster. A cluster placement group essentially approximates the notion of a rack: nodes in a cluster placement group are physically closer to each other than they would otherwise be, and this allows a 10 Gbps, bidirectional, TCP/IP flow between the nodes, making sure that you get a high number of gigabits per second. As you probably are all aware, the Cloud does not support UDP broadcast. Hence you must use point-to-point UDP for spread in the Cloud, or in AWS. Beyond that, point-to-point UDP does not scale very well beyond 20 nodes, so as your cluster sizes increase, you must switch over to large cluster mode. And finally, use instances with enhanced networking or SR-IOV support. Again, it's free, it comes with the choice of the instance type and the operating system. We highly recommend it, it makes a big difference in terms of how your workload will perform. So let's talk a little bit about security, configuration, and orchestration. As I said, we provide CloudFormation scripts for ease of deployment, and you can use the MC point and click. With regard to security, Vertica does support instance profiles out of the box in Amazon, and we recommend you use them. This is highly desirable so that you're not passing access keys and secret keys around. If you use our marketplace image, we have picked the latest operating systems, we have patched them, and Amazon actually validates everything on the marketplace and scans it for security vulnerabilities, so you get that for free. We do some basic configuration: we disable root ssh access, we disallow any password access, we turn on encryption, and we run a basic set of security checks to make sure that the image is secure. Of course, it could be made more secure, but we try to balance out security, performance, and convenience. And finally, let's talk about backups. Especially in Eon mode I get the question, "Do we really need to back up our system, since the data is in S3?" And the answer is yes, you do. S3 is not going to protect you against an accidental drop table. S3 has a finite amount of reliability, durability, and availability, and you may want to be able to restore data differently. Also, backups are important if you're doing DR or if you have an additional cluster in a different region; the other cluster can be considered a backup. And finally, why not create a backup or a disaster recovery cluster, since storage is cheap in the Cloud. So we highly recommend you use it. So with this, I would like to hand it over to my colleague Christopher Daly, who will talk about the other two platforms that we support, Google and Azure. Over to you, Chris, thank you.
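Before the hand-off to the other platforms, here is a hedged boto3 sketch of the cluster placement group recommendation above: create the group, then launch the Vertica nodes into it. The AMI ID, key pair, subnet, and instance profile name are placeholders, not values from the talk.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# A "cluster" strategy placement group keeps the nodes physically close
# together for high-throughput, low-latency node-to-node traffic.
ec2.create_placement_group(GroupName="vertica-pg", Strategy="cluster")

ec2.run_instances(
    ImageId="ami-0123456789abcdef0",               # placeholder marketplace AMI
    InstanceType="i3.4xlarge",
    MinCount=3,
    MaxCount=3,
    KeyName="my-keypair",                          # placeholder key pair
    SubnetId="subnet-0123456789abcdef0",           # placeholder subnet
    Placement={"GroupName": "vertica-pg"},
    IamInstanceProfile={"Name": "vertica-node-profile"},  # placeholder profile
)
```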
>> Chris: Thanks, Sumeet, and hi everyone. So while there's no argument that we here at Vertica have a long history of running within the Amazon Web Services space, there are other alternative Cloud service providers where we do have a presence, such as Google Cloud Platform, or GCP. For those of you who are unfamiliar with GCP, it's considered the third-largest Cloud service provider in the marketspace, and it's priced very competitively with its peers. It has a lot of similarities to AWS in the products and services that it offers, but it tends to be the go-to place for newer businesses or startups. We officially started supporting GCP a little over a year ago with our first entry into their GCP marketplace, a solution that deployed a fully functional and ready-to-use Enterprise mode cluster. We followed up on that with the release and support of Google storage buckets, and now I'm extremely pleased to announce that with the launch of Vertica 10, we're officially supporting Eon mode architecture in GCP as well. But that's not all, as we're adding additional offerings into the GCP marketplace. With the launch of version 10 we'll be introducing a second listing in the marketplace that allows for the deployment of an Eon mode cluster, all driven by our own management console. This will allow customers to quickly spin up Eon-based clusters within the GCP space. And if that wasn't enough, I'm also pleased to tell you that very soon after the launch we're going to be offering Vertica by the hour in GCP as well. And while we've done a lot to automate the solutions coming out of the marketplace, we recognize the simple fact that for a lot of you, building your cluster manually is really the only option. So with that in mind, let's talk about the things you need to understand in GCP to get that done. Now, if this slide looks familiar, no, it's not an erroneous duplicate of the slide from Sumeet's AWS section; it's merely an acknowledgement of all the things you need to consider for running Vertica in the Cloud. In Vertica, the choice of the operational mode will dictate some of the choices you'll need to make in the infrastructure, particularly around storage. Just like with on-prem solutions, you'll need to understand the disk and networking capacities to get the most out of your cluster. And one of the most attractive things in GCP is the pricing, as it tends to run a little less than the others, but that does translate into fewer choices and options within the environment. If nothing else, I want you to take one thing away from this slide, and Sumeet said this about AWS earlier: VMs running in the GCP space run on top of hardware that has hyperthreading enabled, and a vCPU doesn't equate to a core, but rather to a processing thread. This becomes particularly important if you're moving from an on-prem environment into the Cloud, because a physical Vertica node with 32 cores is not the same thing as a VM with 32 vCPUs. In fact, with 32 vCPUs, you're only getting about 16 cores' worth of performance. GCP does offer a handful of VM types, which they categorize by letter, but for us, most of these don't make great choices for Vertica nodes. The M series, however, does offer a good core-to-memory ratio, especially when you're looking at the high-mem variants. Also keep in mind that I/O performance, such as network and disk, is partially dependent on the VM size, so customers in the GCP space should be focusing on 16 vCPU VMs and above for their Vertica nodes. Disk options in GCP can be broken down into two basic types, persistent disks and local disks, which are ephemeral. Persistent disks come in two forms, standard or SSD.
For Vertica in Eon mode, we recommend that customers use persistent SSD disks for the catalog, and either local SSD disks or persistent SSD disks for the depot and the temp space. A couple of things to think about here, though. Persistent disks are provisioned as a single device with a settable size. Local disks are provisioned as multiple disk devices with a fixed size, requiring you to use some kind of software RAID to create a single storage device. So while local SSD disks provide much more throughput, you're using CPU resources to maintain that RAID set, so it's a little bit of a trade-off. Persistent disks offer redundancy, either within the zone they exist in or within the region, and if you're selecting regional redundancy, the disks are replicated across multiple zones in the region. This does have an effect on the performance of the VM, so we don't recommend it. What we do recommend is zonal redundancy when you're using persistent disks, as it gives you that redundancy level without actually affecting the performance. Remember also, in the Cloud space, all I/O is network I/O, as disks are basically block storage devices; this means that disk actions can and will slow down network traffic. And finally, storage bucket access in GCP is based on GCP interoperability mode, which means that it's basically compliant with the AWS S3 API. In interoperability mode, access to the bucket is granted by a key pair that GCP refers to as HMAC keys. HMAC keys can be generated for individual users or for service accounts. We recommend that when you're creating HMAC keys, you choose a service account to ensure that the keys are not tied to a single employee. When thinking about storage for Enterprise mode, things change a little bit. We still recommend persistent SSD disks over standard ones. However, the use of local SSD disks for anything other than temp space is highly discouraged. I said it before, local SSD disks are ephemeral, meaning that the data is lost if the machine is turned off or goes down, so not really a place you want to store your data. In GCP, multiple persistent disks placed into a software RAID set do not create more throughput like you can find in other Clouds; the I/O saturation usually hits the VM limit long before it hits the disk limit. In fact, the performance of a persistent disk is determined not just by the size of the disk but also by the size of the VM. A good rule of thumb in GCP for maximizing your I/O throughput with persistent disks: the size tends to max out at two terabytes for SSDs and 10 terabytes for standard disks.
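Because interoperability mode speaks the S3 API, a standard S3 client pointed at storage.googleapis.com and authenticated with service-account HMAC keys can reach the bucket. The sketch below uses boto3; the key values and bucket name are placeholders.

```python
import boto3

# HMAC keys generated for a GCP service account stand in for the usual
# AWS access key / secret key pair.
gcs = boto3.client(
    "s3",
    endpoint_url="https://storage.googleapis.com",
    aws_access_key_id="GOOG1EXAMPLEHMACACCESSID",      # placeholder HMAC key ID
    aws_secret_access_key="example-hmac-secret",       # placeholder HMAC secret
)

response = gcs.list_objects_v2(Bucket="my-vertica-communal")  # placeholder bucket
for obj in response.get("Contents", []):
    print(obj["Key"], obj["Size"])
```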
So some things to consider in the NAV ring space for your Vertica cluster, pick a region that's physically close to you, even if you're connecting to the GCP network from a corporate LAN as opposed to the internet. The further the packets have to travel, the longer it's going to take. Also, GCP, like most Clouds, doesn't support UDP broadcast traffic on their virtual NAV ring, so you do have to use the point-to-point flag for spread when you're creating your cluster. And since the network cap on VMs is set at 32 gigabits per second per VM, maximize your network egress throughput and don't use VMs that are smaller than 16 vCPUs for your Vertica nodes. And that gets us to the one question I get asked the most often. How do I get my data into and out of the Cloud? Well, GCP offers many different methods to support different speeds and different price points for data ingress and egress. There's the obvious one, right, across the internet either directly to the VMs or into the storage bucket. Or you can, you know, light up a VPN tunnel to encrypt all that traffic. But additionally, GCP offers direct network interconnect from your corporate network. These get provided either by Google or by a partner, and they vary in speed. They also offer things called direct or carrier peering, which is connecting the edges of the networks between your network and GCP, and you can use a CDN interconnect, which creates, I believe, an on-demand connection from the GCP network, your network to the GCP network provided by a large host of CDN service providers. So GCP offers a lot of ways to move your data around in and out of the GCP Cloud. It's really a matter of what price point works for you, and what technology your corporation is looking to use. So we've talked about AWS, we've talked about GCP, it really only leaves one more Cloud. So last, and by far not the least, there's the Microsoft Azure environment. Holding on strong to the number two place in the major Cloud providers, Azure offers a very robust Cloud offering that's attractive to customers that already consume services from Microsoft. But what you need to keep in mind is that the underlying foundation of their Cloud is based on the Microsoft Windows products. And this makes their Cloud offering a little bit different in the services and offerings that they have. The good news here, though, is that Microsoft has done a very good job of getting their virtualization drivers baked into the modern kernels of most Linux operating systems, making running Linux-based VMs in Azure fairly seamless. So here's the slide again, but now you're going to notice some slight differences. First off, in Azure we only support Enterprise mode. This is because the Azure storage product is very different from Google Cloud storage and S3 on AWS. So while we're working on getting this supported, and we're starting to focus on this, we're just not there yet. This means that since we're only supporting Enterprise mode in Azure, getting the local disk performance right is one of the keys to success of running Vertica here, with the other major key being making sure that you're getting the appropriate networking speeds. Overall, Azure's a really good platform for Vertica, and its performance and pricing are very much on par with AWS. But keep in mind that the newer versions of the Linux operating systems like RHEL and CentOS run much better here than the older versions. 
So we've talked about AWS, we've talked about GCP, and that really only leaves one more Cloud. Last, but by far not least, there's the Microsoft Azure environment. Holding on strong to the number two place among the major Cloud providers, Azure offers a very robust Cloud offering that's attractive to customers that already consume services from Microsoft. But what you need to keep in mind is that the underlying foundation of their Cloud is based on the Microsoft Windows products, and this makes their Cloud offering a little bit different in the services and offerings that they have. The good news here, though, is that Microsoft has done a very good job of getting their virtualization drivers baked into the modern kernels of most Linux operating systems, making running Linux-based VMs in Azure fairly seamless. So here's the slide again, but now you're going to notice some slight differences. First off, in Azure we only support Enterprise mode. This is because the Azure storage product is very different from Google Cloud storage and S3 on AWS. So while we're working on getting this supported, and we're starting to focus on it, we're just not there yet. This means that since we're only supporting Enterprise mode in Azure, getting the local disk performance right is one of the keys to success of running Vertica here, with the other major key being making sure that you're getting the appropriate networking speeds. Overall, Azure is a really good platform for Vertica, and its performance and pricing are very much on par with AWS. But keep in mind that the newer versions of the Linux operating systems like RHEL and CentOS run much better here than the older versions. Okay, so first things first again: just like GCP, in Azure VMs are running on top of hardware that has hyperthreading enabled. And because of the way Hyper-V, Azure's virtualization engine, works, you can actually see this: if you look into the CPU information of the VM, you'll actually see how it groups the vCPUs by core and by thread. Azure offers a lot of VM types, and is adding new ones all the time, but for us, there are three VM types that make the most sense for Vertica. For customers that are looking to run production workloads in Azure, the Es_v3 and the Ls_v2 series are the two main recommendations. While they differ slightly in the CPU-to-memory ratio and the I/O throughput, the Es_v3 series is probably the best recommendation for a generalized Vertica node, with the Ls_v2 series being recommended for workloads with higher I/O requirements. If you're just looking to deploy a sandbox environment, the Ds_v3 series is a very suitable choice that can really reduce your overall Cloud spend. VM storage in Azure is provided by a grouping of four different types of disks, all offering different levels of performance. Introduced at the end of last year, the Ultra Disk option is the highest-performing disk type for VMs in Azure. It was designed for database workloads where high throughput and low latency are very desirable. However, the Ultra Disk option is not available in all regions yet, although that's been changing slowly since its launch. The Premium SSD option, which has been around for a while and is widely available, can also offer really nice performance, especially at higher capacities. And just like other Cloud providers, the I/O throughput you get on VMs is dictated not only by the size of the disk, but also by the size of the VM and its type. So a good rule of thumb here: VM types with an S will have a much better throughput rate than ones that don't, and the larger VMs will have higher I/O throughput than the smaller ones. You can expand the VM disk throughput by using multiple disks in Azure and a software RAID. This overcomes the limitations of single disk performance, but keep in mind, you're now using CPU cycles to maintain that RAID, so it is a bit of a trade-off. The other nice thing in Azure is that all their managed disks are encrypted by default on the server side, so there's really nothing you need to do here to enable that. And of course I mentioned this earlier: there is no native access to Azure storage yet, but it is something we're working on. We have seen folks using third-party applications like MinIO to access Azure's storage as an S3 bucket, so it might be something you want to keep in mind and maybe even test out for yourself. Networking in Azure comes in two different flavors, standard and accelerated. In standard networking, the entire network stack is abstracted and virtualized, so this works really well; however, there are performance limitations, and standard networking tends to top out around four gigabits per second. Accelerated networking in Azure is based on single root I/O virtualization of the Mellanox adapter. This is basically the VM talking directly to the physical network card in the host hardware, and it can produce network speeds up to 20 gigabits per second, so much, much faster. Keep in mind, though, that not all VM types and operating systems actually support accelerated networking, and just like disk throughput, network throughput is based on VM type and size.
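Returning to the storage note above, one pattern mentioned in the talk is fronting Azure Blob Storage with a MinIO gateway so that it looks like an S3 bucket. The sketch below assumes such a gateway is already running at localhost:9000; the endpoint, credentials, and bucket name are all assumptions for illustration only.

```python
import boto3

# Talk to an assumed MinIO gateway that exposes a Blob Storage container
# over the S3 API.
azure_s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:9000",          # assumed gateway endpoint
    aws_access_key_id="minio-access-key",          # placeholder credentials
    aws_secret_access_key="minio-secret-key",
)

# Push a local file and count the objects now in the bucket.
azure_s3.upload_file("backup_manifest.json", "vertica-backups", "backup_manifest.json")
print(azure_s3.list_objects_v2(Bucket="vertica-backups").get("KeyCount", 0), "objects")
```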
So what do you need to think about for networking in the Azure space? Again, stay close to home. Pick regions that are geographically close to your location. Yes, the backbones between the regions are very, very fast, but the more hops your packets have to make, the longer it takes. Azure offers two types of groupings of their VMs, availability sets and availability zones. Availability zones offer good redundancy across multiple zones, but this actually increases the node-to-node latency, so we recommend you avoid them. Availability sets, on the other hand, keep all your VMs grouped together within a single zone, but make sure that no two VMs are running on the same host hardware, for redundancy. And just like the other Clouds, UDP broadcast is not supported, so you have to use the point-to-point flag when you're creating your database to ensure that spread works properly. Spread timeout, okay, this is a good one. So recently, Microsoft has started monthly rolling updates of their environment. What this looks like is that VMs running on top of hardware that's receiving an update can be paused, and this becomes problematic when the pausing of the VM exceeds eight seconds, as the unpaused members of the cluster now think the paused VM is down. So consider adjusting the spread timeout for your clusters in Azure to 30 seconds, and this will help avoid a little of that. If you're deploying a large cluster in Azure, more than 20 nodes, use large cluster mode, as point-to-point spread doesn't really scale well with a lot of Vertica nodes. And finally, pick VM types and operating systems that support accelerated networking; the difference in the node-to-node speeds can be very dramatic. So how do we move data around in Azure? Microsoft views data egress a little differently than other Clouds, as it classifies any data being transmitted by a VM as egress; however, it only bills for data egress that actually leaves the Azure environment. Egress speed limits in Azure are based entirely on the VM type and size, and then they're limited by your connection to them. While not offering as many pathways to access their Cloud as GCP, Azure does offer a direct network-to-network connection called ExpressRoute. Offered by a large group of third-party partners, ExpressRoute offers multiple tiers of performance that are based on a flat charge for inbound data and a metered charge for outbound data. And of course you can still access the Cloud via the internet, and securely through a VPN gateway. So on behalf of Jeff, Sumeet, and myself, I'd like to thank you for listening to our presentation today, and we're now ready for Q&A.
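As a final illustration of the spread timeout adjustment suggested above for Azure, here is a hedged sketch using the vertica_python client. The connection details are placeholders, and the SET_SPREAD_OPTION meta-function and its 'TokenTimeout' key (a value in milliseconds) should be verified against the documentation for your Vertica version.

```python
import vertica_python

# Placeholder connection details for the cluster
conn_info = {
    "host": "10.0.0.10",
    "port": 5433,
    "user": "dbadmin",
    "password": "example-password",
    "database": "vmart",
}

with vertica_python.connect(**conn_info) as conn:
    cur = conn.cursor()
    # Raise the spread token timeout to 30 seconds (30,000 ms) so brief
    # Hyper-V maintenance pauses are not treated as node failures.
    cur.execute("SELECT SET_SPREAD_OPTION('TokenTimeout', '30000');")
    print(cur.fetchall())
```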
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Chris | PERSON | 0.99+ |
Sumeet | PERSON | 0.99+ |
Jeff Healey | PERSON | 0.99+ |
Chris Daly | PERSON | 0.99+ |
Jeff | PERSON | 0.99+ |
Christopher Daly | PERSON | 0.99+ |
Sumeet Keswani | PERSON | 0.99+ |
Vertica | ORGANIZATION | 0.99+ |
AWS | ORGANIZATION | 0.99+ |
Microsoft | ORGANIZATION | 0.99+ |
10 Gbps | QUANTITY | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
forum.vertica.com | OTHER | 0.99+ |
30 seconds | QUANTITY | 0.99+ |
Amazon Web Services | ORGANIZATION | 0.99+ |
RHEL | TITLE | 0.99+ |
Today | DATE | 0.99+ |
32 cores | QUANTITY | 0.99+ |
CentOS | TITLE | 0.99+ |
more than 20 nodes | QUANTITY | 0.99+ |
32 vCPUs | QUANTITY | 0.99+ |
two platforms | QUANTITY | 0.99+ |
eight seconds | QUANTITY | 0.99+ |
Vertica | TITLE | 0.99+ |
10 terabytes | QUANTITY | 0.99+ |
one | QUANTITY | 0.99+ |
today | DATE | 0.99+ |
both | QUANTITY | 0.99+ |
20 nodes | QUANTITY | 0.99+ |
two terabytes | QUANTITY | 0.99+ |
each application | QUANTITY | 0.99+ |
S3 | TITLE | 0.99+ |
two types | QUANTITY | 0.99+ |
Linux | TITLE | 0.99+ |
two subclusters | QUANTITY | 0.98+ |
first entry | QUANTITY | 0.98+ |
one question | QUANTITY | 0.98+ |
four | QUANTITY | 0.98+ |
Azure | TITLE | 0.98+ |
Vertica 10 | TITLE | 0.98+ |
4/2 | DATE | 0.98+ |
First | QUANTITY | 0.98+ |
16 vCPU | QUANTITY | 0.98+ |
two forms | QUANTITY | 0.97+ |
MinIO | TITLE | 0.97+ |
single employee | QUANTITY | 0.97+ |
first | QUANTITY | 0.97+ |
this week | DATE | 0.96+ |