Search Results for salsa:

Emmy Eide, Red Hat | CloudNativeSecurityCon 23


 

>> John Furrier: Hello, welcome back to theCUBE's coverage of CloudNativeSecurityCon North America 2023, the inaugural event. I'm John Furrier, host of theCUBE, along with Dave Vellante and Lisa Martin covering from the studio. But we have on location Emmy Eide, who is with Red Hat, director of Supply Chain Security. Emmy, great to have you on from location. Thanks for joining us.

>> Emmy Eide: Yeah, thank you.

>> So everyone wants to know: this event is new, it's an inaugural event. CloudNativeCon, KubeCon, very successful. Was this event successful? They all want to know what's going on there. What's the vibe? What are the tracks like? Is it different? Why this event? Was it successful? What's different?

>> Yeah, I've really enjoyed being here. The food is wonderful. There are also quite a few vendors here with some really cool emerging technologies coming out, a lot of them from open source, which is really cool to see as well. The talks are very interesting. They're very diverse in subject but still all security related, which is really cool to see. And there are also a lot of different perspectives on how to approach security problems and the people behind them, which I love to see. And it's very nice to hear the different innovative ideas for how we can go about doing security.

>> We heard from some startups as well that they're very happy with the decision to have a dedicated event. Red Hat is no stranger to open source. Obviously at KubeCon and CloudNativeCon you guys are very successful. Now there's the security con. Why do you think they did this? What's the vibe? What's the rationale? What's your take on this? And what's different from a topic standpoint?

>> From non-security-specific events? Is that what you mean?

>> What's different between KubeCon, CloudNativeCon, and here at CloudNativeSecurityCon? Obviously security's the focus. Is it just deeper dives? Is it more under the hood?
Is it root problems or is this beyond Kubernetes? What's the focus, I guess. People want to know, you know, why the new event?

>> I mean, there's a lot of focus on supply chain security, right? That's the hot topic in security right now, so that's been a huge focus. I can't speak to the differences from those other conferences. I haven't been able to attend them. But I will say that a security-specific conference really focuses on the open community, how technology is evolving, and how you apply security. It's not just talking about tools. I think other conferences tend to focus on just the tools, and you can really get lost in that as someone trying to learn about security or even trying to implement security. Here they talk about what it takes to implement those tools and about the people behind implementing them.

>> Let's get into some of the key topics that we've identified and get your reaction. One, supply chain security, which I know you'll give a lot of commentary on 'cause that's your focus. Also we heard, like, Liz Rice talking about the extended Berkeley Packet Filter (eBPF). Okay, that's big. You know, your root kernel management, that's big. Developer productivity was kind of implied, around removing the blockers of security and making it, you know, more aligned with a developer-first mentality. So that seems to be our takeaway. What's your reaction to those things? You see the same thing?

>> I don't have a specific reaction to those things.

>> Do you see the same thing happening on the ground there? Are they covering supply?

>> Oh, yeah.

>> Those three things, are they the big focus?

>> Yeah. Yeah, I think it's all of those things kind of wrapped into one, right? But yeah, there's... I'm not sure how to answer your question.

>> Well, let's jump into supply chain, for instance, 'cause that has come up a lot.

>> Sure.

>> What's the focus there on supply chain security? Is it SBOMs? Is it container security?
What are the key conversations and topics being discussed around supply chain security?

>> Well, I think there's a lot of laughter around SBOM right now because no one can really define it, specifically, and everyone's talking about it. So there's a lot more than just the SBOM conversation. We're talking about the full end-to-end development process and the whole software supply chain that goes with it. So there's everything from infrastructure security all the way through to signing and transparency logs. Really the full gamut of supply chain, which is really neat to see because it is such a broad topic. I think a lot of folks now are involved in supply chain security in some way. And so it's about bringing to the surface what the different people involved in this space are thinking about, what's on the top of their mind when it comes to supply chain security.

>> How would you scope the order of magnitude of the uptick in supply chain attacks? Is it pretty heavy right now, or is it, you know, people with their hair on fire? Give us a taste of the temperature in the room on supply chain attacks.

>> I think most of the folks who are involved in the space understand that it's increasing. I mean, what is it? A 742% average annual increase, year over year, in supply chain attacks. So the number of attacks increasing is a little daunting, right, for most of us. But it is what it is. So I think most of us right now are just trying to come together to say, "What are you doing that works? This is what I'm doing that works." And in all the different facets of that. 'Cause I think we try to throw tools at a lot of problems, and this problem is so big and broad reaching that we really need to share best practices as a community and as a security community. So this conference has been really great for that.

>> Yeah, I've heard that a lot.
You know, too many tools, not enough platform thinking, not enough architecture, needs some structure. Are you seeing any best practices around frameworks and structure, around how to start getting in and building out a better approach or posture? I mean, what's the state of the union for supply chain and how to handle that?

>> Well, I talked about that a little bit in the keynote that I gave, actually. And I've heard other leaders talk about it too. Obviously it keyed my ear just because I'm so passionate about it: partnership. So you know, empathetic security, where the security team that's enforcing the policies and creating the policies and guidelines is working with the teams that are actually doing the production and the development, hand-in-hand, right? I can sit there and tell you, "Hey, you have all these problems and here's your security checklist or framework you need to follow." But that's not going to do them any good and it's going to create a ton of holes, right? So it's about actually partnering with them, helping them to understand the risks that are associated with their very specific need and use case, because every product has a different kind of quirk to it, right? Like how it's being developed. It might use a different tool, and if someone sits there and says, "Hey, you need to log on to this, you need to make your tool work with this platform over here," and it's not compatible, I'm going to have to completely reframe how I'm doing productization. I need to know that as a security practitioner, because me disrupting productization is not something that I should be doing. And I've heard a couple of folks talking about that, the people aspect behind how we implement these tools, the frameworks, and the platforms. And how do we draw out risk, right? How do we talk about risk with these teams and really make them understand, so it's part of their core culture and their understanding?
So when they go back and have to make decisions without me in the room, they know they can make those business decisions with the risk as part of that decision.

>> I love that empathetic angle because that's really what needs to happen. It's not just, "Hey, that's your department, see you later." Or not even having knowledge of the information. This idea of team construction and team management is a huge cultural shift. I'm sure the reaction was very positive. How do you explain that to an organization that's out there? What are the first three steps you've got to take? Is there any advice you can share for people watching who are saying, "Yeah, we need to change how our teams operate and interact with each other"?

>> Yeah, I think the first step is to take a good hard look at yourself. And if you are standing there on an ivory tower with a clipboard, you're probably doing it wrong. Check-the-box security is never going to work long term. It's going to take you a long time to implement any changes. At Red Hat, we did that look ourselves. You know, we've been doing a lot of great things in supply chain security for a while, but it was really about taking that look and saying, "How can we be more empathetic leaders in the security space?" So we looked at that. Then you say, "Okay, what is my rate of change going to be?" If I need to make so many security changes, then by explaining them to these organizations you're actually going to go faster. We improved our efficiency by 2000% just by doing that, just by creating this more empathetic approach. So while it seems like it's more hands-on, so it's going to be harder, and it's easy to send out an email and say, "Hey, meet the security standard," right? That might seem like the easy way 'cause you don't have time to engage.
It's so much faster if you actually engage and share that message and have a common understanding between the teams, like, "I'm here to deliver a product, and so is the security team. The security team's here to deliver that same product, and I want to help you do it in a trusted way." Right?

>> Yeah. Dave Vellante, my co-host, was just on a session. We were talking together about security teams jumping on every team and putting a C on their jersey to be like the captain of the intramural team, and being involved. And it goes beyond just the checklist, like you said: "Oh, I got the SBOM list of materials and I got a code scanning thing." That's not enough, is what we're hearing.

>> No.

>> Is there a framework or a methodology to go beyond that? You've got the empathetic part, that's really kind of a team issue. You've got to go beyond some of the tactical things. What's next once you've got the empathy? What's that framework structure? Would you say anything there?

>> So what do you do after you have the empathy, right?

>> Yeah.

>> I would say SLSA ("salsa") is a good place to start: Supply-chain Levels for Software Artifacts. It's a mouthful. That's a really good maturity framework to start with, no matter what size organization you have. They're going to be coming out here soon with version one. They released 0.1 a few months back. That's a really good place to give yourself a gut check of where you are in maturity, where you can go, and what the best practices are. And then there's the SSDF, which is the Secure Software Development Framework. I think NIST wrote that one. That is also a really good framework, and they map really well to each other, actually. When you work through SLSA, you're actually working through the SSDF requirements.

>> Awesome. Well, great to have you on and great to get that knowledge. I have to ask you about KubeCon. I remember when it started in Seattle, the first KubeCon events, right?
Kind of small, similar to this one, but there was a lot of end user activity. Certainly the CNCF was kind of coming together right after that. What's the end user activity like there this week? That seems to have always been the driver of these events. It's a little bit organic. You've got some of the key experts coming together, focused. Have you observed any end user activity in terms of contributions and participation? What's the story on the end user piece there? Is it heavy? Is it light? What's the...

>> Um, yeah... It seems moderate. I guess somewhere in the middle. I wouldn't say largely heavy, but there's definitely participation. There is a lot of communing and networking happening between different organizations to partner together, which is important. But I haven't really paid much attention to, like, the Twitter side of this.

>> Yeah, you've been busy doing the keynotes. How's Red Hat doing in all this? You guys have been well positioned with the cloud native movement. I've been following Red Hat's moves since the OpenStack days. Really good line of products, good open source mojo, of course. Good product mix, right, and relevant. Where's the security focus here? Obviously, you guys are clearly focused on security. How's the Red Hat story going over there?

>> There was a really good talk yesterday that explains that super well. It was given by a Red Hatter, connecting all of the open source projects we've been a part of and kind of explaining them. And obviously, again, I'm keying in 'cause it's a supply chain kind of conversation, but I'd recommend that anyone who's going to go back and watch these on YouTube check that one out, just to see how we're approaching the security space as well as how we contribute back to the community in that way.

>> Awesome. Great to have you on. Final word, I'll give you the final word. What's the big buzz on supply chain? How would you peg the progress there? Feeling good about where things are?
What's the current progress on supply chain security?

>> I think that it has opened up a lot of doors for communication between security organizations that have tended to be closed. I'm in product security. Product security and information security tend not to speak externally about what we're doing. You don't want to, you know, look bad, or you don't want to expose any risk that you have, right? But it is, I think, necessary to open those lines of communication to be able to start tackling this. It's a big problem throughout all of our industries, and if one supply chain is attacked and those products are used in someone else's supply chain, that can continue, right? So I think it's good. We have a lot of work to do as an industry, and the advancements in technology are going to make that a little bit more complicated. But I'm excited for it.

>> You can just throw AI at it. That's the big thing, everyone's doing AI. Just throw AI at it, it'll solve it. Isn't that the new thing?

>> I do secure AI though.

>> Super important. I love what you're doing there. Supply chain, open source needs supply chain security. Open source needs this big time. It has to be there. Thank you for the work that you do. Really appreciate you coming on. Thank you.

>> Yeah, thanks for having me.

>> Yeah, good stuff. Supply chain, critical to open source growth. Open source is going to be the key to success in the future, with automation and AI right around the corner. And that's important. This is theCUBE's coverage of CloudNativeSecurityCon North America 2023. I'm John Furrier. Thanks for watching.
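The interview mentions signing and transparency logs as part of the end-to-end supply chain. As a rough illustration of why such a log is useful, here is a minimal sketch of a tamper-evident, append-only log built as a simple hash chain. This is an assumption made for illustration only: production transparency logs are typically built on Merkle trees and signed checkpoints, and the record fields (`artifact`, `signer`) and function names below are hypothetical, not taken from the interview or from any specific tool.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, record: dict) -> str:
    """Hash a record together with the hash of the entry before it."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("utf-8") + payload).hexdigest()


def append(log: list, record: dict) -> None:
    """Append a record, chaining it to the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"record": record, "hash": entry_hash(prev_hash, record)})


def verify(log: list) -> bool:
    """Recompute every hash in order; an edited record breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["hash"] != entry_hash(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True


# Hypothetical records: which artifact was signed, and by whom.
log = []
append(log, {"artifact": "app-1.0.tar.gz", "signer": "build-bot"})
append(log, {"artifact": "app-1.1.tar.gz", "signer": "build-bot"})
print(verify(log))  # the untouched chain verifies

log[0]["record"]["signer"] = "someone-else"  # rewrite history
print(verify(log))  # the edit is detected
```

The point of the sketch is only that anyone holding the log can recheck it independently, which is what lets organizations share and audit signing events without having to trust a single party's word.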

Published Date : Feb 3 2023

SUMMARY :

Emmy, great to have you on from location. What's the vibe? and the people behind them, What's the vibe? and here at the cloud native security con? it really focuses on the open community So that seems to be our takeaway. reaction to those things. I'm not sure how to answer your question. 'Cause that has come up a lot. bringing that to the surface of the uptick in supply chain attacks? And in all the different facets of that. how to handle that? and the development, hand-in-hand, right? knowledge of the information. It's going to take you a long just like the checklist, like you said, of the tactical things. a gut check of where you I have to ask you like KubeCon, I guess somewhere in the middle. Where's the security focus here? connecting all of the open source projects Great to have you on. and the advancements in Isn't that the new thing? It has to be there. Open source is going to be the

ENTITIES

Entity | Category | Confidence
Dave Alonte | PERSON | 0.99+
Lisa Martin | PERSON | 0.99+
Liz Rice | PERSON | 0.99+
John Furrier | PERSON | 0.99+
Emmy Eide | PERSON | 0.99+
Emmy | PERSON | 0.99+
Red Hat | ORGANIZATION | 0.99+
Seattle | LOCATION | 0.99+
first step | QUANTITY | 0.99+
North America | LOCATION | 0.99+
yesterday | DATE | 0.99+
742% | QUANTITY | 0.99+
NIST | ORGANIZATION | 0.99+
2023 | DATE | 0.99+
2000% | QUANTITY | 0.98+
this week | DATE | 0.98+
Supply Chain Security | ORGANIZATION | 0.97+
three things | QUANTITY | 0.97+
first three steps | QUANTITY | 0.97+
theCUBE | ORGANIZATION | 0.96+
Twitter | ORGANIZATION | 0.96+
Cloud Native Security Con 2023 North America | EVENT | 0.95+
SBOM | ORGANIZATION | 0.94+
Berkeley | LOCATION | 0.92+
YouTube | ORGANIZATION | 0.92+
Salsa | TITLE | 0.92+
Red Hatter | TITLE | 0.9+
first mentality | QUANTITY | 0.89+
a few months back | DATE | 0.79+
RedHat | ORGANIZATION | 0.79+
first coup con | QUANTITY | 0.78+
One | QUANTITY | 0.78+
version | QUANTITY | 0.74+
CNCF | ORGANIZATION | 0.7+
security | EVENT | 0.7+
con | ORGANIZATION | 0.67+
OpenStack | TITLE | 0.66+
one supply | QUANTITY | 0.66+
Red Hat | TITLE | 0.64+
native | EVENT | 0.63+
couple | QUANTITY | 0.63+
CloudNativeSecurityCon 23 | EVENT | 0.61+
cloud native | EVENT | 0.6+
Mojo | ORGANIZATION | 0.6+
one | QUANTITY | 0.6+
Kubernetes | TITLE | 0.57+
one | OTHER | 0.5+

DockerCon 2022 | Aparna Sinha


 

>> Welcome to theCUBE's DockerCon main stage coverage here at DockerCon 2022. I'm John Furrier, host of theCUBE. We're here with CUBE alumni Aparna Sinha, the senior director of product and the developer platform at Google Cloud. Aparna, great to see you. It's been a while. How are things?

>> Great to see you, John. Thanks for having me.

>> So obviously we've covered a lot about Google's history in open source. If you go back, I mean go back a generation, to 2000, when it all started, it continues to thrive: Istio, all the different projects you guys are around, the future of containers and serverless, all there. Give us the update. Why are customers choosing Google Cloud? We're here at DockerCon. What's the big update from Google Cloud's perspective, from a developer perspective?

>> Well, John, Google Cloud has been the early cloud on containers, and by all measures, from what we can see, it is the preferred cloud for container-native workloads. As for why our customers are choosing Google Cloud, there are a few different reasons. Definitely one of the reasons is that it is a flexible and open platform, and I think that is distinctive about Google Cloud. As you mentioned, many, many open source projects have come from Google, and Google Cloud in particular, over the last 20 years, spanning languages, obviously the Go programming language, all the way to, of course, Kubernetes, and then more recently Istio and Knative and many more. Tekton is one of the leading projects as well, in the CI/CD space. So I think that history is something that really attracts the developer population. It's also very, very important for enterprises that are modernizing and looking to accelerate their developer productivity. So that's been one major reason.
I think the second major reason is really the security aspect of the developer tool chain, in particular related to open source security as well. And I think the third reason that comes out quite frequently when we talk to our enterprise customers is that Google Cloud is unique in the multi-cloud space. It's one of the first, I think probably the first and only, cloud provider to have a very strong multi-cloud strategy. That stems from the open source roots, but also from bringing more than just compute, bringing many of our data services to the multi-cloud space as well. I think those are the three reasons why developers often choose Google Cloud.

>> Yeah. And you see multi-cloud also in a distributed computing environment. I mean, multi-cloud is basically distributed computing where you've got hyperscalers and then edges emerging very quickly. Of course, we've talked about that in the past, in previous interviews: security at the edge, software, and open source all coming together. Again, Kubernetes was launched by Google and contributed to the open source world; everyone knows that, or may not know that. But that's key. Where do you see the container position coming in? Because at the end of the day, containers are standard, and now you've got Kubernetes and other parts wrapped around it. Where's container technology going in the coming years? Is it going to be invisible? Is it going to be programmable? What's your vision on that?

>> This is an excellent question. And you're exactly right. You're seeing containers become mainstream. In some of the latest state of the cloud business reports, you're seeing 80% of enterprises having some form of a container program. I've been involved in this industry since the very early days, so this is something we've been predicting, and it is happening even faster than expected.
So that's becoming very mainstream, which is extremely exciting for us. Now you ask, what is the future and what is the evolution of it? I think this is the right question, because you're seeing a lot of the future actually on Google Cloud. We've won the Gartner and Forrester quadrants, as far as leader quadrants, in container offerings. And that's not just Kubernetes. Of course, Google Kubernetes Engine has been the leading area, but there's a whole host of offerings around that. In particular I'd like to point out serverless containers with Cloud Run, as well as the entire DevOps pipeline around containers. And that's a big topic in the industry right now. It brings in security as related to developers, and then of course providing an automated, secure pipeline for DevOps. As it relates to containers, we've had several announcements and a lot of success in this space. I can go through some of these things. With Cloud Run, which is our serverless container offering, we've seen 4x growth in adoption and consumption of that service last year, in 2021. And that is continuing, so it's very, very healthy. Very much the reason customers are adopting it is that they don't need to learn a lot of the underlying infrastructure. They don't need to manage any of the underlying infrastructure. There isn't necessarily a cluster to manage; all of that is taken care of for them, and they can focus on their application. They can make use of the benefits of containers, such as scalability, application awareness, and a lot of the integrated tool chain for application delivery, right from your source repository into production, and then being able to bring out new versions of your application, test them, and then roll over.
So this is kind of the new generation. I think it is very much tied to the pandemic and what's happening in the world post-pandemic, where developers are extremely important. Developer productivity and, in fact, developer work-life balance are extremely important.

>> Yeah. And I think also one of the things that we're seeing, to piggyback on that last comment as well as your other points, is that developers have always been pulled to the front lines. Even 10 years ago, you saw the trend towards getting closer to the customer. Now, with cloud and edge, and with open source being the innovation equation, entrepreneurs are starting projects, companies are starting projects, and then they've got to get commercialized. So supply chain is a big discussion. At DockerCon we're hearing about shifting security left, and data as code. You start to see the developer on the front lines in all aspects of this. They want security, they want efficiency, they want things in the pipeline. They don't want to have to shift left, then come back again. So again, we're starting to see this kind of productivity drive the business behavior of the companies, 'cause they're the value partners. That's the application side of cloud native. What are your thoughts for the developers who are doing that? What's in it for them with Google Cloud? Why are you important to them?

>> Yeah, and I think, John, this is where developers tend to prefer Google Cloud. And there are a couple of reasons for that. One is, we are very much centered around developers. My job is the Google Cloud developer platform, and our goal is to provide ease of use, the easiest cloud for developers, something that really allows them to get their work done quickly. Developers want to be exposed to the best technology.
They want to be exposed to it in a way that integrates into their workflow, that integrates into the tools that they're used to, and that allows them to get their job done quickly. And so a lot of what we're doing in the developer space is providing an integrated stack. Whether you're building a web application, or you're building a mobile application, or you're trying to do data analytics, Google Cloud should be a place that you come to that's easy for you to use to get the job done. And the security aspect is not something that developers like to deal with. They want that to be taken care of for them. Troubleshooting as well; troubleshooting and upgrading, all of that is something they want taken care of. And so that is something that we're baking into the platform. You'll see that in a lot of our tooling. In the build process, we're providing SLSA compliance and build provenance for the security teams to be able to audit. But it's not something that the developer needs to take care of. It's something that is just part of the build process, built into, say, Cloud Run or GKE, built into our compute options.

>> Making it easy for them, making it simple, and reducing the steps it takes to get the job done. So great stuff, Aparna. Great to see you. In the last 30 seconds we have left, just give a quick commercial for the key projects in open source that you're proud of, that people should pay attention to. We've got KubeCon coming up in Europe and North America. What are some of the successes that you'd like to point out?

>> Well, I really encourage developers to go and take a new look at Go. Go 1.18 added support for generics. It should open up a brand new set of applications, so I definitely encourage folks to take a look at that. Then, of course, Istio and service mesh.
As your container footprint grows and you have many microservices, looking at service mesh is extremely important. It also allows you to get to that SRE type of DevOps model where you're securing your services and you're also able to monitor and control service usage. And then the last one is, of course, Tekton, and this is where secure software supply chain comes up.

>> Aparna, I'll mention that. I wish I had 20 minutes. Love chatting with you. We'll catch up with you later on theCUBE. We're here at DockerCon. Thanks for your time. Back to the DockerCon main stage. This is theCUBE. I'm John Furrier. Back to the main stage for more coverage.
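The interview describes build provenance as something security teams can audit without developers having to manage it. A minimal sketch of one step in such an audit is shown below: checking that the artifact in hand matches a digest recorded in a provenance document. The dictionary layout is loosely modeled on the subject-and-digest shape used by in-toto style attestations, but it is simplified and the specific names (`example-image`, `hypothetical-builder`, `artifact_in_provenance`) are assumptions for illustration. A real audit would also verify the signature on the provenance itself, which this sketch does not attempt.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of some bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def artifact_in_provenance(artifact: bytes, provenance: dict) -> bool:
    """Check whether the artifact's digest is listed as a provenance subject."""
    digest = sha256_hex(artifact)
    return any(
        subject.get("digest", {}).get("sha256") == digest
        for subject in provenance.get("subject", [])
    )


# Hypothetical artifact and provenance record, for illustration only.
artifact = b"example container image bytes"
provenance = {
    "subject": [
        {"name": "example-image", "digest": {"sha256": sha256_hex(artifact)}}
    ],
    "predicate": {"builder": {"id": "hypothetical-builder"}},
}

print(artifact_in_provenance(artifact, provenance))           # digest matches
print(artifact_in_provenance(b"tampered bytes", provenance))  # digest differs
```

Because the check only needs the artifact and the provenance document, it can run in an audit pipeline owned by the security team, which fits the division of labor described in the interview.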

Published Date : May 11 2022

SUMMARY :

Welcome to theCUBE's DockerCon main stage coverage here at DockerCon 2022. it all started, it continues to thrive, Istio, all the different projects you guys are around and by all measures, from what we can see, it is the preferred cloud for container and that stems from the open source roots, but also bringing more than Where do you see the container as far as leader quadrants in container offerings. In particular I'd like to point out serverless containers with Cloud Run, make use of the benefits of containers, such as scalability, closer to the customer now with cloud and edge and with open source being the innovation equation Google Cloud developer platform. the build process, we're providing SLSA compliance, So great stuff, Aparna, great to see you in the last 30 seconds we have left. DevOps model where you're securing your services. We'll catch up with you later on theCUBE. We're here at DockerCon.

ENTITIES

Entity | Category | Confidence
John | PERSON | 0.99+
20 minutes | QUANTITY | 0.99+
John farrier | PERSON | 0.99+
80% | QUANTITY | 0.99+
Aparna Sinha | PERSON | 0.99+
Europe | LOCATION | 0.99+
Google | ORGANIZATION | 0.99+
first | QUANTITY | 0.99+
third | QUANTITY | 0.99+
three reasons | QUANTITY | 0.99+
Tecton | ORGANIZATION | 0.99+
Gartner | ORGANIZATION | 0.99+
last year | DATE | 0.99+
north America | LOCATION | 0.99+
one | QUANTITY | 0.98+
2021 | DATE | 0.98+
DockerCon | EVENT | 0.98+
Forester | ORGANIZATION | 0.97+
One | QUANTITY | 0.97+
10 years ago | DATE | 0.96+
pandemic | EVENT | 0.96+
Docker con | ORGANIZATION | 0.92+
salsa | TITLE | 0.91+
one major reason | QUANTITY | 0.9+
Google cloud | TITLE | 0.86+
Kubernetes | TITLE | 0.83+
DockerCon 2022 | EVENT | 0.81+
second major reason | QUANTITY | 0.8+
Google cloud | TITLE | 0.78+
cloud | TITLE | 0.78+
CubeCon | ORGANIZATION | 0.77+
last 20 years | DATE | 0.75+
Providence | ORGANIZATION | 0.7+
2000 | DATE | 0.68+
Google Kubernetes | TITLE | 0.61+
last 30 seconds | DATE | 0.59+
DevOps | TITLE | 0.59+
K | PERSON | 0.46+
Isto | PERSON | 0.43+
1.8 | TITLE | 0.35+

UNLIST TILL 4/1 - Putting Complex Data Types to Work


 

Hello everybody, and thank you for joining us today for the Virtual Vertica BDC 2020. Today's breakout session is entitled "Putting Complex Data Types to Work." I'm Jeff Healey. I lead Vertica marketing, and I'll be your host for this breakout session. Joining me is Deepak Majeti, technical lead from Vertica engineering. But before we begin, I encourage you to submit questions and comments during the virtual session. You don't have to wait. Just type your question or comment in the question box below the slides and click Submit. There will be a Q&A session at the end of the presentation. We'll answer as many questions as we're able to during that time. Any questions we don't address, we'll do our best to answer offline. Alternatively, visit the Vertica forums at forum.vertica.com to post your questions there after the session. Our engineering team is planning to join the forums to keep the conversation going. Also, a reminder that you can maximize your screen by clicking the double-arrow button in the lower right corner of the slides. And yes, this virtual session is being recorded and will be available to view on demand this week. We'll send you a notification as soon as it's ready. Now let's get started. Over to you, Deepak.

Thanks, Jeff. Today I'm going to talk about the complex data types work we've been doing at Vertica R&D. Without further delay, let's see why and how we should put complex data types to work in your data analytics.

This is going to be the outline, or overview, of my talk today. First, I'm going to talk about what complex data types are and some use cases. I will then quickly cover some file formats that support these complex data types. I will then deep dive into the current support for complex data types in Vertica. Finally, I'll conclude with some usage considerations, what is coming in our 10.0 release, and our future roadmap and directions for this project.

So what are complex data types? Complex data types are nested data structures composed of primitive types. Primitive types are nothing but your int, float, and
string, varbinary, etc.: the basic types. Some examples of complex data types include struct (also called row), array, list, set, map, and union. Composite types can also be built by composing other complex types. Complex types are very useful for handling sparse data (we also have some examples in this presentation on that use case), and they also help simplify analysis. So let's look at some examples of complex data types. In the first example on the left, you can see simple_customer, which is of type struct with two fields, namely a field name of type string and a field ID of type integer. Structs are nothing but a group of fields, and each field has a type of its own. The type can be primitive or another complex type. On the right we have some example data for this simple_customer complex type. It's basically two fields, of type string and integer. In this case you have two rows, where the first row has name Alex and ID 10 and the second row has name Mary and ID 2002. The second complex type on the left is phone_numbers, of type array, which has the element type string. An array is nothing but a collection of elements; the elements could again be a primitive type or another complex type. In this example the collection is of type string, which is a primitive type. On the right you have some example data for this array type called phone_numbers. Basically each row has a set, or list, or collection of phone numbers: in the first row we have two phone numbers, and in the second we have a single phone number in that array. The third type on the slide is the map data type. A map is nothing but a collection of key-value pairs, so each element is actually a key and a value, and you have a collection of such elements. The key is usually a primitive type; however, the value can be a primitive or complex type. In this example, both the key and the value are of type string. If you look on the right side of the slide, you have some sample data. Here we have HTTP
requests, where the key is the header type and the value is the header value. For instance, in the first row we have a key Pragma with value no-cache and a key Host with value some hostname, and similarly in the second row you have a key called Accept with value text/html. Because they both hold a collection of elements, arrays and maps are commonly called collections in many documents. So we saw examples of one-level complex types. On this slide we have nested complex data types. On the right we have the root complex type called web_events, of type struct. The struct has four fields: a session ID of type integer, a session duration of type timestamp, and then the third and fourth fields, customer and HTTP requests, which are further complex types themselves. Customer is again a complex type, of type struct with three fields, where the first two fields, name and ID, are primitive types; however, the third field is another complex type, phone_numbers, which we just saw on the previous slide. Similarly, HTTP requests is the same map type that we just saw. In this example, each complex type is independent, and you can reuse a complex type inside other complex types. For example, you can build another type called orders and simply reuse the customer type. However, in a practical implementation you have to deal with complexities involving security, ownership, and lifecycle dependencies. So keeping complex types independent has the advantage of reusing them; the complication is that you have to deal with security, ownership, and lifecycle dependencies. On this slide we have another style of declaring a nested complex type, which is called an inlined complex data type. We have the same web_events struct type; however, if you look, the complex types are embedded into the parent type definition. So the customer and HTTP requests definitions are embedded inline into this parent structure. The advantage of this is you won't have to
deal with the security and other lifecycle dependency issues, with the downside being that you can't reuse them. So it's sort of a trade-off between the two. So let's now see some use cases of these complex types. The first use case, or benefit, of using complex data types is that you'll be able to express analysis more naturally. Complex types simplify the expression of analysis logic, thereby simplifying the data pipelines. In SQL, it feels as if you have tables inside a table. Let's look at an example. Say you want to list all the customers with more than one thousand website events. If you have complex types, you can simply create a table called web_events with one column of type web_event, which is a complex type. We just saw that definition: it has the four fields, the session fields, customer, and HTTP requests. So you can basically have the entire schema in one type. If you don't have complex types, you'll have to create four tables, essentially one for each complex type, and then you have to establish primary key and foreign key dependencies across these tables. Now, if you want to achieve your goal of listing all the customers with more than a thousand web requests: if you have complex types, you can simply use the dot notation to extract the name and the contact, and also use some special functions for maps that will give you the count of all the HTTP requests greater than a thousand. However, if you don't have complex types, you'll have to join each table individually, extract the results from a subquery, join again in the outer query, and finally apply a predicate of total requests greater than a thousand to get your final result. So complex types basically simplify the query-writing part. The execution itself is also simplified: you don't need joins. If you have complex types, you can simply have a load step to load the map type, and then you can apply the function on top of it directly. However, if you have separate tables, you have to
join all this data, apply the filter step, and then finally do another join to get your results. All right, so the other advantage of complex types is that you can process semi-structured data very efficiently. For example, if you have data from clickstreams or page views, the data is often sparse, and maps are very well suited for such data. Maps are semi-structured by nature, and with this support you can now actually have semi-structured data represented along with structured columns in any database. Maps have this nice feature of encapsulating sparse data. As an example, the common fields of clickstream or page view data are Pragma, Host, and Accept. If you don't have map types, you will end up creating a column for each of these header or field types. However, if you have a map, you can basically embed all the data as key-value pairs. On the left of the slide you can see an example where you have a separate column for each field; you end up with a lot of nulls, because the data is sparse. However, if you embed them into a map, you can put them into a single column and have better efficiency and a better representation of sparse data. Imagine if you have thousands of fields in a clickstream or page view: you will need thousands of columns to represent the data if you don't have a map type. So, given that these are the most commonly used complex data types, let's see which file formats actually support them. Most of the popular file formats support complex data types; however, they have different variations. For instance, JSON supports arrays and objects, which are complex data types. However, JSON data is schema-less, row-oriented, and text-based. Because it is schema-less, it has to store the keys in every row. The second type of file format is Avro. Avro has records, enums, arrays, maps, unions, and a fixed type. However, Avro has a schema,
it is row-oriented, and it is binary compressed. The third category is basically the Parquet and ORC style of file formats, which are columnar. Parquet and ORC have support for arrays, maps, and structs. They have a schema, they are column-oriented (unlike Avro, which is row-oriented), and they're also binary compressed, and they additionally support very nice compression and encoding types. The main difference between Parquet and ORC is only in terms of how they represent complex types. Parquet encodes the complex type hierarchy as repetition and definition levels; ORC, however, uses a separate column at every parent of the complex type to basically represent presence and nulls. Apart from that difference in how they represent complex types, Parquet and ORC have similar capabilities in terms of optimizations and other compression techniques. To summarize: JSON has no schema, has no binary format, and is not columnar. Avro has a schema and has a binary format; however, it is not columnar. And Parquet and ORC have a schema, have a binary format, and are columnar. So let's see how we can query these different kinds of complex types, and the different file formats they can be present in, in Vertica. In Vertica we basically have this feature called flex tables, where you can load complex data types and analyze them. Flex tables use a binary format called VMap to store data as key-value pairs. Flex tables are schema-less, they are weakly typed, and they give up some performance in exchange for flexibility. What I mean by schema-less is that the keys provide the field names, and each row can potentially have different keys. And it is weakly typed because there's no type information at the column level. We will see some examples of this weak typing in the following slides, but basically there's no type information, so the data is stored in text format. Because of the weakly typed and schema-less nature of flex
tables, you can implement some use cases easily: you can trivially implement needs like schema evolution, or keep the complex types fluid. If that is your use case, then the weak typing and schema-less nature of flex tables will help you a lot by giving you that flexibility. However, because of this weak typing, you have the downside of not getting the best possible performance. So if your use case is to get the best possible performance, you can use the new strongly typed complex types that we have started to introduce in Vertica. Complex types here are strongly typed: they have a schema, and they give you the best possible performance because the optimizer now has enough information from the schema and the types to implement optimizations such as column selection. All the nice techniques that Vertica employs to give you the best possible columnar performance can now be supported even for complex types. We'll see some examples of these two kinds in the following slides. So let's use a simple data set called restaurants as a running example throughout these slides to see all the different variations of flex and complex types. On this slide you have some sample data with four fields and essentially two rows, if you separate them out. The four fields are name, cuisine, locations, and menu. Name and cuisine are of type varchar, locations is essentially an array, and menu is an array of rows with two fields, item and price. If the data is in JSON, there is no schema and there is no type information. So how do we process that in Vertica? In Vertica you can simply create a flex table called restaurants, copy the restaurants.json file into Vertica, and start analyzing the data. If you do a SELECT * FROM restaurants, you will see that all the data is actually in one column called __raw__, and you also have
the other column called __identity__, which gives you a unique row ID. But the __raw__ column basically encapsulates all the data that is in the restaurants.json file. This __raw__ column is nothing but the VMap format. The VMap format is a binary format that encodes the data as key-value pairs, and the __raw__ column is backed by the LONG VARBINARY column type in Vertica. Each key essentially gives you the field name and each value the field value; however, the values are in their text representation. Say now you want to get better performance on this JSON data. Flex tables have some nice functions to analyze your data and try to extract schema and type information from it. If you execute COMPUTE_FLEXTABLE_KEYS on the restaurants table, you will see a new table called public.restaurants_keys, which will give you some information about your JSON data. It was able to automatically infer that your data has four fields, namely name, cuisine, locations, and menu, and it could also infer that name and cuisine are varchar. However, since locations and menu are complex types themselves (one is an array and one is an array of rows), it uses the same VMap format as is to process them. So it has four columns: two primitive ones of type varchar, and two that are VMaps themselves. Now you can materialize these columns by altering the table definition and adding columns of the particular types it inferred, and then you can get better performance from these materialized columns. It's not in a single column anymore: you have four columns for your restaurant data, and you can get column selection and the other optimizations that Vertica provides on the data. All right, so flex tables are basically helpful if you don't have a schema and you don't have any type information. However, we saw earlier that some file formats like Parquet and Avro have a schema and have some
type information. In those cases you don't have to do the first step of inferring the types: you can directly create an external table definition with the types, point it at the Parquet file, and load it through an external table in Vertica. So with the same restaurants.json, if you convert it to Parquet format, you can basically get the fields directly; however, the locations and menu are still in the VMap format. All right, so the VMap format also allows you to explode the data, and it has some nice functions to extract the fields from the VMap format. You have this function MAPITEMS. For the same restaurants data, if you want to explode the data and apply predicates on the fields of the array and the array of rows, you can use MAPITEMS to explode your data and then apply predicates on a particular field in the complex type data. This slide is basically showing you how you can explode the entire data, the menu items as well as the locations, and get the elements of each of these complex types. As I mentioned, if you go back to the previous slide, the locations and menu items are still in the LONG VARBINARY, or VMap, format. So the question is, what if you want to get better performance on the VMap data? For primitive types you could materialize them into primitive-typed columns. However, for an array and an array of rows, we need some first-class complex type constructs, and that is what we are adding in Vertica right now. Vertica has started to introduce complex data types, where these complex types are strongly typed. On this slide you have an example of a row complex type: we create an external table called customers, and you have a row type with two fields, name and ID. The complex type is basically inlined into the column definition. In the second example you can see the CREATE EXTERNAL TABLE items statement,
which is a nested row type. It has an item of type row, which has two fields, name and properties, and properties is again another nested row type with two fields, quantity and label. These are basically strongly typed complex types, and the optimizer can now give you better performance compared to the VMap by using the strongly typed information in queries. We have support for pure rows and nested rows in external tables for Parquet. We have support for arrays and nested arrays as well for external tables in Parquet, so you can declare an external table called contacts with a field phone_numbers of type array of integers. Similarly, you can have a nested array of items of type integer; you can declare a column with that strongly typed complex type. The other complex type support that we are adding in the 10.0 release is support for optimized one-dimensional arrays and sets, for both ROS as well as Parquet external tables. So you can create an internal table called phone_numbers with a one-dimensional array; here you have phone_numbers as an array of type int. You can have sets as well, which are also one-dimensional arrays, but sets are optimized for fast lookups: they have unique elements and they are ordered. So if fast lookup is your use case, then sets will give you very quick lookups for elements. We also implemented some functions to support arrays and sets. You have APPLY_MIN and APPLY_MAX, which are scalar functions that you can apply on top of an array to get the minimum element, and so on, and there is support for additional functions as well. The other feature that is coming in 10.0 is the EXPLODE functionality for arrays. We have implemented a UDx that, similar to the example you saw in the MAPITEMS case, will allow you to extract elements from these arrays so you can apply different predicates or analysis on the elements. For example, if you
have this restaurants table with the columns name of type varchar, locations an array of varchar, and menu again an array of varchar, you can insert values into these columns using the array constructor. Here we are inserting three values: Lily's, with locations Cambridge and Pittsburgh and menu items cheese and pepperoni; another row with restaurant name Bob's Tacos, location Houston, and menu items tortilla, salsa, and patty; and a third row. Now you can explode both arrays and extract the elements out of them. You can explode the locations array and extract the location elements, which are Houston, Cambridge, Pittsburgh, and New Jersey, and you can also explode the menu items and extract individual elements. Then you can apply other predicates on the exploded data. Okay, so let's see some usage considerations for these complex data types. Complex data types, as we saw earlier, are nice if you have sparse data. If your data is clickstream or page view data, then maps are very nice for representing it, and you can represent it efficiently space-wise. So for sparse data, use map types. Complex types, as we saw earlier for the web request count query, will help you simplify the analysis as well: you don't have to have joins, and it simplifies your queries. As I just mentioned, if your use case is fast lookups, then you can use the set type. Arrays are nice, but they have an ordering on them; if your primary use case is just to look up certain elements, then you can use the set type. Also, you can use the VMap or flex functionality that we have in Vertica if you want flexibility in your complex data type schema. Like I mentioned earlier, you can trivially implement needs like schema evolution, or even keep the complex types fluid. If you have multiple iterations of analysis, and in each iteration you are changing the fields
because you're just exploring the data, then VMap and flex will give you that ease of changing the fields within the complex type or across files. You can load fluid complex data types (fluid basically means different fields in different rows) into VMap and flex tables easily. However, once you have iterated over your data and figured out which fields and complex types you really need, you can use the strongly typed complex data types that we have started to introduce in Vertica: you can use the array type, the struct type, and the map type for your data analysis. So those are the high-level use cases for complex types in Vertica. It depends a lot on where your data analysis phase is. If it's early, then your data is usually still fluid, and you might want to use VMaps and flex to explore it. Once you finalize your schema, you can use the strongly typed complex data types to get the best possible performance. Okay, so what's coming in the following releases of Vertica? In 10.0, which is the next release of Vertica and is coming sometime soon, we're adding support for loading Parquet complex data types into the VMap format. Parquet is a strongly typed file format: it has the schema, and it also has the type information for each complex type. However, if you are exploring your data, you might have different Parquet files with different schemas, so you can load them into the VMap format first, analyze your data, and then switch to the strongly typed complex types. We're also adding one-dimensional optimized arrays and sets in ROS and for Parquet. So complex types are not just limited to Parquet; you can also store them in ROS. However, right now we only support one-dimensional arrays and sets in ROS. We're also adding the EXPLODE UDx for one-dimensional arrays in this release, so, as you saw in the previous example,
you can explode the data for arrays and apply predicates on individual elements of the array data. It will apply to sets as well: you can cast sets to arrays and explode sets too. So what are the plans past the 10.0 release? We are going to continue work on strongly typed complex types. In the 10.0 release we won't have support for all the combinations of complex types: we only have support for nested pure arrays or nested pure rows, and some are limited to the Parquet file format. We will continue to add more support for subqueries and nested complex types in the following releases. We're also planning to add a VMap data type. You saw in the examples that the VMap format is currently backed by the LONG VARBINARY column type. Because of this, the optimizer really cannot distinguish which data is actually LONG VARBINARY and which is actually data in VMap format. So the idea is to add a type called VMap, and then the optimizer can support optimizations, or even syntax such as dot notation. And if your data is columnar, such as Parquet, then you can implement optimizations such as key pushdown, where you push down the keys that you are actually querying in your analysis, and then only those keys are loaded from Parquet and built into the VMap format. That way you get the column selection optimization for complex types as well. That's something you can achieve if you have a distinct type for the VMap format, so it's on the roadmap as well. Unnest join is another nice-to-have feature. Right now, if you want to explode and join the array elements, you have to explode in a subquery and then join the data in the outer query. However, if you have unnest
join, it will allow you to explode as well as join the data in the same query, on the fly. And finally, we are also adding support for a new feature called UDVector, so that's in the plan too. Our work on complex types is essentially changing the fundamental way Vertica executes functions and expressions. Right now, all expressions in Vertica can return only a single column, except in some cases like UD transforms and so on. For scalar functions, for instance a UD scalar, you can get only one column out. However, you may have use cases where you want multiple computations on the same input data. Say you have input data of two integers and you want to compute both addition and multiplication on those two columns. This is just an example, but many machine learning use cases have similar patterns. Say you want to do both of these computations on the data at the same time. In the current approach, you have to have one function for addition and one function for multiplication, and both of them have to load the data, basically loading the data twice to get both computations done. However, with UDVector support, you can perform both computations in the same function and return two columns, essentially saving you from loading these columns twice: you load them only once and get both results out. That's what we are trying to implement with all the changes we are making to support complex data types in Vertica. Also, you don't have to use an OVER clause as with a UD transform; just like with UD scalars, you can have your UDVector return multiple columns from your computations. So that concludes my talk. Thank you for listening to my presentation. Now we are ready for Q&A.
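
As a rough illustration of the single-pass idea described at the end of the talk, the following Python sketch computes both the sum and the product of two integer columns while reading each row only once. It is not Vertica's UDx or UDVector API; the function name `add_and_multiply` and the sample rows are hypothetical and exist only for this example.

```python
from typing import Iterable, List, Tuple


def add_and_multiply(rows: Iterable[Tuple[int, int]]) -> Tuple[List[int], List[int]]:
    """Read each (a, b) row once and return two output columns.

    Hypothetical stand-in for a function that returns multiple columns;
    it does not use any Vertica API.
    """
    sums: List[int] = []
    products: List[int] = []
    for a, b in rows:  # single pass over the input
        sums.append(a + b)
        products.append(a * b)
    return sums, products


if __name__ == "__main__":
    sample_rows = [(1, 2), (3, 4), (5, 6)]  # made-up input data
    sums, products = add_and_multiply(sample_rows)
    print(sums)
    print(products)
```

With two separate single-output functions, the same rows would have to be scanned once per function; returning both columns from one call avoids the second scan.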

Published Date : Mar 30 2020
