Oracle Announces MySQL HeatWave on AWS
>> Oracle continues to enhance MySQL HeatWave at a very rapid pace. The company is now in its fourth major release since the original announcement in December 2020. One of the main criticisms of MySQL HeatWave is that it only runs on OCI, Oracle Cloud Infrastructure, and acts as a lock-in to Oracle's cloud. Oracle recently announced that HeatWave is now going to be available in the AWS cloud, and it announced its intent to bring MySQL HeatWave to Azure. So MySQL HeatWave on AWS is a significant TAM expansion move for Oracle because of the momentum the AWS cloud continues to show. And evidently the HeatWave engineering team has taken the development effort from OCI and is bringing that to AWS with a number of enhancements that we're going to dig into today. Nipun Agarwal, senior vice president of MySQL HeatWave at Oracle, is back with me on a CUBE Conversation to discuss the latest HeatWave news, and we're eager to hear any benchmarks relative to AWS or any others. Nipun has been leading the HeatWave engineering team for over 10 years and holds over 185 patents in database technology. Welcome back to the show and good to see you. >> Thank you. Very happy to be back. >> Now for those who might not have kept up with the news, to kick things off, give us an overview of MySQL HeatWave and its evolution so far. >> So MySQL HeatWave is a fully managed MySQL database service offering from Oracle. Traditionally, MySQL has been designed and optimised for transaction processing. So when customers of MySQL had to run analytics, or when they had to run machine learning, they would extract the data out of MySQL into some other database for doing analytics processing or machine learning processing. MySQL HeatWave provides all these capabilities built into a single database service, which is MySQL HeatWave. So customers of MySQL don't need to move the data out; with the same database they can run transaction processing, analytics, mixed workloads, machine learning, all with very good performance and very good price performance. Furthermore, one of the design points of HeatWave is a scale-out architecture, so the system continues to scale and perform very well even when customers have very large data sizes. >> So we've seen some interesting moves by Oracle lately. The collaboration with Azure, we've covered that pretty extensively. What was the impetus here for bringing MySQL HeatWave onto the AWS cloud? What were the drivers that you considered? >> So one of the observations is that a very large percentage of users of MySQL HeatWave are AWS users who are migrating off Aurora, so already we see that a good percentage of MySQL HeatWave customers are migrating from AWS. However, there are some AWS customers who are still not able to migrate to OCI for MySQL HeatWave. And the reason is because of the exorbitant data egress costs which AWS charges. So in order to migrate the workload from AWS to OCI, the egress charges are very high fees which become prohibitive for the customer. Or the second example we have seen is that the latency of accessing a database which is outside of AWS is very high.
So there's a class of customers who would like to get the benefits of MySQL HeatWave but were unable to do so, and with this support of MySQL HeatWave inside of AWS, these customers can now get all of the benefits of MySQL HeatWave without having to pay the high egress fees and without having to suffer the poor latency that comes from accessing a database outside of AWS. >> Okay, so you're basically meeting the customers where they are. So was this a straightforward lift and shift from Oracle Cloud Infrastructure to AWS? >> No, it is not, because one of the design goals we have with MySQL HeatWave is that we want to provide our customers with the best price performance regardless of the cloud. So when we decided to offer MySQL HeatWave on AWS, we optimised MySQL HeatWave for it as well. So one of the things to point out is that this is a service where the data plane, control plane and the console are natively running on AWS. And the benefit of doing so is that now we can optimise MySQL HeatWave for the AWS architecture. In addition to that, we have also announced a bunch of new capabilities as a part of the service which will also be available to the MySQL HeatWave customers on OCI, but we just announced them and we're offering them as a part of the MySQL HeatWave offering on AWS. >> So I just want to make sure I understand: it's not like you just wrapped your stack in a container and stuck it into AWS to be hosted. You're saying you're actually taking advantage of the capabilities of the AWS cloud natively? And I think you've made some other enhancements as well that you're alluding to. Can you maybe elucidate on those? >> Sure. So for starters, we have taken the MySQL HeatWave code and we have optimised it for the AWS infrastructure, its compute and network. And as a result, customers get very good performance and price performance with MySQL HeatWave in AWS. That's one, performance. Second thing is, we have designed a new interactive console for the service, which means that customers can now provision their instances with the console, but in addition, they can also manage their schemas, they can run queries directly from the console, Autopilot is integrated into the console, and we have introduced performance monitoring, so a lot of capabilities which we have introduced as a part of the new console. The third thing is that we have added a bunch of new security features, exposed some of the security features which were part of the MySQL Enterprise Edition as a part of the service, which gives customers now a choice of using these features to build more secure applications. And finally, we have extended MySQL Autopilot for a number of OLTP use cases. In the past, MySQL Autopilot had a lot of capabilities for analytics, and now we have augmented MySQL Autopilot to offer capabilities for OLTP workloads as well. >> But there was something in your press release called auto thread pooling. It says it provides higher and sustained throughput at high concurrency by determining the optimal number of transactions which should be executed. What is that all about, the auto thread pool? It seems pretty interesting. How does it affect performance? Can you help us understand that? >> Yes, and this is one of the capabilities I was alluding to which we have added in MySQL Autopilot for transaction processing. So here is the basic idea.
If you have a system where there's a large number of OLTP transactions coming into it at high degrees of concurrency, in many of the existing MySQL-based systems it can lead to a state where there are few transactions executing, but a bunch of them can get blocked. With auto thread pooling, what we basically do is workload-aware admission control, and what this does is it figures out what's the right scheduling for all of these transactions, so that either the transactions are executing, or as soon as something frees up they can start executing, so there's no transaction which is blocked. The advantage to the customer of this capability is twofold. One, you get significantly better throughput compared to a service like Aurora at high levels of concurrency. So at high concurrency, for instance, MySQL HeatWave, because of this auto thread pooling capability, offers up to 10 times higher throughput compared to Aurora; that's the first benefit, better throughput. The second advantage is that the throughput of the system never drops, even at high levels of concurrency, whereas in the case of Aurora the throughput goes up, but then at high concurrency, let's say starting at a level of 500 or so, depending upon the underlying shape they're using, the throughput just drops, whereas with MySQL HeatWave the throughput never drops. Now, the ramification for the customer is that if the throughput is not going to drop, the user can start off with a small shape, get the performance, and be assured that even as the workload increases, they will never get performance which is worse than what they were getting at lower levels of concurrency. So this leads to customers provisioning a shape which is just right for them, and if they need to, they can go with a larger shape, but they don't, you know, overpay. So those are the two benefits: better performance and sustained throughput regardless of the level of concurrency. >> So how do we quantify that? I know you've got some benchmarks. How can you share comparisons with other cloud databases? Especially interested in Amazon's own databases, which are obviously very popular. And are you publishing those again on GitHub, as you have done in the past? Take us through the benchmarks. >> Sure. So benchmarks are important because they give customers a sense of what performance to expect and what price performance to expect. So we have run a number of benchmarks, and yes, all these benchmarks are available on GitHub for customers to take a look at. So we have performance results on all three classes of workloads: OLTP, analytics and machine learning. So let's start with OLTP. For OLTP, primarily because of the auto thread pooling feature, we show that for the TPC-C 10 gigabyte dataset at high levels of concurrency, HeatWave offers up to 10 times better throughput, and this performance is sustained, whereas in the case of Aurora the performance really drops. So that's the first thing: on the 10 gigabyte TPC-C, at high concurrency the throughput is 10 times better than Aurora. For analytics, we have done a comparison of MySQL HeatWave in AWS compared with Redshift, Snowflake and Google BigQuery, and we find that the price performance of MySQL HeatWave compared to Redshift is seven times better. So MySQL HeatWave in AWS provides seven times better price performance than Redshift. That's a very interesting result to us.
Which means that customers of Redshift are really going to take the service seriously, because they're going to get seven times better price performance. And this is all running in AWS, so compared... >> Okay, carry on. >> And then I was going to say, compared to Snowflake, HeatWave in AWS offers 10 times better price performance, and compared to Google BigQuery, it offers 12 times better price performance. And this is based on a four terabyte TPC-H workload. Results are available on GitHub, and then the third category is machine learning, and for machine learning, for training, the performance of MySQL HeatWave is 25 times faster compared to Redshift. So for all three workloads we have benchmark results, and all of these scripts are available on GitHub. >> Okay, so you're comparing MySQL HeatWave on AWS to Redshift and Snowflake on AWS, and you're comparing MySQL HeatWave on AWS to BigQuery, obviously running on Google. You know, one of the things Oracle has done in the past when you give the price performance, and I've always tried to call fouls, is you'll, like, double your price for running the Oracle database, not HeatWave, but Oracle Database on AWS, and then you'll show how it's so much cheaper on Oracle, and we'll be like, okay, come on. But they're not doing that here. You're basically taking MySQL HeatWave on AWS, and I presume you're using the same pricing for whatever EC2 instances, whatever else you're using, storage, reserved instances. That's apples to apples on AWS. And you have to obviously do some kind of mapping for Google, for BigQuery. Can you just verify that for me? >> We are being more than fair, on two dimensions. The first thing is, when I'm talking about the price performance for analytics with MySQL HeatWave, the cost I'm talking about for MySQL HeatWave is the cost of running transaction processing, analytics and machine learning. So it's a fully loaded cost for the case of MySQL HeatWave. Whereas when I'm talking about Redshift, when I'm talking about Snowflake, I'm just talking about the cost of these databases for running analytics only; it's not including the source database, which may be Aurora or some other database, right? So that's the first aspect: for HeatWave, it's the cost for running all three kinds of workloads, whereas for the competition, it's only for running analytics. The second thing is that for these other services, whether it's Redshift or Snowflake, that's right, we're talking about the one year, fully paid up front cost. So that's what most of the customers would pay: many of the customers will sign a one year contract and pay all the costs ahead of time because they get a discount. So we're using that price, and in the case of Snowflake, the cost we're using is their standard edition price, not the Enterprise edition price. So yes, we are being more than fair in this comparison. >> Yeah, I think that's an important point. I saw an analysis by Marc Staimer on Wikibon, where he was doing the TCO comparisons. And I mean, if you have to use two separate databases and two separate licences, and you have to do ETL and all the labour associated with that, that's a big deal, and you're not even including that aspect in your comparison. So that's pretty impressive. To what do you attribute that? You know, given that unlike OCI, within the AWS cloud you don't have as much control over the underlying hardware.
>> So look, hardware is one aspect. Okay, so there are three things which give us this advantage. The first thing is, we have designed HeatWave for a scale-out architecture. So we came up with new algorithms; one of the design points for heat wave is a massively partitioned architecture, which leads to a very high degree of parallelism. That's how HeatWave was built, so that's the first part. The second thing is that although we don't have control over the hardware, the second design point for HeatWave is that it is optimised for commodity cloud and commodity infrastructure, so we know in advance, what to say, the compute we get, how much network bandwidth we get, how much object store bandwidth we get in AWS, and we have tuned HeatWave for that. That's the second point. And the third thing is MySQL Autopilot, which provides machine-learning-based automation. So what it does is that as the user's workload is running, it learns from it, it improves various parameters in the system. So the system keeps getting better as it learns from more and more queries. And this is the third thing, as a result of which we get a significant edge over the competition. >> Interesting. I mean, look, any ISV can go on any cloud and take advantage of it. And that's, uh, I love it. We live in a new world. How about machine learning workloads? What did you see there in terms of performance and benchmarks? >> Right. So for machine learning, we offer three capabilities: training, which is fully automated, inference, and explanations. So one of the things which many of our customers coming from the enterprise told us is that explanations are very important to them, because customers want to know why the system chose a certain prediction. So we offer explanations for all models which have been trained by HeatWave. That's the first thing. Now, one of the interesting things about training is that training is usually the most expensive phase of machine learning. So we have spent a lot of time improving the performance of training. We have a bunch of techniques which we have developed inside of Oracle to improve the training process. For instance, we have meta-learned proxy models, which really give us an advantage. We use adaptive sampling. We have invented techniques for parallelising the hyperparameter search. So as a result of a lot of this work, our training is about 25 times faster compared to Redshift ML, and all the data is inside the database, all this processing is being done inside the database, so it's much faster and it is inside the database. And I want to point out that there is no additional charge for HeatWave customers, because we're using the same cluster; you're not invoking another service. So all of these machine learning capabilities are being offered at no additional charge inside the database, and at a performance which is significantly faster than the competition. >> Are you taking advantage of, or is there any, uh, need, not need, but any advantage that you can get by exploiting things like Graviton? We've talked about that a little bit in the past. Or Trainium. You just mentioned training, so custom silicon that AWS is doing, are you taking advantage of that? Do you need to? Can you give us some insight there? >> So there are two things, right? We're always evaluating what are the choices we have from a hardware perspective.
Obviously, these are there for us to leverage, and all the things you mentioned, we have considered them. But there are two things to consider. One is, HeatWave is an in-memory system, so memory is the dominant cost. The processor is a portion of the cost, but memory is the dominant cost. So what we have evaluated and found is that the current shape which we are using is going to provide our customers with the best price performance. That's the first thing. The second thing is that there are opportunities at times when we can use a specialised processor for accelerating the workload a bit, but then it becomes a matter of the cost to the customer. The advantage of our current architecture is that on the same hardware, customers are getting very good performance, very good analytics performance and very good machine learning performance. If we were to go with a specialised processor, it may accelerate, say, the machine learning, but then it's an additional cost which the customers would need to pay. So we are very sensitive to the customers' request, which is usually to provide very good performance at a very low cost, and we feel that the current design we have is providing customers very good performance and very good price performance. >> So part of that is architectural, the memory-intensive nature of HeatWave. The other is AWS pricing. If AWS pricing were to flip, it might make more sense for you to take advantage of something like Trainium. Okay, great. Thank you. And let's come back to benchmarks. Benchmarks are sometimes artificial, right? A car can go from 0 to 60 in two seconds, but I might not be able to experience that level of performance. Do you have any real world numbers from customers that have used MySQL HeatWave on AWS, and how they look at performance? >> Yes, absolutely. So the MySQL HeatWave service on AWS has been in beta since November, right? So we have a lot of customers who have tried the service. And what we have actually found is that many of these customers are planning to migrate from Aurora to MySQL HeatWave. And what they find is that the performance difference is actually much more pronounced than what I was talking about, because with Aurora the performance is actually much poorer compared to what I've talked about. So in some of these cases, the customers found improvements from 60 times to 240 times, right? So HeatWave was 60 to 240 times faster, and it was much less expensive. And the third thing, which is, you know, noteworthy, is that customers don't need to change their applications. So if you ask the top three reasons why customers are migrating, it's because of this: no change to the application, much faster, and it is cheaper. So in some cases, like Johnny Bytes, what they found is that the performance of their applications for the complex queries was about 60 to 90 times faster. Then with another technology company, what they found is that the performance of HeatWave compared to Aurora was 139 times faster. So yes, we do have many such examples from real workloads from customers who have tried it. And across the board, what we find is that it offers better performance, lower cost, and a single database such that it is compatible with all existing MySQL-based applications and workloads. >> Really impressive. The analysts I talk to, they're all gaga over HeatWave, and I can see why. Okay, last question. Maybe two in one. What's next?
In terms of new capabilities that customers are going to be able to leverage, and any other clouds that you're thinking about? We talked about that upfront, but... >> So in terms of the capabilities, as you have seen, we have been, you know, non-stop attending to the feedback from the customers and reacting to it, and also we have been innovating organically. So that's something which is going to continue. So yes, you can fully expect that the pace will not decrease and we'll continue to innovate. And with respect to the other clouds, yes, we are planning to support MySQL HeatWave on Azure, and this is something that will be announced in the near future. >> Great. All right, thank you. Really appreciate the overview. Congratulations on the work. Really exciting news that you're moving MySQL HeatWave into other clouds. It's something that we've been expecting for some time, so it's great to see you guys making that move, and as always, great to have you on theCUBE. >> Thank you for the opportunity. >> All right. And thank you for watching this special CUBE Conversation. I'm Dave Vellante, and we'll see you next time.
SUMMARY :
Oracle announces MySQL HeatWave on AWS, its fourth major release since December 2020, with the data plane, control plane and console running natively on AWS alongside a new interactive console, additional security features and MySQL Autopilot enhancements such as auto thread pooling. Nipun Agarwal walks through benchmarks showing up to 10 times better OLTP throughput than Aurora, roughly 7x, 10x and 12x better price performance than Redshift, Snowflake and BigQuery respectively, and about 25 times faster machine learning training, with all scripts published on GitHub. He also shares early customer migration results from Aurora and confirms plans to bring MySQL HeatWave to Azure.
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Dave Vellante | PERSON | 0.99+ |
December 2020 | DATE | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
John | PERSON | 0.99+ |
France | LOCATION | 0.99+ |
AWS | ORGANIZATION | 0.99+ |
10 times | QUANTITY | 0.99+ |
two things | QUANTITY | 0.99+ |
Oracle | ORGANIZATION | 0.99+ |
Heatwave | TITLE | 0.99+ |
100 | QUANTITY | 0.99+ |
60 times | QUANTITY | 0.99+ |
one year | QUANTITY | 0.99+ |
12 times | QUANTITY | 0.99+ |
GWS | ORGANIZATION | 0.99+ |
60 technologies | QUANTITY | 0.99+ |
first part | QUANTITY | 0.99+ |
240 times | QUANTITY | 0.99+ |
two separate licences | QUANTITY | 0.99+ |
third category | QUANTITY | 0.99+ |
second advantage | QUANTITY | 0.99+ |
0 | QUANTITY | 0.99+ |
seven times | QUANTITY | 0.99+ |
two seconds | QUANTITY | 0.99+ |
two | QUANTITY | 0.99+ |
Apple | ORGANIZATION | 0.99+ |
seven times | QUANTITY | 0.99+ |
ORGANIZATION | 0.99+ | |
one | QUANTITY | 0.99+ |
25 times | QUANTITY | 0.99+ |
second point | QUANTITY | 0.99+ |
November | DATE | 0.99+ |
85 patents | QUANTITY | 0.99+ |
second thing | QUANTITY | 0.99+ |
Aurora | TITLE | 0.99+ |
third thing | QUANTITY | 0.99+ |
Each | QUANTITY | 0.99+ |
second example | QUANTITY | 0.99+ |
10 gigabytes | QUANTITY | 0.99+ |
three things | QUANTITY | 0.99+ |
One | QUANTITY | 0.99+ |
two benefits | QUANTITY | 0.99+ |
one aspect | QUANTITY | 0.99+ |
first aspect | QUANTITY | 0.98+ |
two separate databases | QUANTITY | 0.98+ |
over 10 years | QUANTITY | 0.98+ |
fourth major release | QUANTITY | 0.98+ |
39 times | QUANTITY | 0.98+ |
first thing | QUANTITY | 0.98+ |
Heat Wave | TITLE | 0.98+ |
Michal Klaus, Ataccama
>> From theCUBE studios in Palo Alto and Boston, connecting with thought leaders all around the world, this is a CUBE Conversation. >> Welcome back to CUBE 365. I'm your host, Rebecca Knight. Today we are with Michal Klaus. He is the CEO of Ataccama. Today Ataccama has just launched generation two of Ataccama ONE, a self-driving platform for data management and data governance. We're going to do a deep dive into generation two of Ataccama ONE. We're going to learn what it means to make data management and governance self-driving and the impact it will have on organizations. Thanks so much for joining us on theCUBE, Michal. >> Thank you, Rebecca. Thanks for having me. >> So you are a technology veteran. You've been CEO of this company for 13 years. Tell our viewers a little bit about Ataccama. >> So Ataccama was started as basically a spinoff of a professional services company. And I was part of the professional services company. We were doing data integrations, data warehousing, things like that. And on every project, we would struggle with data quality and, actually, we didn't know what it was called at the time, but with mastering, you know, scattered data across whole enterprises. So after several projects, we developed a little kind of utility that we would use on the projects, and it seemed to be very popular with our customers. So we decided to give it a try and spin it off as a product company. And that's how Ataccama was born. That's how it all started. And... >> That's how it all started, and now today you're launching generation two of Ataccama ONE. And this is about self-driving data management and governance. I can't hear the word self-driving without thinking about Elon Musk. Can you talk a little bit about what self-driving means in this context? >> So self-driving in the car industry will bring a major shift to individual transportation, right? People will be able to reclaim the one to two hours per day which they now spend driving, which is a pretty mundane, low added value activity. But that's what the self-driving cars will bring. Basically people will be free to do more creative, more fun stuff, right? And we've taken this concept on a high level and we are bringing it to data management and data governance in a similar fashion, meaning organizations and people, data people, business people, will be free from the mundane activity of finding data, trying to put it together. They will be able to use readily made, let's say, data products, which will be, you know, available. It will be high quality. It will be governed. So that's how we are kind of using the analogy between the car industry and the data management industry. >> So what was the problem that you were seeing in the space? Was it just the way that your data scientists were spending their time? Was it the cumbersome ways that they were trying to mine the data? What was the problem? What was the challenge that you were trying to solve here? >> So there are actually a few challenges. One challenge is basically time to value. Today, when a business decides to come up with a new product or needs a new campaign for Christmas or something like this, there is an underlying need for a data product, right? And it takes weeks or months to prepare that. And that's only if you have some infrastructure; in some cases it can take even longer. And that's one big issue. You need to be able to give non-technical users a way to instantly get the data they need.
And you basically don't have that in organizations anywhere at the moment. So that's the time to value. The other thing is basically resources, right? You have very valuable resources, data scientists, even analysts who spend, you know, there is this kind of (indistinct), right? They spend 80% on really preparing the data, and only 20% on the value added part of their jobs. And we are getting rid of the 80% again. And last but not least, what we've been seeing, and it's really painful for organizations: you have very driven business people who just want to deliver business results. They don't want to bother with, you know, "Where do I get the data? How do I do it?" And then you have, rightly so, people who are focused on doing things in the right way, people focused on governance in a general sense, meaning, you know, we have to follow policies, we have to, when integrating data, do it in the right way so that it's reusable, et cetera, et cetera. And there is a growing tension between those two views, worldviews, I would say, and it's really painful, creating a lot of conflict, preventing the business people from doing what they want to do fast, and preventing the people who focus on governance from keeping things in order. And that's what our platform is solving, or actually making the gap disappear completely. >> It's removing that tension that you're talking about. So how is this different from the AI and machine learning that so many other companies are investing in? >> It is and isn't different. It isn't different in one way. Many companies, you know, in data management, outside of data management, are using AI to make life easier for people and organizations. Basically the machine learning is taking over part of what people needed to do before. And you have that in consumer applications, you have that in data management, B2B applications. Now the huge difference is that we've taken several disciplines, kind of sub-domains of data management, namely data profiling, data cataloging, data quality management, and by that we also mean data cleansing, and data mastering, and data integration as well. So we've taken all this. We already had that in our platform, and we redeveloped it from scratch. And that allows us basically one critical thing, which is different. If you only apply AI on the level of the individual, let's say modules or products, you will end up with broken processes. You will have, you know, augmented data profiling, augmented data cataloging, but you will still have the walls between the products; from a customer's view, it's kind of a wall between the processes or sub-processes, the domains. So the fact that we have redeveloped it, or the reason why we have redeveloped it, was to get rid of those walls, those silos, and this way we can actually automate the whole process, not just the parts of the process. That's the biggest difference. >> I definitely want to ask you about removing those silos, but I want to get back to something you were saying before, and that is this idea that you built it from scratch. That really is what sets Ataccama apart, that you architect these things in-house, which is different from a lot of competitors. Talk a little bit about why you see that as such an advantage. >> So this has been in our DNA kind of from day one. When we started to build the core of our product, which is, let's say, the data processing engine, we realized from day one that it needs to be, you know, high performance, powerful.
It needs to support real time scenarios. And it paid off greatly, because if you have a product, for example, that doesn't have the real-time capability, slapping on real time afterwards is almost impossible, right? You end up with a not so good core with some added functionality. And this is how we built the product gradually, you know; around the data processing we built the data quality, we built the data mastering, then we built a metadata core next to it. And the whole platform now is built basically on top of three major underlying components. One is the data processing. One is the metadata management core. And one is actually the AI core. And this allows us to do everything that I was talking about. This allows us to automate the whole process. >> I want to ask a little bit about the silos that you were talking about, and also the tension that you were just talking about earlier in our conversation that exists between business people and the data scientists, the ones who want to make sure we're getting everything right and with fidelity, and that we're paying attention to governance, and then the people who are more focused on business outcomes, particularly at this time where we're all enduring a global pandemic, which has changed everything about the way we live and the way we work. Do you think that the silos have gotten worse during this pandemic when people are working from home, working asynchronously, working remotely, and how do you think this generation two of Ataccama ONE can help ease those challenges and those struggles that so many teams are having? >> Yeah. Thank you for the question. It's been on my mind for almost a year now, and actually in two ways. One way is how governments, our governments, are dealing with the pandemic, because there, the data is also the key to everything, right? It's the critical factor there. And I have to say the governments are not doing exactly a great job, also in the way they are managing the data and governing the data, because at the end of the day, what will be needed to fight the pandemic for good is a way to predict on a very highly granular basis what is and what is not happening in each city, in each county, and, you know, tighten or release the measures based on that. And of course you need very good data science for that, but you also need very good data management below that to have real time granular data. So that's one kind of thing that's been a little bit frustrating for me for a long time. Now, if we look at our customers, organizations and users, what's happening there is that, of course, we all see the shift to work from home. And we also see the need to better support cooperation between people who are not in one place anymore, right? So on the level of, let's say, the user interface, what we brought to Ataccama ONE generation two is a new way users will be interacting with the platform; basically, because of the self-driving nature, the users will more or less be confirming what the platform is suggesting. That's one major shift. And the other thing is there is a kind of implicitly built-in collaboration and governance process within the platform. So we believe that this will help the whole data democratization process, emphasized now by the pandemic and work from home and all these drivers. >> So what is the impact? We hear a lot about data democratization.
What do you think is the impact that will have going forward in terms of what will be driving companies, and how will that change the way employees and colleagues interact with and collaborate with each other? >> We've been hearing about digital transformation for quite a few years, all of us. And I guess you know the joke, right? "Who is driving the digital transformation for you today? Is it the CEO, COO, or CFO? No, it's COVID," right? It really accelerated transformation in ways we couldn't imagine. Now what that means is that if organizations are to succeed, they have to bring all their processes to the digital realm, and all processes means everything from the market-facing, customer-facing customer service to all the internal processes. What that really means is you also have to be able to give data to the people throughout the company, and you have to be able to do it in a way that's, on one hand, safe. So you need to be able to define who can do what, who can see what in the data. On the other hand, you need to have kind of the courage to simply give the data to people and let them do what they understand best, which is their local part of the organization, right? Their local part of the process. And that's the biggest value we think our platform is bringing to the market, meaning it will allow exactly what I was talking about: not to be afraid to give the data to the people, to give high quality, instantly available data to the people, and at the same time be assured that it is safe from the governance perspective. >> So it's helping companies think about problems differently, think about potential solutions differently, but most importantly, it's empowering the employees to be able to have the data themselves, and getting back to the self-driving car example, where we don't need to worry about driving places, we can use our own time for much more value-added things in our lives. And those employees can do the much more value-added things in their jobs. >> Yes, absolutely. You're absolutely right. The digital transformation is kind of followed, or maybe led, by a change in how organizations are managed, right? If you look at the successful, you know, digital-first organizations like the big tech, right, Google, et cetera, you can see that their organization is very flat, which is something different from what you have in the traditional brick and mortar companies. So I think the shift from, you know, hierarchical organization to the more flat, more decentralized way of managing things, managing companies, needs to be also accompanied by the data availability for people. And you have to empower, as you say, everyone through the organization. >> How do you foresee the next 12 to 24 months playing out as we all adjust to this new normal? >> Wow, that's a pretty interesting question. I won't talk much about what I think will be happening with the pandemic... well, I will talk about it a little bit. I think we will see the waves, hopefully with the amplitudes kind of narrowing. So that's on that side. What I think we will see, let's say in the economy and in the industry, I can comment on from the data management perspective. I think organizations will have to adopt the new way of working with data, giving the data to the people, empowering the people. If you don't do it, there is of course some, let's say, momentum, right? When you're a large enterprise with, let's say, you know, a big customer base, a lot of contracts accumulated, it won't go away that fast.
But those who will not adapt will see a slow, gradual decline in their revenues and their competitiveness, in reality. Whereas those small and big ones who do adopt this new way of working with data, we will see them growing faster than the other ones. >> So for our viewers who want to know more about Ataccama's launch, it is www.ataccama.com/selfdriving. What is next for this platform? I want you to close this out here and tell us what is next for generation two of Ataccama ONE. >> So we have just launched the platform. It is available to a limited number of customers in the beta version. The GA version is going to be available in spring, in February next year. And we will be kind of speeding up with additional releases of the platform that will gradually make the whole suite of functionality available in the self-driving fashion. So that, let's say a year from now, you will really be able to go to your browser and actually speak to the platform, speak your wish, which we call intent. We call the principle "from intent to result." So for example, you'll be able to say, "I need all my customer and product ownership data as an API which is updated every two hours." And without having to do anything else, you will be able to get that API, which means a really complex thing, right? You need to be able to map the sources, translate the data, transform it, populate the API, basically build the integration and governance pipeline. So we think we will get to this point about the same time Elon Musk will actually deliver the full self-driving capability to the cars. >> It's an exciting future that you're painting right now. >> We think so too. >> Excellent. Michal Klaus, thank you so much for joining us today. >> Thank you, Rebecca. >> Stay tuned for more of CUBE 365. >> Thank you. (calm music)
SUMMARY :
Michal Klaus, CEO of Ataccama, discusses the launch of Ataccama ONE generation two, a self-driving platform for data management and data governance. He explains how rebuilding data profiling, cataloging, quality, mastering and integration on a single platform removes the silos between those disciplines and eases the tension between business users and governance teams, how the pandemic and remote work have accelerated data democratization, and the roadmap from the current beta to GA and an "intent to result" experience about a year out.
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Michal Klaus | PERSON | 0.99+ |
Rebecca | PERSON | 0.99+ |
Rebecca Knight | PERSON | 0.99+ |
ORGANIZATION | 0.99+ | |
Ataccama | ORGANIZATION | 0.99+ |
13 years | QUANTITY | 0.99+ |
Elon Musk | PERSON | 0.99+ |
Elon Musk | PERSON | 0.99+ |
Boston | LOCATION | 0.99+ |
Michal | PERSON | 0.99+ |
80% | QUANTITY | 0.99+ |
Palo Alto | LOCATION | 0.99+ |
two ways | QUANTITY | 0.99+ |
Christmas | EVENT | 0.99+ |
Today | DATE | 0.99+ |
one | QUANTITY | 0.99+ |
pandemic | EVENT | 0.99+ |
One challenge | QUANTITY | 0.99+ |
February next year | DATE | 0.99+ |
One | QUANTITY | 0.98+ |
today | DATE | 0.98+ |
one big issue | QUANTITY | 0.98+ |
www.Ataccama | OTHER | 0.98+ |
one way | QUANTITY | 0.98+ |
Ataccama | PERSON | 0.97+ |
20% | QUANTITY | 0.96+ |
theCUBE | ORGANIZATION | 0.95+ |
spring | DATE | 0.95+ |
two | QUANTITY | 0.94+ |
one critical thing | QUANTITY | 0.94+ |
each city | QUANTITY | 0.93+ |
24 months | QUANTITY | 0.93+ |
two hours per day | QUANTITY | 0.93+ |
each county | QUANTITY | 0.91+ |
two views | QUANTITY | 0.88+ |
one place | QUANTITY | 0.87+ |
12 | QUANTITY | 0.84+ |
generation two | QUANTITY | 0.83+ |
three major underlying components | QUANTITY | 0.82+ |
waves | EVENT | 0.79+ |
first | QUANTITY | 0.78+ |
one major shift | QUANTITY | 0.78+ |
one kind | QUANTITY | 0.77+ |
every two hours | QUANTITY | 0.74+ |
almost a year | QUANTITY | 0.73+ |
day one | QUANTITY | 0.73+ |
ONE generation two | COMMERCIAL_ITEM | 0.71+ |
Ataccama ONE | TITLE | 0.71+ |
selfdriving | OTHER | 0.71+ |
CUBE 365 | ORGANIZATION | 0.68+ |
ONE | TITLE | 0.54+ |
Ataccama | LOCATION | 0.54+ |
a year from | DATE | 0.49+ |
CUBE 365 | TITLE | 0.47+ |
ONE | COMMERCIAL_ITEM | 0.46+ |
CUBE | ORGANIZATION | 0.39+ |
ONE | QUANTITY | 0.35+ |
Andy Clemenko, StackRox
(upbeat music) >> Hi, my name is Andy Clemenko. I'm a Senior Solutions Engineer at StackRox. Thanks for joining us today for my talk on labels, labels, labels. Obviously, you can reach me at all the socials. Before we get started, I'd like to point you to my GitHub repo, you can go to andyc.info/dc20, and it'll take you to my GitHub page where I've got all of this documentation, I've got the Keynote file there, YAMLs, I've got Dockerfiles, Compose files, all that good stuff. If you want to follow along, great, if not go back and review later, kind of fun. So let me tell you a little bit about myself. I am a former DOD contractor. This is my seventh DockerCon. I've spoken, I had the pleasure to speak at a few of them, one even in Europe. I was even a Docker employee for quite a number of years, providing solutions to the federal government and customers around containers and all things Docker. So I've been doing this a little while. One of the things that I always found interesting was the lack of understanding around labels. So why labels, right? Well, as a former DOD contractor, I had built out a large registry. And the question I constantly got was, where did this image come from? How did you get it? What's in it? Where did it come from? How did it get here? And one of the things we did to kind of alleviate some of those questions was we established a baseline set of labels. Labels really are designed to provide as much metadata around the image as possible. I ask everyone in attendance, when was the last time you pulled an image and had 100% confidence you knew what was inside it, where it was built, how it was built, when it was built? You probably didn't, right? The last thing we obviously want is a container fire, like our image on the screen. And one kind of interesting way we can kind of prevent that is through the use of labels. We can use labels to address security, address some of the simplicity of how to run these images. So think of it kind of like self-documenting. Think of it also as an audit trail, image provenance, things like that. These are some interesting concepts that we can definitely mandate as we move forward. What is a label, right? Specifically, what is the schema? It's just a key-value. All right? It's any key and pretty much any value. What if we could dump in all kinds of information? What if we could encode things and store it in there? And I've got a fun little demo to show you about that. Let's start off with some of the simple keys, right? Author, date, description, version. Some of the basic information around the image. That would be pretty useful, right? What about specific labels for CI? What about, where's the version control? Where's the source, right? Whether it's Git, whether it's GitLab, whether it's GitHub, whether it's Gitosis, right? Even SVN, who cares? Where are the source files that built this, where's the Dockerfile that built this image? What's the commit number? That might be interesting in terms of tracking the resulting image to a person or to a commit, hopefully then to a person.
How is it built? What if you wanted to play with it and do a git clone of the repo and then build the Dockerfile on your own? Having a label specifically dedicated to how to build this image might be interesting for development work. Where it was built, and obviously what build number, right? These not only talk about continuous integration, CI, but also start to talk about security. Specifically what server built it. The version control number, the version number, the commit number, again, how it was built. What's the specific build number? What was that job number in, say, Jenkins or GitLab? What if we could take it a step further? What if we could actually apply policy enforcement in the build pipeline, looking specifically for some of these specific labels? I've got a good example of policy enforcement in my demo. So let's look at some sample labels. Now originally, this idea came out of label-schema.org, and then it was modified into the opencontainers spec, org.opencontainers.image. There is a link in my GitHub page that links to the full reference. But these are some of the labels that I like to use, just as kind of like a standardization. So obviously, authors is an email address, so now the image is attributable to a person; that's always kind of good for security and reliability. Where's the source? Where's the version control that has the source, the Dockerfile and all the assets? How it was built, build number, build server, the commit, we talked about, when it was created, a simple description. A fun one I like adding in is the healthz endpoint. Now obviously, the health check directive should be in the Dockerfile. But if you've got other systems that want to ping your applications, why not declare it and make it queryable? Image version, obviously, that's simple and declarative. And then a title. And then I've got the two fun ones. Remember, I talked about what if we could encode some fun things? Hypothetically, what if we could encode the Compose file of how to build the stack in the first image itself? And conversely the Kubernetes YAML? Well, actually, you can, and I have a demo to show you how to kind of take advantage of that. So how do we create labels? And really, creating labels is a function of build time, okay? You can't really add labels to an image after the fact. The way you do add labels is either through the Dockerfile, which I'm a big fan of, because it's declarative. It's in version control. It's kind of irrefutable, especially if you're tracking that commit number in a label. You can extend it from being a static kind of declaration to something more dynamic with build arguments. And I'll show you in a little while how you can use a build argument at build time to pass in that variable. And then obviously, if you did it by hand, you could do a docker build --label key=value. I'm not a big fan of the third one; I love the first one and obviously the second one. Being dynamic, we can take advantage of some of the variables coming out of version control. Or I should say, some of the variables coming out of our CI system. And that way, it self-documents effectively at build time, which is kind of cool. How do we view labels? Well, there's two major ways to view labels. The first one is obviously a docker pull and docker inspect. You can pull the image locally, you can inspect it, and it's going to output as JSON. So you're going to use something like jq to crack it open and look at the individual labels.
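To make that concrete, here is a minimal sketch of the pattern being described. The image name, ARG names and label values below are illustrative placeholders rather than the exact ones from the andyc.info/dc20 repo:

```sh
# Write a tiny Dockerfile that declares build arguments and turns them into OCI labels
cat > Dockerfile <<'EOF'
FROM alpine:3.12
ARG BUILD_DATE
ARG GIT_COMMIT
LABEL org.opencontainers.image.authors="you@example.com" \
      org.opencontainers.image.created="${BUILD_DATE}" \
      org.opencontainers.image.revision="${GIT_COMMIT}"
CMD ["sleep", "infinity"]
EOF

# Build it, passing the dynamic values in from the shell
# (in CI these would come from the pipeline's environment variables instead)
docker build \
  --build-arg BUILD_DATE="$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --build-arg GIT_COMMIT="$(git rev-parse --short HEAD 2>/dev/null || echo unknown)" \
  -t example/labeled-demo:latest .

# View the labels: inspect the local image and crack the JSON open with jq
docker inspect example/labeled-demo:latest | jq '.[0].Config.Labels'
```

That docker inspect call is the first of the two viewing methods he mentions; it works on any image you have pulled locally.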
Another one which I found recently was Skopeo from Red Hat. This allows you to actually query the registry server, so you don't even have to pull the image initially. This can be really useful if you're on a really small development workstation, and you're trying to talk to a Kubernetes cluster and wanting to deploy apps kind of in a very simple manner. Okay? And this was that use case, right? Using Kubernetes, the Kubernetes demo. One of the interesting things about this is that you can base64 encode almost anything, push it in as text into a label, and then base64 decode it and use it. So in this case, in my demo, I'll show you how we can actually use a kubectl apply piped from the base64 decode of the label itself, from skopeo talking to the registry. And what's interesting about this kind of technique is you don't need to store Helm charts. You don't need to learn another language for your declarative automation, right? You don't need all these extra levels of abstraction; inherently, if you use it as a label with a kubectl apply, it's just built in. It's kind of like the KISS approach to a certain extent. It does require some encoding when you actually build the image, but to me, it doesn't seem that hard. Okay, let's take a look at a demo. And what I'm going to do for my demo, before we actually get started, is: here's my repo. Here, let me actually go to the actual full repo. So here's the repo, right? And I've got my Jenkins pipeline 'cause I'm using Jenkins for this demo. And in my demo flask app, I've got the Dockerfile. I've got my Compose and my Kubernetes YAML. So let's take a look at the Dockerfile, right? So it's a simple Alpine image. The ARG statements are the build-time arguments that are passed in. Label, so again, I'm using org.opencontainers.image.* for most of them. There's a typo there. Let's see if you can find it, I'll show you it later. My source, build date, build number, commit. Build number and git commit are derived from Jenkins itself, which is nice. I can just take advantage of existing URLs. I don't have to create anything crazy. And again, I've got my actual docker build command. Now this is just a label on how to build it. And then here's my simple Python, APK upgrade, remove the package manager, kind of some security stuff, a health check for the Python app, okay? Let's take a look at the Jenkins pipeline real quick. So here is my Jenkins pipeline and I have four major stages. I have build, and here in build, what I do is I actually do the git clone, and then I do my docker build. From there, I actually tell the Jenkins StackRox plugin, so that's what I'm using for my security scanning, to go ahead and scan; basically, I'm staging it to scan the image. I'm pushing it to Hub, okay? Basically I'm pushing the image up to Hub such that my StackRox security scanner can go ahead and scan the image. I'm kicking off the scan itself. And then if everything's successful, I'm pushing it to prod. Now what I'm doing is I'm just using the same image with two tags, pre-prod and prod. This is not exactly ideal; in your environment, you probably want to use separate registries, a non-prod and a production registry, but for demonstration purposes, I think this is okay. So let's go over to my Jenkins, and I've got a deliberate failure. And I'll show you why there's a reason for that. And let's go down. Let's look at my... so I have a StackRox report. Let's look at my report.
And it says required image label alert, right? Requesting that the maintainer add the required label to the image, so we're missing a label, okay? One of the things we can do is, let's flip over and let's look at Skopeo, right? I'm going to do this just the easy way. So let's look at org.opencontainers.image.authors. Okay, see here it says build signature? That was the typo; we didn't actually pass the value in. So if we go back to our repo, we didn't pass in the build-time argument, we just passed in the word. So let's fix that real quick. That's the Dockerfile. Let's go ahead and put our dollar sign in there. First day with the fingers, you're going to love it. And let's go ahead and commit that. Okay? So now that that's committed, we can go back to Jenkins, and we can actually do another build. And there's number 12. And as you can see, I've been playing with this for a little bit today. And while that's running, we can go ahead and look at the console output. Okay, so there's our image. And again, look at all the build arguments that we're passing into the build statement. So we're passing in the date, and the date gets derived on the command line. With the build arguments, there's the base64 encoding of the Compose file. Here's the base64 encoding of the Kubernetes YAML. We do the build. And then let's go down to the bottom: everything exists and is successful. So here's where we can see no system policy violations were found, marking the StackRox security plugin build step as successful, okay? So we're actually able to do policy enforcement that that label, sorry, exists in the image. And again, we can look at the security report and there's no policy violations and no vulnerabilities. So that's pretty good for security, right? We can now enforce and mandate use of certain labels within our images. And let's flip back over to Skopeo, and let's go ahead and look at it. So we're looking at the prod version again. And there it is, my email address. And that validated that that was valid for that policy. So that's kind of cool. Now, let's take it a step further. Let's go ahead and take a look at all of the image labels for a second; let me remove the dash org, make it pretty. Okay? So we have all of our image labels. Again, authors, build, commit number, look at the commit number. It was built today, build number 12. We saw that, right? Build 12. So that's kind of cool, dynamic labels. Name, healthz, right? But what we're looking for is the org.zdocker kubernetes label. So let's go look at the label real quick. Okay, well that doesn't really help us because it's encoded, but let's base64 -d, let's decode it. And I need to put the -r in there 'cause it doesn't like it otherwise, there we go. So there's my Kubernetes YAML. So why can't we simply kubectl apply -f? Let's just apply it from standard in. So now we've actually used that label from the image that we've queried with skopeo, from a remote registry, to deploy locally to our Kubernetes cluster. So let's go ahead and look: everything's up and running, perfect. So what does that look like, right? So luckily, I'm using traefik for Ingress 'cause I love it. And I've got an object in my Kubernetes YAML called flask.docker.life. That's my Ingress object for traefik. I can go to flask.docker.life. And I can hit refresh. Obviously, I'm not a very good web designer 'cause of the background image and the text.
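Condensed into a few commands, the flow he just demoed looks roughly like this. The image name and the label key org.zdocker.kubernetes are illustrative, and the Dockerfile is assumed to declare a matching ARG and LABEL pair:

```sh
# Build time: base64-encode the Kubernetes YAML and pass it in as a build argument
# (assumes the Dockerfile contains:  ARG KUBE_YAML
#                                    LABEL org.zdocker.kubernetes="${KUBE_YAML}")
docker build \
  --build-arg KUBE_YAML="$(base64 -w0 k8s.yml)" \
  -t example/labeled-demo:prod .
docker push example/labeled-demo:prod

# Deploy time: query the registry with skopeo (no docker pull needed), pull the label
# out of the JSON with jq, decode it, and pipe it straight into kubectl
skopeo inspect docker://docker.io/example/labeled-demo:prod \
  | jq -r '.Labels["org.zdocker.kubernetes"]' \
  | base64 -d \
  | kubectl apply -f -
```

Note that base64 -w0 is the GNU flag for unwrapped output; the flag differs on macOS. The appeal of the pattern is that the deployment manifest travels with the image itself, so there is no extra chart repository or templating layer to keep in sync.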
We can go ahead and refresh it a couple of times; we've got Redis storing a hit counter, and we can see that our server name is round-robining. Okay? That's kind of cool. So let's recap a little bit about my demo environment. My demo environment: I'm using DigitalOcean, Ubuntu 19.10 VMs. I'm using K3s instead of full Kubernetes, full Rancher, full OpenShift, or Docker Enterprise. I think K3s has some really interesting advantages on the development side; it's kind of intended for IoT, but it works really well and it deploys super easy. I'm using traefik for Ingress. I love traefik; I may or may not be a traefik ambassador. I'm using Jenkins for CI. And I'm using StackRox for image scanning and policy enforcement. One of the things to think about, though, especially in terms of labels, is that none of this demo stack is required. You can be in any cloud, you can be on CentOS, you can be in any Kubernetes. You can even be in Swarm, if you wanted to, or Docker Compose. Any Ingress, any CI system, Jenkins, Circle, GitLab, it doesn't matter. And pretty much any scanning. One of the things that I think is kind of nice about at least StackRox is that we do a lot more than just image scanning, right, with the policy enforcement, things like that. I guess that's kind of a shameless plug. But again, any of this stack is completely replaceable with any comparable product in that category. So I'd like to, again, point you guys to andyc.info/dc20; that'll take you right to the GitHub repo. You can reach out to me at any of the socials, @clemenko or andy@stackrox.com. And thank you for attending. I hope you learned something fun about labels. And hopefully you guys can standardize labels in your organization and really take your images and the image provenance to a new level. Thanks for watching. (upbeat music)

>> Narrator: Live from Las Vegas, it's theCUBE, covering AWS re:Invent 2019. Brought to you by Amazon Web Services and Intel along with its ecosystem partners.

>> Okay, welcome back everyone, theCUBE's live coverage of AWS re:Invent 2019. This is theCUBE's 7th year covering Amazon re:Invent; it's their 8th year of the conference. I want to just shout out to Intel for their sponsorship of these two amazing sets. Without their support we wouldn't be able to bring our mission of great content to you. I'm John Furrier with Stu Miniman. We're here with the chief of AWS, the chief executive officer, Andy Jassy. Tech athlete in and of himself, three-hour Keynotes. Welcome to theCUBE again, great to see you.
>> Great to be here, thanks for having me guys.
>> Congratulations on a great show, a lot of great buzz.
>> Andy: Thank you.
>> A lot of good stuff. Your Keynote was phenomenal. You get right into it, you giddy up right into it as you say, three hours, thirty announcements. You guys do a lot, but what I liked, the new addition, the last year and this year, is the band, the house band. They're pretty good.
>> Andy: They're good, right?
>> They hit the Queen notes, so that keeps it balanced. So we're going to work on getting a band for theCUBE.
>> Awesome.
>> So if I have to ask you, what's your walk-up song, what would it be?
>> There's so many choices, it depends on what kind of mood I'm in. But, uh, maybe Times Like These by the Foo Fighters.
>> John: Alright.
>> These are unusual times right now.
>> Foo Fighters playing at the Amazon Intersect Show.
>> Yes they are.
>> Good plug Andy.
>> Headlining.
>> Very clever.
>> Always getting a good plug in there.
>> My very favorite band.
Well, congratulations on Intersect, you've got a lot going on. Intersect is a music festival, I'll get to that in a second. But I think the big news for me is two things: obviously we had a one-on-one exclusive interview and you laid out essentially what looked like it was going to be your Keynote, and it was. Transformation-
>> Andy: Thank you for the practice. (Laughter)
>> John: I'm glad to practice, use me anytime.
>> Yeah.
>> And I appreciate the comments on JEDI on the record, that was great. But I think the transformation story's a very real one, but the NFL news you guys just announced, to me, was so much fun and relevant. You had the Commissioner of the NFL on stage with you talking about a strategic partnership. That is as top-down, aggressive a goal as you could get, to have Roger Goodell fly to a tech conference to sit with you and then bring his team to talk about the deal.
>> Well, ya know, we've been partners with the NFL for a while with the Next Gen Stats that they use on all their telecasts, and one of the things I really like about Roger is that he's very curious and very interested in technology, and the first couple times I spoke with him he asked me so many questions about ways the NFL might be able to use the Cloud and digital transformation to transform their various experiences. And he's always said, if you have a creative idea or something you think could change the world for us, just call me, he said, or text me or email me and I'll call you back within 24 hours. And so we've spent the better part of the last year talking about a lot of really interesting, strategic ways that they can evolve their experience both for fans as well as their players, and the Player Health and Safety Initiative. It's so important in sports, and particularly important with the NFL given the nature of the sport, and they've always had a focus on it. But what you can do with computer vision and machine learning algorithms, and then building a digital athlete, which is really like a digital twin of each athlete, so you understand what it looks like when they're healthy and compare that with when it looks like they may not be healthy, and be able to simulate all kinds of different combinations of player hits and angles and different plays so that you could try to predict injuries and predict the right equipment you need before there's a problem, can be really transformational, so we're super excited about it.
>> Did you guys come up with the idea or was it a collaboration between them?
>> It was really a collaboration. I mean, look, they are very focused on player safety and health, and it's a big deal for them; you know, they have two main constituents, the players and fans, and they care deeply about the players. And it's a hard problem in a sport like football, I mean, you watch it.
>> Yeah, and I've got to say it does point out the use cases of what you guys are promoting heavily at the show here, the SageMaker Studio, which was a big part of your Keynote, where they have all this data.
>> Andy: Right.
>> And they're data hoarders, they hoard data, but the manual process of going through the data was a killer problem. This is consistent with a lot of the enterprises that are out there; they have more data than they even know. So this seems to be a big part of the strategy. How do you get the customers to actually wake up to the fact that they've got all this data, and how do you tie that together?
>> I think in almost every company they know they have a lot of data.
And there are always pockets of people who want to do something with it. But when you're going to make these really big leaps forward, these transformations, the things like Volkswagen is doing where they're reinventing their factories and their manufacturing process, or the NFL where they're going to radically transform how they do player health and safety, it starts top down. And if the senior leader isn't convicted about wanting to take that leap forward and trying something different and organizing the data differently and organizing the team differently and using machine learning and getting help from us and building algorithms and building some muscle inside the company, it just doesn't happen, because it's not in the normal machinery of what most companies do. And so it always, almost always, starts top down. Sometimes it can be the Commissioner or CEO, sometimes it can be the CIO, but it has to be senior-level conviction or it doesn't get off the ground.
>> And the business model impact has to be real. For the NFL, they know concussions, hurting their youth pipeline, this is a huge issue for them. This is their business model.
>> They lose even more players to lower extremity injuries. And so just the notion of trying to be able to predict injuries and, you know, the impact it can have on rules and the impact it can have on the equipment they use, it's a huge game changer when they look at the next 10 to 20 years.
>> Alright, love geeking out on the NFL but Andy, you know-
>> No more NFL talk?
>> Off camera how about we talk?
>> Nobody talks about the Giants being 2 and 10.
>> Stu: We're both Patriots fans here.
>> People bring up the undefeated season.
>> So Andy-
>> Everybody's a Patriots fan now. (Laughter)
>> It's fascinating to watch, uh, you in your three-hour Keynote, and Werner in his architectural discussion; it really showed how AWS is really extending its reach, you know, it's not just a place. For a few years people have been talking about, you know, Cloud is an operational model, it's not a destination or a location, but I felt it really was laid out as you talked about breadth and depth and Werner really talked about architectural differentiation. People talk about Cloud, but there are a lot of differences between the visions for where things are going. Help us understand why, I mean, Amazon's vision is still a bit different from what other people talk about, where this whole Cloud expansion, journey, put whatever tag or label you want on it, but you know, the control plane and the technology that you're building and where you see that going.
>> Well I think that, we've talked about this a couple times, we have two macro types of customers. We have those that really want to get at the low-level building blocks and stitch them together creatively however they see fit to create whatever's in their heads. And then we have the second segment of customers that say, look, I'm willing to give up some of that flexibility in exchange for getting 80% of the way there much faster, in an abstraction that's different from those low-level building blocks. And both segments of builders we want to serve and serve well, and so we've built very significant offerings in both areas.
I think when you look at microservices, um, you know, some of it has to do with the fact that we have this very strongly held belief born out of several years of Amazon, where, you know, for the first 7 or 8 years of Amazon's consumer business we basically jumbled together all of the parts of our technology in moving really quickly. And when we wanted to move quickly where you had to impact multiple internal development teams, it took so long, because it was this big ball, this big monolithic piece. And we got religion about that in trying to move faster in the consumer business, and having to tease those pieces apart. And it really was a lot of the impetus behind conceiving AWS, where it was these low-level, very flexible building blocks that don't try and make all the decisions for customers; they get to make them themselves. And some of the microservices that you saw Werner talking about, just, you know, for instance, what we did with Nitro or even what we did with Firecracker, those are very much about us relentlessly working to continue to tease apart the different components. And even things that look like low-level building blocks, over time you build more and more features and all of a sudden you realize they have a lot of things that are combined together that you wished weren't, that slow you down. And so Nitro was a complete reimagining of our hypervisor and virtualization layer to allow us both to let customers have better performance but also to let us move faster and have a better security story for our customers.
>> I've got to ask you the question around transformation, because I think that all points, all the data points, you've got all the references, Goldman Sachs on stage at the Keynote, Cerner, I mean healthcare just is an amazing example because, I mean, that's demonstrating real value there; there's no excuse. I talked to someone who wouldn't be named last night, in and around the area, who said the CIA has a cost bar like this, a budget like this, but the demand for mission-based apps is going up exponentially, so there's need for the Cloud. And so you see more and more of that. What are your top-down, aggressive goals to fill that solution base, because you're also a very transformational thinker; what are your aggressive top-down goals for your organization, because you're serving a market with trillions of dollars of spend that's shifting, that's on the table.
>> Yeah.
>> A lot of competition now sees it too, they're going to go after it. But at the end of the day you have customers that have a demand for things, apps.
>> Andy: Yeah.
>> And not a lot of budget increase at the same time. This is a huge dynamic.
>> Yeah.
>> John: What are your goals?
>> You know, I think that at a high level our top-down aggressive goals are that we want every single customer who uses our platform to have an outstanding customer experience. And part of that outstanding customer experience is that their operational performance and their security are outstanding, but also that it allows them to build projects and initiatives that change their customer experience and allow them to be a sustainable, successful business over a long period of time. And then we also really want to be the technology infrastructure platform under all the applications that people build.
And we're realistic. We know that, you know, the market segments we address with infrastructure, software, hardware, and data center services globally are trillions of dollars in the long term, and it won't only be us. But we have that goal of wanting to serve every application, and that requires not just the security and operational premise but also a lot of functionality and a lot of capability. We have by far the most amount of capability out there, and yet I would tell you, we have 3 to 5 years of items on our roadmap that customers want us to add. And that's just what we know today.
>> And Andy, underneath the covers you've been going through some transformation. When we talked a couple of years ago about how serverless is impacting things, I've heard that that's actually, in many ways, the glue behind the two-pizza teams to work between organizations. Talk about how the internal transformations are happening, and how that impacts your discussions with customers that are going through that transformation.
>> Well, I mean, a lot of the technology we build comes from things that we're doing ourselves, you know, and that we're learning ourselves. It's kind of how we started thinking about microservices; serverless too. We saw the need: we would build all these functions that, when some kind of object came into an object store, we would spin up compute, all those tasks would take like 300 or 400 milliseconds, then we'd spin it back down, and yet we'd have to keep a cluster up in multiple availability zones because we needed that fault tolerance. And we just said, this is wasteful, and that's part of how we came up with Lambda. And, you know, when we were thinking about Lambda, people understandably said, well, if we build Lambda and we build this serverless, event-driven computing, a lot of people who were keeping clusters of instances aren't going to use them anymore, and it's going to lead to less absolute revenue for us. But we have learned this lesson over the last 20 years at Amazon, which is, if it's something that's good for customers, you're much better off cannibalizing yourself and doing the right thing for customers and being part of shaping something. And I think if you look at the history of technology, you always build things and people say, well, that's going to cannibalize this and people are going to spend less money. What really ends up happening is they spend less money per unit of compute, but it allows them to do so much more that they ultimately, long term, end up being more significant customers.
>> I mean, you are like beating the drum all the time. Customers, what they say, that encompasses the roadmap; I get that you guys have that playbook down, that's been really successful for you.
>> Andy: Yeah.
>> Two years ago you told me machine learning was really important to you because your customers told you. What's the next tranche of importance for customers? What's top of mind now, as you look at-
>> Andy: Yeah.
>> This re:Invent kind of coming to a close, Replay's tonight, you had conversations, you're a tech athlete, you're running around, doing speeches, talking to customers. What's that next hill, if it's machine learning today-
>> There's so much, I mean, (weird background noise)
>> It's not a soup question. (Laughter)
>> And I think we're still in the very early days of machine learning. It's not like most companies have mastered it yet, even though they're using it much more than they did in the past.
But, you know, I think machine learning for sure, I think the Edge for sure. I think that, um, we're optimistic about quantum computing, even though I think it'll be a few years before it's really broadly useful. We're very enthusiastic about robotics. I think the amount of functions that are going to be done by these-
>> Yeah.
>> robotic applications are much more expansive than people realize. It doesn't mean humans won't have jobs, they're just going to work on things that are more value-added. We're believers in augmented and virtual reality, we're big believers in what's going to happen with Voice. And I'm also, uh, I think sometimes people get bored, you know; I think you're even bored with machine learning already.
>> Not yet.
>> People get bored with the things you've heard about, but I think just what we've done with the chips, you know, in terms of giving people 40% better price performance over the latest generation of x86 processors, it's pretty unbelievable, the difference in what people are going to be able to do. Or just look at big data; I mean, we haven't gotten to the point with big data where people have totally solved it. The amount of data that companies want to store, process, analyze is exponentially larger than it was a few years ago, and it will, I think, exponentially increase again in the next few years. You need different tools and services.
>> Well I think we're not bored with machine learning, we're excited to get started, because we have all this data from the video and you guys have got SageMaker.
>> Andy: Yeah.
>> We call it the stairway to machine learning heaven.
>> Andy: Yeah.
>> You start with the data, move up, knock-
>> You guys are very sophisticated with what you do with technology and machine learning, and there's so much, I mean, we're just kind of, again, in such early innings. And I think that, before SageMaker, it was so hard for everyday developers and data scientists to build models, but the combination of SageMaker and what's happened with thousands of companies standardizing on it the last two years, plus now SageMaker Studio, giant leap forward.
>> Well, we hope to use the data to transform our experience with our audience. And we're on Amazon Cloud, so we really appreciate that.
>> Andy: Yeah.
>> And appreciate your support-
>> Andy: Yeah, of course.
>> John: With Amazon, and get that machine learning going a little faster for us, that would be better.
>> If you have requests I'm interested, yeah.
>> So Andy, you talked about that you've got the customers that are builders and the customers that need simplification. Traditionally, when you get into the heart of the majority of adoption of something, you really need to simplify that environment. But when I think about the successful enterprise of the future, they need to be builders. Normally I would've said enterprises want to pay for solutions because they don't have the skill set, but if they're going to succeed in this new economy they need to go through that transformation.
>> Andy: Yeah.
>> That you talked about, so, I mean, are we in just a totally new era, when we look back will this be different than some of these previous waves?
>> It's a really good question, Stu, and I don't think there's a simple answer to it. I think that a lot of enterprises in some ways, I think, wish that they could just skip the low-level building blocks and only operate at that higher level of abstraction.
That's why people were so excited by things like SageMaker, or CodeGuru, or Kendra, or Contact Lens; these are all services that allow them to just send us data and then run it on our models and get back the answers. But I think one of the big trends that we see with enterprises is that they are taking more and more of their development in-house, and they are wanting to operate more and more like startups. I think that they admire what companies like Airbnb and Pinterest and Slack and Robinhood and a whole bunch of those companies, Stripe, have done. And so, you know, I think you go through these phases and eras where there are waves of success at different companies, and then others want to follow that success and replicate it. And so we see more and more enterprises saying, we need to take back a lot of that development in-house. And as they do that, and as they add more developers, those developers in most cases like to deal with the building blocks. And they have a lot of ideas on how they can creatively stitch them together.
>> Yeah, on that point, I want to just quickly ask you on Amazon versus other Clouds, because you made a comment to me in our interview about how hard it is to provide a service to other people. And it's hard to have a service that you're using yourself and turn that around, and the most quoted line of my story was, the compression algorithm; there's no compression algorithm for experience. Which, to me, is the diseconomies of scale for taking shortcuts.
>> Andy: Yeah.
>> And so I think this is a really interesting point, just add some color commentary, because I think this is a fundamental difference between AWS and others, because you guys have a trajectory over the years of serving, at scale, customers wherever they are, whatever they want to do; now you've got microservices.
>> Yeah.
>> John: It's even more complex. That's hard.
>> Yeah.
>> John: Talk about that.
>> I think there are a few elements to that notion of there's no compression algorithm for experience, and I think the first thing to know about AWS, which is different, is we just come from a different heritage and a different background. We ran a business for a long time that was our sole business, that was a consumer retail business, that was very low margin. And so we had to operate at very large scale given how many people were using us, but also we had to run infrastructure services deep in the stack, compute, storage, and database, and reliable, scalable data centers at very low cost and margins. And so when you look at our business, it actually, today, I mean, it's a higher margin business than our retail business, it's a lower margin business than software companies, but at real scale it's a high-volume, relatively low-margin business. And the way that you have to operate to be successful with those businesses, and the things you have to think about, and that DNA, come from the type of operators we have to be in our consumer retail business. And there's nobody else in our space that does that. So, you know, the way that we think about costs, the way we think about innovation in the data center, and I also think the way that we operate services, and how long we've been operating services as a company, it's a very different mindset than operating packaged software. Then, when you think about some of the issues in very large-scale Cloud, you can't learn some of those lessons until you get to different elbows of the curve and scale.
And so what I was telling you is, it's really different to run your own platform for your own users, where you get to tell them exactly how it's going to be done. But that's not the way the real world works. I mean, we have millions of external customers who use us from every imaginable country and location, whenever they want, without any warning, for lots of different use cases, and they have lots of design patterns, and we don't get to tell them what to do. And so operating a Cloud like that, at a scale that's several times larger than the next few providers combined, is a very different endeavor and a very different operating rigor.
>> Well, you've got to keep raising the bar; you guys do a great job, really impressed again. Another tsunami of announcements. In fact, you had to spill the beans earlier with Quantum the day before the event. Tight schedule. I've got to ask you about the music festival, because I think this is a very cool innovation. It's the inaugural Intersect conference.
>> Yes.
>> John: Which is not part of Replay,
>> Yes.
>> John: Which is the concert tonight. It's a whole new thing, big music act, you're a big music buff, your daughter's an artist. Why did you do this? What's the purpose? What's your goal?
>> Yeah, it's an experiment. I think that what's happened is that re:Invent has gotten so big, we have 65 thousand people here, that to do the party, which we do every year, it's like a 35-40 thousand person concert now, which means you have to have a location that has multiple stages. And, you know, we thought about it last year, and when we were watching it we said, we're kind of throwing, like, a 4-hour music festival right now. There's multiple stages, and it's quite expensive to set up that set for a party, and we said, well, maybe we don't have to spend all that money for 4 hours and then rip it apart, because actually the rent to keep those locations for another two days is much smaller than the cost of actually building multiple stages, and so we thought we would try it this year. We're very passionate about music as a business, and I think our customers feel like we've thrown a pretty good music party the last few years, and we thought we would try it at a larger scale as an experiment. And if you look at the economics-
>> At the headliners real quick.
>> The Foo Fighters are headlining on Saturday night, Anderson .Paak and the Free Nationals, Brandi Carlile, Shawn Mullins, um, Willy Porter, it's a good set. Friday night it's Beck and Kacey Musgraves, so it's a really great set of about thirty artists. And we're hopeful that if we can build a great experience that people will want to attend, that we can do it at scale, and it might be something that both pays for itself and maybe helps pay for re:Invent too over time. And, you know, I think that we're also thinking about it as not just a music concert and festival; the reason we named it Intersect is that we want an intersection of music genres and people and ethnicities and age groups and art and technology all there together. And this will be the first year we try it; it's an experiment and we're really excited about it.
>> Well, congratulations on all your success, and I want to thank you; we've been here 7 years at re:Invent, we've been documenting the history. You've got two sets now, one set upstairs. So appreciate you.
>> theCUBE is part of re:Invent, you know, you guys really are a part of the event, and we really appreciate you coming here, and I know people appreciate the content you create as well.
>> And we just launched CUBE365 on Amazon Marketplace, built on AWS, so thanks for letting us-
>> Very cool.
>> John: Build on the platform. Appreciate it.
>> Thanks for having me guys, I appreciate it.
>> Andy Jassy, the CEO of AWS, here inside theCUBE. It's our 7th year covering and documenting the thunderous innovation that Amazon's doing; they're really doing amazing work building out the new technologies here in the Cloud computing world. I'm John Furrier with Stu Miniman; we'll be right back with more after this short break. (Outro music)