

Dr. Matt Wood, AWS | AWS Summit SF 2022


 

(gentle melody) >> Welcome back to theCUBE's live coverage of AWS Summit in San Francisco, California. Events are back. AWS Summit in New York City this summer, theCUBE will be there as well. Check us out there. I'm glad to have events back. It's great to have everyone here. I'm John Furrier, host of theCUBE. Dr. Matt Wood is with me, CUBE alumni, now VP of Business Analytics Division of AWS. Matt, great to see you. >> Thank you, John. It's great to be here. I appreciate it. >> I always call you Dr. Matt Wood because Andy Jassy always says, "Dr. Matt, we would introduce you on the arena." (Matt laughs) >> Matt: The one and only. >> The one and only, Dr. Matt Wood. >> In joke, I love it. (laughs) >> Andy style. (Matt laughs) I think you had walk up music too. >> Yes, we all have our own personalized walk up music. >> So talk about your new role, not a new role, but you're running the analytics business for AWS. What does that consist of right now? >> Sure. So I've got what I consider to be one of the best jobs in the world. I get to work with our customers and the teams at AWS to build the analytics services that millions of our customers use to slice, dice, pivot, better understand their data, look at how they can use that data for reporting, looking backwards. And also look at how they can use that data looking forward, so predictive analytics and machine learning. So whether it is slicing and dicing in the lower level of Hadoop and the big data engines, or whether you're doing ETL with Glue, or whether you're visualizing the data in QuickSight or building your models in SageMaker. I got my fingers in a lot of pies. >> One of the benefits of having CUBE coverage with AWS since 2013 is watching the progression. You were on theCUBE that first year we were at re:Invent in 2013, and look at how machine learning just exploded onto the scene. You were involved in that from day one. It's still day one, as you guys say. What's the big thing now?
Look at just what happened. Machine learning comes in and then a slew of services come in. You've got SageMaker, became a hot seller right out of the gate. The database stuff was kicking butt. So all this is now booming. That was a real generational change over for database. What's the perspective? What's your perspective on that's evolved? >> I think it's a really good point. I totally agree. I think for machine learning, there's sort of a Renaissance in machine learning and the application of machine learning. Machine learning as a technology has been around for 50 years, let's say. But to do machine learning right, you need like a lot of data. The data needs to be high quality. You need a lot of compute to be able to train those models and you have to be able to evaluate what those models mean as you apply them to real world problems. And so the cloud really removed a lot of the constraints. Finally, customers had all of the data that they needed. We gave them services to be able to label that data in a high quality way. There's all the compute you need to be able to train the models. And so where you go? And so the cloud really enabled this Renaissance with machine learning. And we're seeing honestly a similar Renaissance with data and analytics. If you look back five to ten years, analytics was something you did in batch, your data warehouse ran an analysis to do reconciliation at the end of the month, and that was it. (John laughs) And so that's when you needed it. But today, if your Redshift cluster isn't available, Uber drivers don't turn up, DoorDash deliveries don't get made. Analytics is now central to virtually every business, and it is central to virtually every business's digital transformation. 
And being able to take that data from a variety of sources, be able to query it with high performance, to be able to actually then start to augment that data with real information, which usually comes from technical experts and domain experts to form wisdom and information from raw data. That's kind of what most organizations are trying to do when they kind of go through this analytics journey. >> It's interesting. Dave Vellante and I always talk on theCUBE about the future. And you look back, the things we were talking about six years ago are actually happening now. And it's not a hyped up statement to say digital transformation is actually happening now. And there's also times when we bang our fists on the table saying, "I really think this is so important." And Dave says, "John, you're going to die on that hill." (Matt laughs) And so I'm excited that this year, for the first time, I didn't die on that hill. I've been saying- >> Do all right. >> Data as code is the next infrastructure as code. And Dave's like, "What do you mean by that?" We're talking about how data gets... And it's happening. So we just had an event on our AWS startups.com site, a showcase for startups, and the theme was data as code. And interesting new trends emerging really clearly, the role of a data engineer, right? Like an SRE, what an SRE did for cloud, you have a new data engineering role because the developer onboarding is massively increasing, exponentially, new developers. Data scientists are growing, but the pipelining and managing and engineering as a system, almost like an operating system. >> Kind of as a discipline. >> So what's your reaction to that about this data engineer, data as code? Because if you have horizontally scalable data, you've got to be open, that's hard (laughs), okay? And you got to silo the data that needs to be siloed for compliance and reason. So that's a big policy around that.
So what's your reaction to data as code and the data engineering phenomenon? >> It's a really good point. I think with any technology project inside of an organization, success with analytics or machine learning, it's kind of 50% technology and then 50% cultural. And you have often domain experts. Those could be physicians or drug design experts, or they could be financial experts or whoever they might be, got deep domain expertise, and then you've got technical implementation teams. And there's kind of a natural, often repulsive force. I don't mean that rudely, but they just don't talk the same language. And so the more complex a domain and the more complex the technology, the stronger that repulsive force. And it can become very difficult for domain experts to work closely with the technical experts to be able to actually get business decisions made. And so what data engineering does, and data engineering is in some cases a team, or it can be a role that you play, is really allow those two disciplines to speak the same language. You can think of it as plumbing, but I think of it as like a bridge. It's a bridge between the technical implementation and the domain experts, and that requires a very disparate range of skills. You've got to understand about statistics, you've got to understand about the implementation, you got to understand about the data, you got to understand about the domain. And if you can put all of that together, that data engineering discipline can be incredibly transformative for an organization because it builds the bridge between those two groups. >> I was advising some young computer science students at the sophomore, junior level just a couple of weeks ago, and I told them I would ask someone at Amazon this question. So I'll ask you, >> Matt: Okay. >> since you've been in the middle of it for years. They were asking me, and I was trying to mentor them on how do you become a data engineer, from a practical standpoint?
Courseware, projects to work on, how to think, not just coding Python, because everyone's coding in Python, but what else can they do? So I was trying to help them. I didn't really know the answer myself. I was just trying to kind of help figure it out with them. So what is the answer, in your opinion, or the thoughts around advice to young students who want to be data engineers? Because data scientists is pretty clear on what that is. You use tools, you make visualizations, you manage data, you get answers and insights and then apply that to the business. That's an application. That's not the standing up a stack or managing the infrastructure. So what does that coding look like? What would your advice be to folks getting into a data engineering role? >> Yeah, I think if you believe this, what I said earlier about 50% technology, 50 % culture, the number one technology to learn as a data engineer is the tools in the cloud which allow you to aggregate data from virtually any source into something which is incrementally more valuable for the organization. That's really what data engineering is all about. It's about taking from multiple sources. Some people call them silos, but silos indicates that the storage is kind of fungible or undifferentiated. That's really not the case. Success requires you to have really purpose built, well crafted, high performance, low cost engines for all of your data. So understanding those tools and understanding how to use them, that's probably the most important technical piece. Python and programming and statistics go along with that, I think. And then the most important cultural part, I think is... It's just curiosity. You want to be able to, as a data engineer, you want to have a natural curiosity that drives you to seek the truth inside an organization, seek the truth of a particular problem, and to be able to engage because probably you're going to some choice as you go through your career about which domain you end up in. 
Maybe you're really passionate about healthcare, or you're really just passionate about transportation or media, whatever it might be. And you can allow that to drive a certain amount of curiosity. But within those roles, the domains are so broad you kind of got to allow your curiosity to develop and lead you to ask the right questions and engage in the right way with your teams, because you can have all the technical skills in the world. But if you're not able to help the team's truth seek through that curiosity, you simply won't be successful. >> We just had a guest, 20 year old founder, Johnny Dallas who was 16 when he worked at Amazon. Youngest engineer- >> Johnny Dallas is a great name, by the way. (both chuckle) >> It's his real name. It sounds like a football player. >> That's awesome. >> Rock star. Johnny CUBE, it's me. But he's young and he was saying... His advice was just do projects. >> Matt: And get hands on. Yeah. >> And I was saying, hey, I came from the old days where you get to stand stuff up and you hugged on for the assets because you didn't want to kill the project because you spent all this money. And he's like, yeah, with cloud you can shut it down. If you do a project that's not working and you get bad data no one's adopting it or you don't like it anymore, you shut it down, just something else. >> Yeah, totally. >> Instantly abandon it and move on to something new. That's a progression. >> Totally! The blast radius of decisions is just way reduced. We talk a lot about in the old world, trying to find the resources and get the funding is like, all right, I want to try out this kind of random idea that could be a big deal for the organization. I need $50 million and a new data center. You're not going to get anywhere. >> And you do a proposal, working backwards, documents all kinds of stuff. >> All that sort of stuff. >> Jump your hoops. >> So all of that is gone. 
But we sometimes forget that a big part of that is just the prototyping and the experimentation and the limited blast radius in terms of cost, and honestly, the most important thing is time, just being able to jump in there, fingers on keyboards, just try this stuff out. And that's why at AWS, we have... Part of the reason we have so many services, because we want, when you get into AWS, we want the whole toolbox to be available to every developer. And so as your ideas develop, you may want to jump from data that you have that's already in a database to doing realtime data. And then you have the tools there. And when you want to get into real time data, you don't just have kinesis, you have real time analytics, and you can run SQL against that data. The capabilities and the breadth really matter when it comes to prototyping. >> That's the culture piece, because what was once a dysfunctional behavior. I'm going to go off the reservation and try something behind my boss' back, now is a side hustle or fun project. So for fun, you can just code something. >> Yeah, totally. I remember my first Hadoop projects. I found almost literally a decommissioned set of servers in the data center that no one was using. They were super old. They're about to be literally turned off. And I managed to convince the team to leave them on for me for another month. And I installed Hadoop on them and got them going. That just seems crazy to me now that I had to go and convince anybody not to turn these servers off. But what it was like when you- >> That's when you came up with Elastic MapReduce because you said this is too hard, we got to make it easier. >> Basically yes. (John laughs) I was installing Hadoop version Beta 9.9 or whatever. It was like, this is really hard. >> We got to make it simpler. All right, good stuff. I love the walk down memory Lane. And also your advice. Great stuff. I think culture is huge. 
That's why I like Adam's keynote at Reinvent, Adam Selipsky talk about Pathfinders and trailblazers, because that's a blast radius impact when you can actually have innovation organically just come from anywhere. That's totally cool. >> Matt: Totally cool. >> All right, let's get into the product. Serverless has been hot. We hear a lot of EKS is hot. Containers are booming. Kubernetes is getting adopted, still a lot of work to do there. Cloud native developers are booming. Serverless, Lambda. How does that impact the analytics piece? Can you share the hot products around how that translates? >> Absolutely, yeah. >> Aurora, SageMaker. >> Yeah, I think it's... If you look at kind of the evolution and what customers are asking for, they don't just want low cost. They don't just want this broad set of services. They don't just want those services to have deep capabilities. They want those services to have as low an operating cost over time as possible. So we kind of really got it down. We got built a lot of muscle, a lot of services about getting up and running and experimenting and prototyping and turning things off and turning them on and turning them off. And that's all great. But actually, you really only in most projects start something once and then stop something once, and maybe there's an hour in between or maybe there's a year. But the real expense in terms of time and operations and complexity is sometimes in that running cost. And so we've heard very loudly and clearly from customers that running cost is just undifferentiated to them. And they want to spend more time on their work. And in analytics, that is slicing the data, pivoting the data, combining the data, labeling the data, training their models, running inference against their models, and less time doing the operational pieces. >> Is that why the service focuses there? >> Yeah, absolutely. It dramatically reduces the skill required to run these workloads of any scale. 
And it dramatically reduces the undifferentiated heavy lifting because you get to focus more of the time that you would have spent on the operations on the actual work that you want to get done. And so if you look at something just like Redshift Serverless, that we launched at re:Invent, we have a lot of customers that want to run the cluster, and they want to get into the weeds where there is benefit. We have a lot of customers that say there's no benefit for me, I just want to do the analytics. So you run the operational piece, you're the experts. We run 60 million instance startups every single day. We do this a lot. >> John: Exactly. >> We understand the operations- >> I just want the answers. Come on. >> So just give me the answers or just give me the notebook or just give me the inference prediction. Today, for example, we announced Serverless Inference. So now once you've trained your machine learning model, just run a few lines of code or you just click a few buttons and then you got an inference endpoint that you do not have to manage. And whether you're doing one query against that endpoint per hour or you're doing 10 million, we'll just scale it on the back end. >> I know we don't have a lot of time left, but I want to get your reaction on this. One of the things about the data lakes not being data swamps has been, from what I've been reporting and hearing from customers, is that they want to retrain their machine learning algorithm. They need that data, they need the real time data, and they need the time series data. Even though the time has passed, they got to store it in the data lake. So now the data lake's main function is reusing the data to actually retrain. It works properly. So a lot of post mortems turn into actually business improvements to make the machine learning smarter, faster. Do you see it the same way? >> Yeah, I think it's really interesting >> Or is that just...
>> No, I think it's totally interesting because it's convenient to kind of think of analytics as a very clear progression from point A to point B. But really, you're navigating terrain for which you do not have a map, and you need a lot of help to navigate that terrain. And so having these services in place, not having to run the operations of those services, being able to have those services be secure and well governed. And we added PII detection today. It's something you can do automatically, to be able to use any unstructured data, run queries against that unstructured data. So today we added text queries. So you can just say, well, you can scan a badge, for example, and say, well, what's the name on this badge? And you don't have to identify where it is. We'll do all of that work for you. It's more like a branch than it is just a normal A to B path, a linear path. And that includes loops backwards. And sometimes you've got to get the results and use those to make improvements further upstream. And sometimes you've got to use those... And when you're downstream, it will be like, "Ah, I remember that." And you come back and bring it all together. >> Awesome. >> So it's a wonderful world for sure. >> Dr. Matt, we're here in theCUBE. Just take the last word and give the update while you're here what's the big news happening that you're announcing here at Summit in San Francisco, California, and update on the business analytics group. >> Yeah, we did a lot of announcements in the keynote. I encourage everyone to take a look at, that this morning with Swami. One of the ones I'm most excited about is the opportunity to be able to take dashboards, visualizations. We're all used to using these things. We see them in our business intelligence tools, all over the place. 
However, what we've heard from customers is like, yes, I want those analytics, I want that visualization, I want it to be up to date, but I don't actually want to have to go from my tools where I'm actually doing my work to another separate tool to be able to look at that information. And so today we announced 1-click public embedding for QuickSight dashboards. So today, as easily as embedding a YouTube video, you can take a dashboard that you've built inside QuickSight, cut and paste the HTML, paste it into your application and that's it. That's what you have to do. It takes seconds. >> And it gets updated in real time. >> Updated in real time. It's interactive. You can do everything that you would normally do. You can brand it, there's no "Powered by QuickSight" button or anything like that. You can change the colors, fit in perfectly with your application. So that's an incredibly powerful way of being able to take an analytics capability that today sits inside its own little fiefdom and put it just everywhere. Very transformative. >> Awesome. And the business is going well. You got the Serverless detail win for you there. Good stuff. Dr. Matt Wood, thank you for coming on theCUBE. >> Anytime. Thank you. >> Okay, this is theCUBE's coverage of AWS Summit 2022 in San Francisco, California. I'm John Furrier, host of theCUBE. Stay with us for more coverage of day two after this short break. (gentle music)
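
The Serverless Inference announcement Matt describes earlier in the interview ("run a few lines of code ... an inference endpoint that you do not have to manage") can be sketched roughly as follows. This is an illustration under stated assumptions, not AWS's reference code: the helper name, the config and model names, and the memory and concurrency values are all hypothetical. The block only builds the request parameters; the actual boto3 call is left in a comment because it needs AWS credentials and an existing model.

```python
def serverless_endpoint_config(config_name: str, model_name: str,
                               memory_mb: int = 2048,
                               max_concurrency: int = 5) -> dict:
    """Return keyword arguments for SageMaker's CreateEndpointConfig call.

    The ServerlessConfig section is what marks the variant as serverless:
    no instance type or instance count is specified.
    """
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            "ServerlessConfig": {
                "MemorySizeInMB": memory_mb,        # illustrative value
                "MaxConcurrency": max_concurrency,  # illustrative value
            },
        }],
    }


# Hypothetical names; replace with a real, already-registered model.
params = serverless_endpoint_config("demo-config", "demo-model")

# With credentials configured, the parameters could be passed on like this:
#   import boto3
#   boto3.client("sagemaker").create_endpoint_config(**params)
print(params["ProductionVariants"][0]["ServerlessConfig"])
```

An endpoint created from such a config is then scaled by the service, which is the "we'll just scale it on the back end" part of Matt's description.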

Published Date : Apr 21 2022


SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Johnny Dallas | PERSON | 0.99+
Andy Jassy | PERSON | 0.99+
John Furrier | PERSON | 0.99+
Dave Vellante | PERSON | 0.99+
Dave | PERSON | 0.99+
AWS | ORGANIZATION | 0.99+
Amazon | ORGANIZATION | 0.99+
John | PERSON | 0.99+
Matt | PERSON | 0.99+
Adam Selipsky | PERSON | 0.99+
10 million | QUANTITY | 0.99+
$50 million | QUANTITY | 0.99+
Matt Wood | PERSON | 0.99+
60 million | QUANTITY | 0.99+
today | DATE | 0.99+
50% | QUANTITY | 0.99+
five | QUANTITY | 0.99+
Adam | PERSON | 0.99+
two groups | QUANTITY | 0.99+
San Francisco, California | LOCATION | 0.99+
16 | QUANTITY | 0.99+
2013 | DATE | 0.99+
Python | TITLE | 0.99+
1-click | QUANTITY | 0.99+
a year | QUANTITY | 0.99+
Today | DATE | 0.99+
Hadoop | TITLE | 0.99+
ten years | QUANTITY | 0.99+
two disciplines | QUANTITY | 0.99+
New York City | LOCATION | 0.99+
an hour | QUANTITY | 0.99+
first | QUANTITY | 0.99+
this year | DATE | 0.99+
CUBE | ORGANIZATION | 0.99+
first time | QUANTITY | 0.98+
50 % | QUANTITY | 0.98+
theCUBE | ORGANIZATION | 0.98+
millions | QUANTITY | 0.98+
AWS Summit | EVENT | 0.98+
YouTube | ORGANIZATION | 0.98+
memory Lane | LOCATION | 0.98+
Uber | ORGANIZATION | 0.98+
20 year old | QUANTITY | 0.97+
day two | QUANTITY | 0.97+
One | QUANTITY | 0.97+
SageMaker | TITLE | 0.97+
AWS Summit 2022 | EVENT | 0.97+
QuickSight | TITLE | 0.96+
both | QUANTITY | 0.96+
Swami | PERSON | 0.96+
50 years | QUANTITY | 0.96+
one | QUANTITY | 0.96+
SQL | TITLE | 0.95+
Elastic MapReduce | TITLE | 0.95+
Dr. | PERSON | 0.94+
Johnny CUBE | PERSON | 0.93+


Clive Charlton and Aditya Agrawal | AWS Public Sector Summit Online


 

(upbeat music) >> Narrator: From around the globe. It's The CUBE, with digital coverage of AWS public sector online, (upbeat music) brought to you by, Amazon Web Services. >> Everyone welcome back to The CUBE virtual coverage, of AWS public sector summit online. I'm John Furrier, your host of The CUBE. Normally we're in person, out on Asia-Pacific, and all the different events related to public sector. But this year we have to do it remote, and we're going to do the remote virtual CUBE, with Data Virtual Public Sector Online Summit. And we have two great guests here, about Digital Earth Africa project, Clive Charlton. Head of Solutions Architecture, Sub-Saharan Africa with AWS, Clive thanks for coming on, and Aditya Agrawal founder of D4DInsights, and also the advisor for the Digital Earth Africa project with AWS. So gentlemen, thank you for coming on. Appreciate you coming on remotely. >> Thanks for having us. >> Thank you for having us, John. >> So Clive take us through real quickly. Just take a minute to describe what is the Digital Earth Africa Project. What are the problems, that you're aiming to solve? >> Well, we're really aiming to provide, actionable data to governments, and organization around Africa, by providing satellite imagery, in an easy to use format, and doing that on the cloud, that serves countries throughout Africa. >> And just from a cloud perspective, give us a quick taste of what's going on, just with the tech, it's on Amazon. You got a little satellite action. Is there ground station involved? Give us a little bit more color around, you know, what's the scope of the project. >> Yeah, so, historically speaking you'd have to process satellite imagery down link it, and then do some heavy heavy lifting, around the processing of the data. 
Digital Earth Africa was built from the experiences of Digital Earth Australia, originally developed by Geoscience Australia, and they use a container service for Kubernetes called Elastic Kubernetes Service to spin up virtual machines, which are required to process the raw satellite imagery into a format called a Cloud Optimized GeoTIFF. This format is used to store very large volumes of data in a format that's really easy to query. So, organizations can just use an HTTP GET range request to query just the part of the file that they're interested in, which means the results are served much, much quicker, for a much, much better overall experience. Under the hood, the data is stored in the Amazon Simple Storage Service, which is S3, and the metadata is indexed in a Relational Database Service that runs the Open Data Cube library, which allows Digital Earth Africa to store this data in both space and time. >> It's interesting. I just did some interviews last week, on a symposium on space and cybersecurity, and we were talking about the impact of satellites and GPS and just the overall infrastructure shift. And it's just another part of the edge of the network. Aditya, I want to get your thoughts on this, and your reaction to the Digital Earth, cause you're an advisor. Let's zoom out. What's the impact on people's lives? Give us a quick overview of how you see it playing out because, explaining to someone who doesn't know anything about the project, like, okay what is it about, and how does it actually impact people? >> Sure. So, you know, as Clive mentioned, I mean there's definitely a digital infrastructure behind Digital Earth Africa, in a way that it's going to be able to serve free and open satellite data. And often the issue around satellite data, especially within the context of Africa and other parts of the world, is that there's a level of capacity that's required in order to be able to use that data.
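
The HTTP GET range request Clive mentions can be sketched in a few lines of Python. This is a minimal illustration, not Digital Earth Africa's actual client code: the URL, the byte offsets, and the helper name `build_range_request` are hypothetical, and the request is only constructed here, not sent.

```python
from urllib.request import Request


def build_range_request(url: str, start: int, length: int) -> Request:
    """Build a GET request for `length` bytes beginning at byte `start`."""
    if start < 0 or length <= 0:
        raise ValueError("start must be >= 0 and length must be > 0")
    end = start + length - 1  # HTTP byte ranges are inclusive on both ends
    return Request(url, headers={"Range": f"bytes={start}-{end}"}, method="GET")


# Hypothetical object URL; a real Cloud Optimized GeoTIFF would sit in S3.
req = build_range_request("https://example.com/scene.tif", start=0, length=16384)
print(req.get_method(), req.get_header("Range"))
```

A server that supports range requests answers with status 206 (Partial Content) and only the requested bytes, which is why a reader can pull a file's header and a few tiles instead of downloading the whole image.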
But there's also all kinds of access issues, because, traditionally satellite data is heavy. There's the old model of being able to download the data and then being able to do something with it. And then often about 80% of the time, that you spend on satellite data is spent, just pre processing the data, before you can actually, do any of the fun analysis around it, that really sort of impacts the kinds of decisions and actions that you're looking for. And so that's why Digital Earth Africa. And that's why this partnership, with Amazon is a fantastic partnership, because it really allows us, to be able, to scale the approach across the entire continent, make it easy for that data to be accessed and make it easier for people to be able to use that data. The way that Digital Earth Africa is being operationalized, is that we're not just looking at it, from the perspective of, let's put another infrastructure into Africa. We want this program, and it is a program, that we want institutionalized within Africa itself. One that leverages expertise across the continent, and one that brings in organizations across the continent to really sort of take the leadership and ownership of this program as it moves forward. The idea of it is that, once you're able to have this information, being able to address issues like food security, climate change, coastal resilience, land degradation where illegal mining is, where is the water? We want to be able to do that, in a way that it's really looking at what are the national development priorities within the countries themselves, and how does it also then support regional and global frameworks like Africa's Agenda 2063 and the sustainable development goals. >> No doubt in my mind, obviously, is that huge benefits to these kinds of technologies. I want to also just ask you, as a follow up is a huge space race going on, right now, explosion of availability of satellite data. 
And again, more satellites going up, there's more congestion, more contention. Again, we had a big event on that cybersecurity, and the congestion issue, but, you know, satellite data powers everyone here in the United States. You want an Uber, you want Google Maps, you've got it everywhere with GPS. Without it, we'd be kind of like (laughing), wondering what's going on. How do we even vote these days? So certainly an impact, but there's a huge surge of availability, of the use of satellite data. How do you explain this? And what are some of the challenges, from the data side that's coming, from the Digital Earth Africa project that you guys hope to resolve? >> Sure. I mean, that's a great question. I mean, I think at one level, when you're looking at the space race right now, satellites are becoming cheaper. They're becoming more efficient. There's increased technology now, on the types of sensors that you can deploy. There's companies like Planet, that are really revolutionizing how even small countries are able to deploy their own satellites, and the constellation that they're putting forward, in terms of the frequency by which you're able to get data, for any given part of the earth on a daily basis. Coupled with that, and you know, this is really sort of in Clive's purview, the cloud computing capabilities and overall computing power that you have today, compared with what you had 10, 15 years ago, are so vastly different. What used to take weeks to do before, for any kind of analysis on satellite data, which is heavy data, now takes, you know, minutes or hours to do. So when you put all that together, again, you know, I think it really speaks to the power of this partnership with Amazon and really, what that means for how this data is going to be delivered to Africa, because it really allows for the scalability, for anything that happens through Digital Earth Africa.
And so, for example, one of the approaches that we're taking is, we identify what the priorities and needs are at the country level. Let's say that it's land degradation, there's often common issues across countries. And so we can take one particular issue, test it with additional countries, and then we can scale it across the whole continent because the infrastructure is there for the whole continent. >> Yeah. That's a great point. So many storylines here. We'll get to Clive in a second on sustainability. And I want to talk about the Open Data Platform. Obviously, open data, having data is one thing, but now train data, and having more trusted data becomes a huge issue. Again, I want to dig into that for a second, but, Clive, I want to ask you, first, what region are we in? I mean, first of all, we've been covering the region expansion from Bahrain all the way, as it moves around the world, probably soon in space. There'll be an Amazon space station region probably, someday in the future but, what region are you running the project out of? And why is it important? Can you share the update on the regional piece? >> Well, we're very pleased that Digital Earth Africa is using the new Africa region in Cape Town, in South Africa, which was launched in April of this year. It's one of 24 regions around the world and we have another three new regions announced. What this means for users of Digital Earth Africa is, they're able to use the region closest to them, which gives them the best user experience. It's the quickest connection for them. But more importantly, we also wanted to use an African solution for African people, and using the Africa region in Cape Town really aligned with that thinking. >> So, localization on the data, latency, all that stuff is kind of within the region, within country here. Right? >> That's right, Yeah >> And why is that important? Is there any other benefits?
Why should someone care? Obviously, there's the failover option, I mean, in other countries to go to, but why is having something in that region important for this project? >> Well, it comes down to latency for the users. So being as close to the data as possible is really important for the user experience, especially when you're looking at large datasets and big queries. You don't want to be waiting out a long lag time for that query to go backwards and forwards between the user and the region. So having the data in the Africa region in Cape Town is important. >> So it's about the region. I love when these new regions roll out from Amazon, 'cause obviously it's this huge buildup, CapEx, this huge data center, servers and everything. Sustainability is a huge part of the story. How does the sustainability piece fit into the data initiative supported in Africa? Can you share some updates on that? >> Well, this project is also closely aligned with the Amazon Sustainability Data Initiative, which looks to accelerate sustainability research and innovation, really by minimizing the cost and the time required to acquire and analyze large sustainability datasets. So the initiative supports innovators and researchers with the data, tools, and technical expertise that they need to move sustainability to the next level. These are public datasets, available to anyone. In addition to that, the initiative provides cloud grants to those who are interested in exploring the use of AWS technology and scalable infrastructure to solve sustainability challenges of this nature. >> Aditya, I want to hear your thoughts on this comment that Clive made around latency, and certainly having a region there has great benefits. You don't need to hop on that.
Everyone knows I'm a big fan of the regional model, but it brings up the issue of what's going on in the country from an infrastructure standpoint: a lot of mobility, a lot of edge computing. I can almost imagine that. So how do you see that evolving, from a business standpoint, from a project standpoint, from a data standpoint? Can you comment and react to that edge angle? >> Yeah, I mean, I think that the value of an open data infrastructure is that you want to use that infrastructure to create a whole data ecosystem type of approach. And so, from the perspective of being able to make this data readily accessible, making it efficiently accessible, and really being able to bring industry into that ecosystem, because what we really want, as the program matures, is for this program to then also instigate the development of new businesses and entrepreneurship, and really get the young people across Africa, which has the largest proportion of young people anywhere in the world, engaged around what you can do with satellite data and the types of businesses that can be developed around it. And so, by having all of our data reside in Cape Town, on the continent, there are obviously technical benefits to that in terms of being able to apply the data and create new businesses. There's also a perception aspect, in the fact that the data Digital Earth Africa is serving is in Africa and residing in Africa, which does go a long way. >> Yeah. And that's a huge value. And I can just imagine the creativity. Clive, if you can comment on this open data platform idea, because some of the commentary that we've been having on theCUBE here, and all around the world, is that data's great.
We all know we're living with a lot of data. You're starting to see that the commoditization and horizontal scalability of data is one thing, but to put it into software-defined environments, whether it's an entrepreneur coding up an app or doing something to share some transparency around some initiatives going on within the region or on the continent, it's about trusted data. It's about sharing algorithms. AI is also a consumer of data; machines consume data. So it's not just the technology: data is part of this new normal. What's this Open Data Platform, and how does that translate into value, in your opinion? >> Yeah. And you know, when data is shared on AWS, anyone can analyze it and build services on top of it, using a broad range of compute and data analytics products, you know, things like Amazon EC2, or Lambda, which is serverless compute, to things like Amazon Elastic MapReduce for complex extract and transformation processes. Sharing data in the cloud lets users spend more time on the data analysis rather than the data acquisition. And researchers can analyze data shared on AWS without needing to pay to store their own copy, which is what the Open Data Platform provides. You only have to pay for the compute that you use, and you don't need to purchase storage to start a new project. So the Registry of Open Data on AWS makes it easy to find those datasets by making them publicly available through AWS services. And when you share your data on AWS, you make it available to a large and growing community of developers, startups, and enterprises all around the world, and, you know, we've been talking particularly around Africa. >> Yeah. So it's an open source model, basically, it's free.
You don't, it doesn't cost you anything, probably, just to start. Maybe down the road, if it gets heavy, maybe there's charging, but for the most part it's easy for scientists to use, and then you're leveraging it into the open, contributing back. Is that right? >> Yep. That's right. To me, it's getting researchers, startups, and organizations growing quickly without having to worry about the data acquisition. They can just get going and start building. >> I want to get back to Aditya on this skill gap issue, because you brought up something that I thought was really cool. People are going to start building apps. I'm going to start to see more innovation. What are the needs out there? Because we're seeing a huge onboarding of new talent, young talent, people reskilling from existing jobs. Certainly COVID accelerated people looking for different kinds of work. I'm sure there's a lot of (laughing) demand to do some innovative things. The question I always get, and want to get your reaction to, is: what are the skills needed to get involved, to, one, contribute, but also benefit from it, whether it's the satellite data or just how to get involved, skill-wise? >> Sure. >> Yes. >> Yeah. So most recently we've created a six-week training course that really takes users from understanding the basics of Earth observation data, to how to work with Python, to how to create their own Jupyter notebooks and their own use cases. And so there's a wide range of skill sets that are required, depending on who you are, because effectively, what we want to be able to do is get everyone, from the technical user who might have some remote sensing background, to the developer, to the policymaker and decision maker, to understand the value of this infrastructure, whether you're the one who's actually analyzing the data.
If you're the one who's developing new applications, or you're taking that information from a managerial or policy-level discussion to actually deliver the action and the sort of impact that you're looking for. And so, you know, in that regard, we're working with ITC in the Netherlands, and again with institutions across Africa that already have a mandate and expertise in this particular area, to create a holistic capacity development program that will address all of those different factors. >> So I guess the follow-up question I want to have is, how do you ensure the priorities of Africa are addressed as part of this program? >> Yeah, so we've created a governance model that really is both top-down and bottom-up. At the bottom-up level, we have a technical advisory committee that has over 15 institutions, many of which are based across Africa, that really have a good understanding of the needs, the priorities, and the mandate for how to work with countries. And at the top-down level, we're developing a governing board that will be inclusive of the key continental-level institutions that really provide the political buy-in, the sustainability of the program, and overall guidance. And within that, we're also creating an operational model such that these institutions that do have the capacity to support the program are actually the ones who are also going to be supporting the implementation of the program itself. >> And there have been some United Nations sustainable development projects, all kinds of government involvement, around making sure certain things would happen within the country. Can you just share some of the highlights, or some of the key initiatives that are going on that you're supporting, to make it a better world? >> Yeah. So this program is very closely aligned to a sustainable development agenda.
And so we're looking at developing methods that really address the Sustainable Development Goals as one facet. In Africa, there's another program looking at overall national development priorities and sustainability, called Agenda 2063. And really, I think what it comes down to is that this wouldn't be happening without the country-level involvement themselves. So this started with five countries originally, Senegal, Ghana, Kenya, Tanzania, and the government of Kenya itself has really been kind of a founding partner for how Digital Earth Africa and its predecessor, the Africa Regional Data Cube, came to be. And so without high-level support and political buy-in within those governments, I mean, it's really because of that that we're where we are. >> Aditya, thank you for coming on and sharing that insight. Clive, we'll give you the final word. For the folks watching: Digital Earth Africa processes petabytes of data. I mean, the satellite data is huge. You mentioned it's a new region. You're running Kubernetes, Elastic Kubernetes Service, making containers easy to use, pay as you go. So you get cutting edge. Take one minute to share why this region's cutting edge. Does it have the scale of other regions? What should they know about AWS in Cape Town, Africa's new region? Take a minute to put a plug in. >> Yeah, thank you for that, John. So all regions are built in the same way all around the world. They're built for redundancy and reliability. They typically have a minimum of three of what we call Availability Zones, and each one contains a cluster of data centers, all interconnected with fast fiber. So, you know, you can survive a failure with no impact to your services. And the Cape Town region is built in exactly the same way. We have most of the services available in the Cape Town region, like most other regions.
So, as a user of AWS, you can have the confidence that you can deploy your services and workloads into AWS and run them in the same way, with the same kind of speed, the same kind of support, and the same infrastructure that's backing any region anywhere else in the world. >> Well, great. Thanks for that plug. Aditya, thank you for your insight. And again, innovation follows cloud computing, whether you're building on top of it as a startup, a government, or an enterprise, or for the bigger societal good, in this case the Digital Earth Africa project. Great. A great story. Thank you for sharing. I appreciate it. >> Thank you for having us. >> Thank you for having us, John. >> I'm John Furrier with theCUBE, virtual, remote, not in person this year. I hope to see you next time in person. Thanks for watching. (upbeat music) (upbeat music decreases)

Published Date : Oct 20 2020

ENTITIES

Entity | Category | Confidence
Aditya Agrawal | PERSON | 0.99+
Amazon | ORGANIZATION | 0.99+
AWS | ORGANIZATION | 0.99+
Clive | PERSON | 0.99+
Cape Town | LOCATION | 0.99+
John | PERSON | 0.99+
Africa | LOCATION | 0.99+
John Furrier | PERSON | 0.99+
Amazon Web Services | ORGANIZATION | 0.99+
United States | LOCATION | 0.99+
six week | QUANTITY | 0.99+
Agenda 2063 | TITLE | 0.99+
Clive Charlton | PERSON | 0.99+
Python | TITLE | 0.99+
Aditya | PERSON | 0.99+
Netherlands | LOCATION | 0.99+
South Africa | LOCATION | 0.99+
five countries | QUANTITY | 0.99+
United Nations | ORGANIZATION | 0.99+
last week | DATE | 0.99+
one minute | QUANTITY | 0.99+
Digital Earth Africa | ORGANIZATION | 0.99+
earth | LOCATION | 0.99+
both | QUANTITY | 0.98+
D4DInsights | ORGANIZATION | 0.98+
April of this year | DATE | 0.98+
10 years | QUANTITY | 0.98+
this year | DATE | 0.98+
Uber | ORGANIZATION | 0.98+
Bahrain | LOCATION | 0.98+
S3 | TITLE | 0.97+
15 years ago | DATE | 0.97+
over 15 institutions | QUANTITY | 0.97+
each one | QUANTITY | 0.97+
Data Virtual Public Sector Online Summit | EVENT | 0.97+
one | QUANTITY | 0.96+
first | QUANTITY | 0.96+
about 80% | QUANTITY | 0.96+
three | QUANTITY | 0.96+
Earth | LOCATION | 0.96+
Africa Regional Data Cube | ORGANIZATION | 0.96+
Google Maps | TITLE | 0.95+
one level | QUANTITY | 0.94+