

Dr. Matt Wood, AWS | AWS Summit SF 2022


 

(gentle melody) >> Welcome back to theCUBE's live coverage of AWS Summit in San Francisco, California. Events are back. AWS Summit is in New York City this summer, and theCUBE will be there as well. Check us out there. I'm glad to have events back. It's great to have everyone here. I'm John Furrier, host of theCUBE. Dr. Matt Wood is with me, CUBE alumnus, now VP of the Business Analytics Division of AWS. Matt, great to see you. >> Thank you, John. It's great to be here. I appreciate it. >> I always call you Dr. Matt Wood because Andy Jassy always said, "Dr. Matt Wood," when he'd introduce you in the arena. (Matt laughs) >> Matt: The one and only. >> The one and only, Dr. Matt Wood. >> In-joke, I love it. (laughs) >> Andy style. (Matt laughs) I think you had walk-up music too. >> Yes, we all have our own personalized walk-up music. >> So talk about your role. It's not a new role, but you're running the analytics business for AWS. What does that consist of right now? >> Sure. I've got what I consider to be one of the best jobs in the world. I get to work with our customers and the teams at AWS to build the analytics services that millions of our customers use to slice, dice, and pivot, to better understand their data, to look at how they can use that data for reporting, looking backwards, and also to look at how they can use that data looking forward, so predictive analytics and machine learning. So whether it's slicing and dicing at the lower level with Hadoop and the big data engines, or whether you're doing ETL with Glue, or whether you're visualizing the data in QuickSight or building your models in SageMaker, I've got my fingers in a lot of pies. >> One of the benefits of having CUBE coverage of AWS since 2013 is watching the progression. You were on theCUBE that first year we were at re:Invent in 2013, and look at how machine learning just exploded onto the scene. You were involved in that from day one. It's still day one, as you guys say. What's the big thing now? Look at just what happened. Machine learning comes in and then a slew of services come in. You've got SageMaker, which became a hot seller right out of the gate. The database stuff was kicking butt. So all of this is now booming. That was a real generational changeover for databases. What's your perspective on how that's evolved? >> I think it's a really good point. I totally agree. For machine learning, there's been sort of a renaissance in machine learning and the application of machine learning. Machine learning as a technology has been around for 50 years, let's say. But to do machine learning right, you need a lot of data. The data needs to be high quality. You need a lot of compute to be able to train those models, and you have to be able to evaluate what those models mean as you apply them to real-world problems. And so the cloud really removed a lot of the constraints. Finally, customers had all of the data that they needed. We gave them services to be able to label that data in a high-quality way. There's all the compute you need to be able to train the models. And so away you go. The cloud really enabled this renaissance in machine learning. And honestly, we're seeing a similar renaissance with data and analytics. If you look back five to ten years, analytics was something you did in batch: your data warehouse ran an analysis to do reconciliation at the end of the month, and that was it. (John laughs) And that's when you needed it.
But today, if your Redshift cluster isn't available, Uber drivers don't turn up and DoorDash deliveries don't get made. Analytics is now central to virtually every business, and it is central to virtually every business's digital transformation. Being able to take data from a variety of sources, query it with high performance, and then start to augment that data with real information, which usually comes from technical experts and domain experts, to form wisdom and insight from raw data: that's what most organizations are trying to do when they go through this analytics journey. >> It's interesting. Dave Vellante and I always talk on theCUBE about the future, and when you look back, the things we were talking about six years ago are actually happening now. It's not a hyped-up statement to say digital transformation is actually happening now. And there are also times when we bang our fists on the table and say, "I really think this is so important." And Dave says, "John, you're going to die on that hill." (Matt laughs) And so I'm excited that this year, for the first time, I didn't die on that hill. I've been saying- >> Do all right. >> Data as code is the next infrastructure as code. And Dave's like, "What do you mean by that?" We're talking about how data gets... And it's happening. So we just had an event on our AWS startups.com site, a showcase for startups, and the theme was data as code. And interesting new trends are emerging really clearly, like the role of a data engineer, right? Like what an SRE did for cloud, you have a new data engineering role, because developer onboarding is increasing massively, exponentially, with new developers. Data scientists are growing too, but it's the pipelining and managing and engineering of data as a system, almost like an operating system. >> Kind of as a discipline. >> So what's your reaction to this data engineer role and data as code? Because if you have horizontally scalable data, you've got to be open, and that's hard (laughs), okay? And you've got to silo the data that needs to be siloed for compliance reasons. So there's a big policy piece around that. So what's your reaction to data as code and the data engineering phenomenon? >> It's a really good point. I think with any technology project inside an organization, success with analytics or machine learning is kind of 50% technology and 50% culture. You often have domain experts, who could be physicians or drug design experts or financial experts, whoever they might be, with deep domain expertise, and then you've got technical implementation teams. And there's kind of a natural, often repulsive force. I don't mean that rudely, but they just don't talk the same language. And so the more complex the domain and the more complex the technology, the stronger that repulsive force. It can become very difficult for domain experts to work closely with the technical experts to be able to actually get business decisions made. And so what data engineering does, and data engineering is in some cases a team, or it can be a role that you play, is really allow those two disciplines to speak the same language. You can think of it as plumbing, but I think of it as a bridge. It's a bridge between the technical implementation and the domain experts, and that requires a very disparate range of skills.
You've got to understand statistics, you've got to understand the implementation, you've got to understand the data, and you've got to understand the domain. And if you can put all of that together, that data engineering discipline can be incredibly transformative for an organization, because it builds the bridge between those two groups. >> I was advising some young computer science students at the sophomore, junior level just a couple of weeks ago, and I told them I would ask someone at Amazon this question. So I'll ask you, >> Matt: Okay. >> since you've been in the middle of it for years. They were asking me, and I was trying to mentor them on, how do you become a data engineer from a practical standpoint? Courseware, projects to work on, how to think, not just coding Python, because everyone's coding in Python, but what else can they do? So I was trying to help them. I didn't really know the answer myself. I was just trying to help figure it out with them. So what is the answer, in your opinion, or what are your thoughts on advice to young students who want to be data engineers? Because data scientist is pretty clear on what that is: you use tools, you make visualizations, you manage data, you get answers and insights, and then you apply that to the business. That's an application. That's not standing up a stack or managing the infrastructure. So what does that coding look like? What would your advice be to folks getting into a data engineering role? >> Yeah, if you believe what I said earlier about 50% technology, 50% culture, then the number one technology to learn as a data engineer is the tools in the cloud that allow you to aggregate data from virtually any source into something which is incrementally more valuable for the organization. That's really what data engineering is all about. It's about taking data from multiple sources. Some people call them silos, but "silos" implies that the storage is kind of fungible or undifferentiated, and that's really not the case. Success requires you to have really purpose-built, well-crafted, high-performance, low-cost engines for all of your data. So understanding those tools and understanding how to use them, that's probably the most important technical piece. Python and programming and statistics go along with that, I think. And then the most important cultural part, I think, is just curiosity. As a data engineer, you want to have a natural curiosity that drives you to seek the truth inside an organization, to seek the truth of a particular problem, and to be able to engage, because you're probably going to have some choice as you go through your career about which domain you end up in. Maybe you're really passionate about healthcare, or you're really passionate about transportation or media, whatever it might be. And you can allow that to drive a certain amount of curiosity. But within those roles, the domains are so broad that you've kind of got to allow your curiosity to develop and lead you to ask the right questions and engage in the right way with your teams, because you can have all the technical skills in the world, but if you're not able to help the team seek the truth through that curiosity, you simply won't be successful. >> We just had a guest, a 20-year-old founder, Johnny Dallas, who was 16 when he worked at Amazon. Youngest engineer- >> Johnny Dallas is a great name, by the way. (both chuckle) >> It's his real name. It sounds like a football player. >> That's awesome. >> Rock star.
Johnny CUBE, it's me. But he's young, and he was saying... his advice was just do projects. >> Matt: And get hands-on. Yeah. >> And I was saying, hey, I came from the old days where you'd stand stuff up and you hugged onto the assets, because you didn't want to kill the project after you'd spent all this money. And he's like, yeah, with cloud you can shut it down. If you do a project that's not working, you get bad data, no one's adopting it, or you don't like it anymore, you shut it down and do something else. >> Yeah, totally. >> Instantly abandon it and move on to something new. That's a progression. >> Totally! The blast radius of decisions is just way reduced. We talk a lot about how, in the old world, trying to find the resources and get the funding was like: all right, I want to try out this kind of random idea that could be a big deal for the organization, and I need $50 million and a new data center. You're not going to get anywhere. >> And you'd do a proposal, working-backwards documents, all kinds of stuff. >> All that sort of stuff. >> Jump through hoops. >> So all of that is gone. But we sometimes forget that a big part of that is just the prototyping and the experimentation and the limited blast radius in terms of cost, and honestly, the most important thing, time: just being able to jump in there, fingers on keyboards, and try this stuff out. And that's part of the reason we have so many services at AWS, because when you get into AWS, we want the whole toolbox to be available to every developer. And so as your ideas develop, you may want to jump from data that you already have in a database to doing real-time data, and the tools are there. And when you want to get into real-time data, you don't just have Kinesis, you have real-time analytics, and you can run SQL against that data. The capabilities and the breadth really matter when it comes to prototyping. >> That's the culture piece, because what was once a dysfunctional behavior, "I'm going to go off and try something behind my boss's back," is now a side hustle or a fun project. So for fun, you can just code something. >> Yeah, totally. I remember my first Hadoop projects. I found an almost literally decommissioned set of servers in the data center that no one was using. They were super old. They were about to be literally turned off. And I managed to convince the team to leave them on for me for another month, and I installed Hadoop on them and got them going. It just seems crazy to me now that I had to go and convince anybody not to turn these servers off. But what it was like when you- >> That's when you came up with Elastic MapReduce, because you said this is too hard, we've got to make it easier. >> Basically, yes. (John laughs) I was installing Hadoop version Beta 9.9 or whatever. It was like, this is really hard. >> We've got to make it simpler. All right, good stuff. I love the walk down memory lane. And also your advice. Great stuff. I think culture is huge. That's why I liked Adam's keynote at re:Invent, Adam Selipsky talking about pathfinders and trailblazers, because that's a blast radius impact when you can actually have innovation organically come from anywhere. That's totally cool. >> Matt: Totally cool. >> All right, let's get into the product. Serverless has been hot. We hear EKS is hot. Containers are booming. Kubernetes is getting adopted; still a lot of work to do there. Cloud native developers are booming. Serverless, Lambda.
How does that impact the analytics piece? Can you share the hot products and how that translates? >> Absolutely, yeah. >> Aurora, SageMaker. >> Yeah, if you look at the evolution and what customers are asking for, they don't just want low cost. They don't just want this broad set of services. They don't just want those services to have deep capabilities. They want those services to have as low an operating cost over time as possible. So we've really got that down. We built a lot of muscle, a lot of services, around getting up and running, experimenting and prototyping, turning things on and turning them off. And that's all great. But actually, in most projects you really only start something once and stop something once, and maybe there's an hour in between or maybe there's a year. The real expense in terms of time and operations and complexity is often in that running cost. And we've heard very loudly and clearly from customers that running cost is just undifferentiated to them. They want to spend more time on their work, and in analytics, that is slicing the data, pivoting the data, combining the data, labeling the data, training their models, and running inference against their models, and less time doing the operational pieces. >> Is that why the service focuses there? >> Yeah, absolutely. It dramatically reduces the skill required to run these workloads at any scale, and it dramatically reduces the undifferentiated heavy lifting, because you get to focus more of the time that you would have spent on the operations on the actual work that you want to get done. So if you look at something like Redshift Serverless, which we launched at re:Invent, we have a lot of customers that want to run the cluster and get into the weeds where there is benefit. And we have a lot of customers that say there's no benefit for me, I just want to do the analytics. So you run the operational piece, you're the experts. We run 60 million instance startups every single day. We do this a lot. >> John: Exactly. We understand the operations- >> I just want the answers. Come on. >> So just give me the answers, or just give me the notebook, or just give me the inference prediction. Today, for example, we announced Serverless Inference. So now, once you've trained your machine learning model, you just run a few lines of code or click a few buttons, and then you've got an inference endpoint that you do not have to manage. And whether you're doing one query against that endpoint per hour or you're doing 10 million, we'll just scale it on the back end. >> I know we don't have a lot of time left, but I want to get your reaction on this. One of the things about data lakes not being data swamps, from what I've been reporting and hearing from customers, is that they want to retrain their machine learning algorithms. They need that data, they need the real-time data, and they need the time series data. Even though the time has passed, they've got to store it in the data lake. So now one of the data lake's main functions is reusing the data to actually retrain, so it works properly. A lot of post-mortems turn into actual business improvements that make the machine learning smarter, faster. Do you see it the same way? >> Yeah, I think it's really interesting >> Or is that just...
>> No, I think it's totally interesting, because it's convenient to think of analytics as a very clear progression from point A to point B. But really, you're navigating terrain for which you do not have a map, and you need a lot of help to navigate that terrain. So it helps to have these services in place, to not have to run the operations of those services, and to have those services be secure and well governed. And we added PII detection today, something you can do automatically, to be able to use any unstructured data and run queries against that unstructured data. So today we added text queries. So you can scan a badge, for example, and say, what's the name on this badge? And you don't have to identify where it is; we'll do all of that work for you. It's more like a branch than a normal A-to-B path, a linear path. And that includes loops backwards. Sometimes you've got to get the results and use those to make improvements further upstream. And sometimes, when you're downstream, it will be like, "Ah, I remember that," and you come back and bring it all together. >> Awesome. >> So it's a wonderful world for sure. >> Dr. Matt, we're here in theCUBE. Just take the last word and give us the update while you're here: what's the big news you're announcing here at Summit in San Francisco, California, and what's the update on the business analytics group? >> Yeah, we did a lot of announcements in the keynote. I encourage everyone to take a look at that keynote from this morning with Swami. One of the ones I'm most excited about is the opportunity to take dashboards and visualizations, which we're all used to using. We see them in our business intelligence tools, all over the place. However, what we've heard from customers is, yes, I want those analytics, I want that visualization, I want it to be up to date, but I don't actually want to have to go from the tools where I'm doing my work to another separate tool to be able to look at that information. And so today we announced 1-click public embedding for QuickSight dashboards. So today, literally as easily as embedding a YouTube video, you can take a dashboard that you've built inside QuickSight, copy and paste the HTML into your application, and that's it. That's all you have to do. It takes seconds. >> And it gets updated in real time. >> Updated in real time. It's interactive. You can do everything that you would normally do. You can brand it; there's no "powered by QuickSight" button or anything like that. You can change the colors so it fits in perfectly with your application. So it's an incredibly powerful way to take an analytics capability that today sits inside its own little fiefdom and put it just about everywhere. Very transformative. >> Awesome. And the business is going well. You've got the serverless piece, a real win for you there. Good stuff. Dr. Matt Wood, thank you for coming on theCUBE. >> Anytime. Thank you. >> Okay, this is theCUBE's coverage of AWS Summit 2022 in San Francisco, California. I'm John Furrier, host of theCUBE. Stay with us for more coverage of day two after this short break. (gentle music)
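Editor's note: for readers who want to see what the Serverless Inference announcement Matt mentions above looks like in practice, here is a minimal sketch using boto3. It assumes a trained model artifact already sits in S3 and that a SageMaker execution role exists; the bucket, container image URI, role ARN, and resource names below are placeholders for illustration, not values from the interview.

```python
# Minimal sketch: deploy a trained model behind a SageMaker Serverless Inference
# endpoint. Assumes boto3 credentials are configured and the placeholder ARNs,
# image URI, and S3 path below are replaced with real values.
import boto3

sm = boto3.client("sagemaker")
runtime = boto3.client("sagemaker-runtime")

model_name = "demo-serverless-model"            # hypothetical resource names
endpoint_config_name = "demo-serverless-config"
endpoint_name = "demo-serverless-endpoint"

# 1. Register the trained model (artifact in S3 plus an inference container).
sm.create_model(
    ModelName=model_name,
    PrimaryContainer={
        "Image": "<inference-container-image-uri>",          # placeholder
        "ModelDataUrl": "s3://<your-bucket>/model.tar.gz",    # placeholder
    },
    ExecutionRoleArn="arn:aws:iam::<account-id>:role/<sagemaker-role>",  # placeholder
)

# 2. Create an endpoint config with a ServerlessConfig instead of instance types.
#    You size memory and a concurrency cap; SageMaker handles capacity for you.
sm.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[
        {
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            "ServerlessConfig": {
                "MemorySizeInMB": 2048,  # memory allocated per invocation
                "MaxConcurrency": 5,     # cap on concurrent invocations
            },
        }
    ],
)

# 3. Create the endpoint; there are no instances to choose or manage.
sm.create_endpoint(
    EndpointName=endpoint_name,
    EndpointConfigName=endpoint_config_name,
)

# 4. Once the endpoint is InService, invoke it like any other SageMaker endpoint.
response = runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="application/json",
    Body=b'{"instances": [[1.0, 2.0, 3.0]]}',  # payload format depends on your model
)
print(response["Body"].read())
```

The notable piece is the ServerlessConfig block: instead of picking and operating instances, you declare a memory size and a concurrency cap, and scaling from one request an hour to millions is handled on the back end, which is the point Matt makes in the conversation above.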

Published Date : Apr 21 2022

